Large Scale Machine Learning and Other Animals: Graph Database Resources

Saturday, January 12, 2013

Graph Database Resources

I got this from my collaborator Joey Gonzalez:

A paper that summarizes the state of graph databases that might be worth reading:
http://swp.dcc.uchile.cl/TR/2005/TR_DCC-2005-010.pdf
A nice paper describing how database systems are built. In particular, it discusses how storage and computation dependencies are isolated from each other in a database:
http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

Regarding the actual performance of databases for graphs, I got an interesting link from my collaborator Yucheng Low:

I found an interesting benchmark comparing MySQL NDB against Memcached you may be interested in.
Summary: MySQL NDB is faster than Memcached. http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
It is really only faster if the entire NDB table fits in memory (and disk write flushes are disabled). If HDD I/O is necessary, it slows down quite a lot. Of course, MySQL sharding plus replication can be used to keep things running in memory instead of going to disk.

Another interesting resource, from my collaborator Aapo Kyrola, concerns FlockDB, the graph database Twitter built on top of MySQL:

The blog post by the Twitter engineering team discusses in quite a lot of detail how they extract so much performance from MySQL. It is worth a read: http://engineering.twitter.com/2010/05/introducing-flockdb.html

Our goals were:

Write the simplest possible thing that could work.

Use off-the-shelf MySQL as the storage engine, because we understand its behavior — in normal use as well as under extreme load and unusual failure conditions.

Give it enough memory to keep everything in cache.

Allow for horizontal partitioning so we can add more database hardware as the corpus grows.

Allow write operations to arrive out of order or be processed more than once. (Allow failures to result in redundant work rather than lost work.)

FlockDB was the result.
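To make the last goal concrete, here is a minimal sketch of edge writes that can arrive out of order or be replayed without changing the final state. This is not FlockDB's code: the `EdgeWrite` and `EdgeStore` names, the in-memory dictionary, and the client-assigned integer timestamp are all assumptions made for illustration. The idea is that each write carries a timestamp and the store keeps only the newest write per edge, so applying the same write twice or applying writes in a different order gives the same result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeWrite:
    """A hypothetical edge operation; not FlockDB's actual record format."""
    source: int
    dest: int
    present: bool   # True = add the edge, False = remove it
    timestamp: int  # logical time assigned by the writer (an assumption)


class EdgeStore:
    """Toy in-memory edge store with last-writer-wins semantics."""

    def __init__(self):
        # Maps (source, dest) -> (timestamp, present) of the newest write seen.
        self._edges = {}

    def apply(self, write):
        key = (write.source, write.dest)
        candidate = (write.timestamp, write.present)
        current = self._edges.get(key)
        # Keep the candidate only if it is strictly newer than what is stored.
        # Tuple comparison also breaks timestamp ties deterministically
        # (an add beats a remove), so the outcome never depends on arrival order.
        if current is None or candidate > current:
            self._edges[key] = candidate

    def neighbors(self, source):
        """Return the sorted destinations of edges currently present."""
        return sorted(
            dest
            for (src, dest), (_, present) in self._edges.items()
            if src == source and present
        )


writes = [
    EdgeWrite(1, 2, True, timestamp=1),
    EdgeWrite(1, 3, True, timestamp=2),
    EdgeWrite(1, 2, False, timestamp=3),
]

in_order = EdgeStore()
for w in writes:
    in_order.apply(w)

# Same writes, reversed and each delivered twice, as after a retry.
replayed = EdgeStore()
for w in list(reversed(writes)) * 2:
    replayed.apply(w)

print(in_order.neighbors(1), replayed.neighbors(1))
```

Because every write is both idempotent and order-independent, a failed batch can simply be retried in full, which is the "redundant work rather than lost work" trade-off described in the goals.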

Carlos Guestrin also sent me an overview of SQL vs. NoSQL data stores.
