Saturday, January 12, 2013

Graph Database Resources

I got this from my collaborator Joey Gonzalez:
A paper that summarizes the state of graph databases that might be worth reading:
A nice paper describing how database systems are built. In particular, it discusses how storage and computation dependencies are isolated in a database:
Regarding the actual performance of databases for graphs, I got an interesting link from my collaborator Yucheng Low:
I found an interesting benchmark comparing MySQL NDB against Memcached that you may be interested in.
Summary: MySQL NDB is faster than Memcached.
It is really only faster if the entire NDB table fits in memory (and disk write flushes are disabled). If HDD I/O is necessary, it slows down considerably. Of course, MySQL sharding plus replication can be used to keep things running instead of going to disk.

Another interesting resource came from my collaborator Aapo Kyrola, regarding FlockDB, Twitter's graph database implementation:
The blog post by the Twitter engineering team discusses in considerable detail how they extract so much performance from MySQL. It is worth a read:
Our goals were: 
  • Write the simplest possible thing that could work. 
  • Use off-the-shelf MySQL as the storage engine, because we understand its behavior — in normal use as well as under extreme load and unusual failure conditions. 
  • Give it enough memory to keep everything in cache. 
  • Allow for horizontal partitioning so we can add more database hardware as the corpus grows. 
  • Allow write operations to arrive out of order or be processed more than once. (Allow failures to result in redundant work rather than lost work.)

FlockDB was the result.
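
To make the last two goals more concrete, here is a minimal Python sketch of an edge store that is partitioned by source node and tolerates writes that arrive out of order or more than once. This is only an illustration of the idea, not FlockDB's actual schema or code: the class names (EdgeWrite, EdgeShard, EdgeStore), the writer-assigned timestamp field, and the modulo sharding rule are all my own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EdgeWrite:
    """One add/remove operation on a directed edge (hypothetical format)."""
    source: int
    dest: int
    present: bool   # True = add the edge, False = remove it
    timestamp: int  # assumed to be assigned by the writer


@dataclass
class EdgeShard:
    """In-memory stand-in for one database partition."""
    # (source, dest) -> (timestamp, present)
    edges: Dict[Tuple[int, int], Tuple[int, bool]] = field(default_factory=dict)

    def apply(self, w: EdgeWrite) -> None:
        key = (w.source, w.dest)
        candidate = (w.timestamp, w.present)
        current = self.edges.get(key)
        # Keep whichever state has the larger (timestamp, present) pair.
        # Taking a maximum is commutative and idempotent, so replays and
        # reorderings of the same writes converge to the same state.
        if current is None or candidate > current:
            self.edges[key] = candidate

    def neighbors(self, source: int) -> List[int]:
        return sorted(
            d for (s, d), (_, present) in self.edges.items()
            if s == source and present
        )


class EdgeStore:
    """Routes each write to a shard chosen by the source node id."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [EdgeShard() for _ in range(num_shards)]

    def shard_for(self, source: int) -> EdgeShard:
        # Assumed partitioning rule: simple modulo on the source id.
        return self.shards[source % len(self.shards)]

    def apply(self, w: EdgeWrite) -> None:
        self.shard_for(w.source).apply(w)

    def neighbors(self, source: int) -> List[int]:
        return self.shard_for(source).neighbors(source)


writes = [
    EdgeWrite(1, 2, True, 10),
    EdgeWrite(1, 3, True, 11),
    EdgeWrite(1, 2, False, 12),  # removes edge 1 -> 2
    EdgeWrite(1, 4, True, 13),
]

in_order = EdgeStore(num_shards=4)
for w in writes:
    in_order.apply(w)

# Same writes, reversed and each delivered twice.
scrambled = EdgeStore(num_shards=4)
for w in list(reversed(writes)) * 2:
    scrambled.apply(w)
```

Both stores end up with the same neighbor list for node 1, which is the point of the design: a failed job can simply be retried, and the worst case is redundant work rather than lost or corrupted data.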

Carlos Guestrin sent me an overview of SQL vs. NoSQL data stores.
