I got this from my collaborator Joey Gonzalez:
A paper that summarizes the state of graph databases that might be worth reading:
Regarding the actual performance of databases for graphs, I got an interesting link from my collaborator Yucheng Low:
A nice paper describing how database systems are built. In particular, it talks about the isolation of storage and computation dependencies in a database:
I found an interesting benchmark comparing MySQL NDB against Memcached that you may be interested in.
Summary: MySQL NDB is faster than Memcached. http://yoshinorimatsunobu.
It is really only faster if the entire NDB table fits in memory (and disk write flushes are disabled). If HDD I/O is necessary, it slows down considerably. Of course, MySQL sharding plus replication can be used to keep things running instead of going to disk.
Another interesting resource, which I got from my collaborator Aapo Kyrola, concerns FlockDB, the graph database implemented at Twitter:
The blog post by the Twitter engineering team discusses in considerable detail how they extract so much performance from MySQL. It is worth a read: http://engineering.twitter.com/2010/05/introducing-flockdb.html
Our goals were:
- Write the simplest possible thing that could work.
- Use off-the-shelf MySQL as the storage engine, because we understand its behavior — in normal use as well as under extreme load and unusual failure conditions.
- Give it enough memory to keep everything in cache.
- Allow for horizontal partitioning so we can add more database hardware as the corpus grows.
- Allow write operations to arrive out of order or be processed more than once. (Allow failures to result in redundant work rather than lost work.)
FlockDB was the result.
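To make the last goal in the list concrete, here is a minimal sketch of one common way to tolerate out-of-order and repeated writes: tag every write with a timestamp and keep only the newest state per edge. This is my own illustration, not FlockDB's actual code. The names `EdgeWrite`, `EdgeStore`, and `active_edges` are hypothetical, and an in-memory dict stands in for the MySQL tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeWrite:
    """One write to the graph (hypothetical record layout)."""
    source: int
    dest: int
    state: str      # "add" or "remove"
    timestamp: int  # logical time assigned when the write was issued


class EdgeStore:
    """Keeps the newest (timestamp, state) pair seen for each edge."""

    def __init__(self):
        self._edges = {}  # (source, dest) -> (timestamp, state)

    def apply(self, write):
        key = (write.source, write.dest)
        candidate = (write.timestamp, write.state)
        current = self._edges.get(key)
        # Keep the larger tuple. Replaying a write is a no-op and the
        # arrival order does not matter. On equal timestamps the tuple
        # comparison falls back to the state string, so "remove" wins
        # deterministically.
        if current is None or candidate > current:
            self._edges[key] = candidate

    def active_edges(self):
        """Return the sorted list of edges whose newest state is "add"."""
        return sorted(k for k, (_, state) in self._edges.items() if state == "add")


writes = [
    EdgeWrite(1, 2, "add", 10),
    EdgeWrite(1, 2, "remove", 20),
    EdgeWrite(1, 3, "add", 15),
]

in_order = EdgeStore()
for w in writes:
    in_order.apply(w)

# Same writes, reversed, with one of them delivered twice.
shuffled = EdgeStore()
for w in list(reversed(writes)) + [writes[0]]:
    shuffled.apply(w)

print(in_order.active_edges() == shuffled.active_edges())
```

Because `apply` only ever keeps the maximum of two values, a retry after a failure costs some redundant work but cannot corrupt the result, which is the trade-off the quoted goal describes.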
From Carlos Guestrin I got an overview of SQL vs. NoSQL data stores.