It is a new project by Aapo Kyorla from CMU, with some surprisingly great performance results.
The basic idea is that computation is limited to a weak machine (mac mini) with a large hard drive or SSD drive. Instead of loading the full problem into memory, the problem is read in parts from disk.
It is hard to believe what kind of results Aapo extracts from a mac mini. For example:
Application | Input graph | Graph size | Comparison | GraphChi on Mac Mini (SSD) | Ref |
Pagerank - 3 iterations | twitter-2010 | 1.5B edges | Spark, 50 machines, 8.1 min | 13 min | 1 |
Pagerank - 100 iterations | uk-union | 3.8B edges | Stanford GPS (Pregel), 30 machines, 144 min | 581 min | 2 |
Web-graph Belief Propagation (1 iter.) | yahoo-web | 6.7B edges | Pegasus, 100 machines, 22 min | 27 min | 3 |
Matrix factorization (ALS), 10 iters | Netflix | 99M edges | GraphLab, 8-core machine, 4.7 min | 9.8 min | 4 |
Triangle counting | twitter-2010 | 1.5B edges | Hadoop, 1636 machines, 423 mins | 55 min | 5 |
Namely, one mac mini does the work of 1636 Hadoop machines, x8 faster!!!
If you don't believe it, you are welcome to download GraphChi. The software is open source with Apache license. It is written in C++ but now I hear Aapo is created a Java version as well.
Additional reading: MIT Tech Review.
Is there a quick way to learn Graphchi.
ReplyDeleteIf you are interested in collaborative filtering, a quick way to start is found in my blog here: http://bickson.blogspot.co.il/2012/08/collaborative-filtering-with-graphchi.html
Delete