Monday, July 16, 2012

A new dog on the block: GraphChi

Anyone who attended our GraphLab workshop could not have missed the new GraphChi release.
It is a new project by Aapo Kyorla from CMU, with some surprisingly great performance results.
The basic idea is that computation is limited to a weak machine (mac mini) with a large hard drive or SSD drive. Instead of loading the full problem into memory, the problem is read in parts from disk.

It is hard to believe what kind of results Aapo extracts from a mac mini. For example:

ApplicationInput graphGraph sizeComparisonGraphChi on Mac Mini (SSD)Ref
Pagerank - 3 iterationstwitter-20101.5B edgesSpark, 50 machines, 8.1 min13 min1
Pagerank - 100 iterationsuk-union3.8B edgesStanford GPS (Pregel), 30 machines, 144 min581 min2
Web-graph Belief Propagation (1 iter.)yahoo-web6.7B edgesPegasus, 100 machines, 22 min27 min3
Matrix factorization (ALS), 10 itersNetflix99M edgesGraphLab, 8-core machine, 4.7 min9.8 min4
Triangle countingtwitter-20101.5B edgesHadoop, 1636 machines, 423 mins55 min5

Namely, one mac mini does the work of 1636 Hadoop machines, x8 faster!!!

If you don't believe it, you are welcome to download GraphChi. The software is open source with Apache license. It is written in C++ but now I hear Aapo is created a Java version as well.

Additional reading: MIT Tech Review.


  1. Is there a quick way to learn Graphchi.

    1. If you are interested in collaborative filtering, a quick way to start is found in my blog here: