Wednesday, December 7, 2011

Preview of GraphLab v2 new features!

GraphLab v2 new features are going to be presented in Intel's ISTC-CC retreat. The new features are the result of the hard work for Joey Gonzalez, who spent the summer as part of our collaboration with Yahoo! Research, in Alex Smola's lab.

The main challenge identified by Joey with GraphLab v1, is the handling of natural (power-law) graphs.

The problem with iterative graph algorithms of such graphs, is that when computing the update of a high degree node one has to touch large portion of the graph, and thus is is hard to distribute computation. Furthermore, computation of a high degree node is done sequentially and thus we loose some of the parallelism. Last, whenever an algorithm use locking of neighbors state (to avoid concurrent writes to their data structures) large number of locks has to be acquired which significantly degrades performance. 

The first solution is factorized update functions (shown above). The work on a high degree nodes is divided into smaller pieces of work which can be done in parallel.

The second solution is associate/commutative update functions, a.k.a delta updates (shown below). Whenever sums over a large number of nodes have to be computed, when the state of most of them is constant, there is no much point in summing up the result again and again. In this case, only the change in the result is tracked.

Some performance results on Yahoo! webgraph (25.5M nodes, 355M edges):

GraphLab v2 experiment version can be downloaded from our mercurial code base (using the command hg update -v2). We plan a full release in a couple of months.

No comments:

Post a Comment