Large Scale Machine Learning and Other Animals: Oracle Labs Report on GraphLab

Tuesday, November 29, 2011

Oracle Labs Report on GraphLab

Recently we had some interesting discussions with Eric Sedlar, Technical Director at Oracle Labs. Oracle examined GraphLab as a potential framework for implementing graph algorithms and assigned Sungpack Hong, a researcher at Oracle Labs, to write a tech report on the subject.

A few days ago I got a copy of this seven-page tech report, titled "A brief report on Graphlab and Green-Marl". I asked Eric and Sungpack for permission to discuss their main findings here. They may publish the full tech report at a later date.

According to Sungpack, four aspects characterize the algorithms that GraphLab supports:

1. Algorithms can be described in a node-centric way; the same computation is repeatedly performed on every node.
2. A significant amount of computation is performed on each node.
3. The underlying graphs have large diameter.
4. Determinism can be neglected.

Here is my reaction to those four points:

1. Agreed.
2. Agreed.
3. This is not necessarily correct: you can use GraphLab for any graph, as long as it is sparse.
4. Actually, we support round-robin scheduling, so it is quite easy to run full sweeps over all nodes. Many of the matrix factorization algorithms we implemented fall into this category: they are completely deterministic (apart from, possibly, random initialization of the starting state).
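To make the last point concrete, here is a minimal sketch of a node-centric update applied in round-robin sweeps. It is plain Python, not GraphLab's API: the names `average_update` and `round_robin_sweeps` and the toy graph are assumptions made up for illustration. Because every sweep visits the nodes in the same fixed order, two runs from the same starting state give identical results.

```python
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]   # node id -> list of neighbour ids
Values = Dict[int, float]      # node id -> current value


def average_update(v: int, graph: Graph, values: Values) -> float:
    """Hypothetical node-centric update: mean of the node and its neighbours."""
    neighbours = graph[v]
    total = values[v] + sum(values[u] for u in neighbours)
    return total / (len(neighbours) + 1)


def round_robin_sweeps(graph: Graph, values: Values,
                       update: Callable[[int, Graph, Values], float],
                       sweeps: int) -> Values:
    """Apply `update` to every node, in a fixed order, `sweeps` times."""
    values = dict(values)      # work on a copy, leave the input untouched
    order = sorted(graph)      # fixed visiting order makes the run repeatable
    for _ in range(sweeps):
        for v in order:
            # In-place update: later nodes see the new values of earlier ones.
            values[v] = update(v, graph, values)
    return values


# Toy path graph 0 - 1 - 2 (illustrative data only).
graph = {0: [1], 1: [0, 2], 2: [1]}
init = {0: 0.0, 1: 3.0, 2: 6.0}

one_sweep = round_robin_sweeps(graph, init, average_update, sweeps=1)
run_a = round_robin_sweeps(graph, init, average_update, sweeps=5)
run_b = round_robin_sweeps(graph, init, average_update, sweeps=5)
```

Here `run_a` and `run_b` are equal because the visiting order never changes; the only source of randomness in such an algorithm would be the initial state.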

Next, some issues are discussed:

1. Programmability: the user must restructure the algorithm in a node-centric way.
2. Runtime overhead: the runtime system adds overhead when the amount of computation performed at each node is small.
3. Small-world graphs: GraphLab's locking scheme may suffer from frequent conflicts on such graphs.
4. Determinism: the result of an algorithm computed in GraphLab may be non-deterministic.
5. Concerns about distributed execution.

Here are my thoughts on those points:

1. This is a valid concern; not every algorithm is suitable for programming in GraphLab.
2. Agreed.
3. Agreed. In GraphLab v2, Joey is leading a significant improvement that will allow factoring the update function into several different computations to deal with this issue.
4. Objection! Our participation in this year's ACM KDD Cup contest, for example, produced very high-quality prediction results that were completely deterministic. It is possible to implement a wide variety of deterministic algorithms in GraphLab.
5. Agreed. Yucheng is leading the effort to completely rewrite the distributed engine, and we believe significant performance improvements are possible in the distributed setting.
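The small-world concern in the third point can be illustrated with a rough sketch. It assumes a simplified locking model that is not taken from the report: an update locks the node and all of its neighbours, and two updates conflict when their locked sets overlap. The helpers `scope`, `conflict_fraction`, `ring` and `star` are hypothetical names, and the star graph is only an extreme stand-in for a hub-dominated, low-diameter graph.

```python
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[int, List[int]]   # node id -> list of neighbour ids


def scope(v: int, graph: Graph) -> Set[int]:
    """Nodes locked by an update on v (assumed model: v plus its neighbours)."""
    return {v, *graph[v]}


def conflict_fraction(graph: Graph) -> float:
    """Fraction of node pairs whose update scopes overlap."""
    pairs = list(combinations(sorted(graph), 2))
    conflicts = sum(1 for a, b in pairs if scope(a, graph) & scope(b, graph))
    return conflicts / len(pairs)


def ring(n: int) -> Graph:
    """Large-diameter graph: each node touches only its two ring neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def star(n: int) -> Graph:
    """Hub-dominated graph: node 0 is connected to every other node."""
    graph: Graph = {0: list(range(1, n))}
    graph.update({i: [0] for i in range(1, n)})
    return graph


ring_frac = conflict_fraction(ring(20))   # only nearby nodes collide
star_frac = conflict_fraction(star(20))   # every scope contains the hub
```

Under this assumed model, every pair of updates on the star collides through the hub, while on the ring only nodes within two hops of each other do, which is one way to see why low-diameter graphs with hubs put more pressure on a neighbourhood-locking scheme.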

Finally, the following conclusion is given:

... I believe GraphLab still is a valuable back-end for Green-Marl (DB: their programming language) .. it is a very natural to use GraphLab as one of Green-Marl's back-ends.... GraphLab and Green-Marl can be helpful to each other... Finally it will be interesting to wait .. for GraphLab2. A fair comparison of distributed graph processing ... would be meaningful then.

Overall, I want to thank Eric and Sungpack for the serious effort they put into evaluating GraphLab! While I am of course biased, it is important to get an unbiased evaluation from third parties.

2 comments:

Anonymous, March 3, 2013 at 4:01 AM
Interesting article, thanks for posting...
Is there any news about this integration of Green-Marl and Graphlab? What is its current development situation?

Regards,

A Graphlab Rookie User