Large Scale Machine Learning and Other Animals: PowerLyra

Tuesday, November 12, 2013

PowerLyra

We got today the following email from Rong Chen, Shanghai Jiao Tong University:

Hi, GraphLab Experts,

I'm from IPADS group, Shanghai Jiao Tong University, China. This email is aimed at a first time disclosure of project PowerLyra, which is a new hybrid graph analytics engine based on GraphLab 2.2 (PowerGraph).

As you can see, natural graphs with skewed distribution raise unique challenges to graph computation and partitioning. Existing graph analytics frameworks usually use a “one size fits all” design that uniformly processes all vertices and result in suboptimal performance for natural graphs, which either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab), or incur high communication cost among vertices even for low-degree vertices (e.g., PowerGraph).

We argued that skewed distribution in natural graphs also calls for differentiated processing of high-degree and low-degree vertices. We then developed PowerLyra, a new graph analytics engine that embraces the best of both worlds of existing frameworks, by dynamically applying different computation and partition strategies for different vertices. PowerLyra uses Pregel/GraphLab like computation models for process low-degree vertices to minimize computation, communication and synchronization overhead, and uses PowerGraph-like computation model for process high-degree vertices to reduce load imbalance and contention. To seamless support all PowerLyra application, PowerLyra further introduces an adaptive unidirectional graph communication.

PowerLyra additionally proposes a new hybrid graph cut algorithm that embraces the best of both worlds in edge-cut and vertex-cut, which adopts edge-cut for low-degree vertices and vertex-cut for high-degree vertices. Theoretical analysis shows that the expected replication factor of random hybrid-cut is always better than both random vertex-cut and edge-cut. For skewed power-law graph, empirical validation shows that random hybrid-cut also decreases the replication factor of current default heuristic vertex-cut (Grid) from 5.76X to 3.59X and from 18.54X to 6.76X for constant 2.2 and 1.8 of synthetic graph respectively. We also develop a new distributed greedy heuristic hybrid-cut algorithm, namely Ginger, inspired by Fennel (a greedy streaming edge-cut algorithm for a single machine). Compared to Gird vertex-cut, Ginger can reduce the replication factor by up to 2.92X (from 2.03X) and 3.11X (from 1.26X) for synthetic and real-world graphs accordingly.

Finally, PowerLyra adopts locality-conscious data layout optimization in graph ingress phase to mitigate poor locality during vertex communication. we argue that a small increase of graph ingress time (less than 10% for power-law graph and 5% for real-world graph) is more worthwhile for an often larger speedup in execution time (usually more than 10% speedup, specially 21% for Twitter follow graph).

Right now, PowerLyra is implemented as an execution engine and graph partitions of GraphLab, and can seamlessly support all GraphLab applications. A detail evaluation on 48-node cluster using three different graph algorithms (PageRank, Approximate Diameter and Connected Components) show that PowerLyra outperforms current synchronous engine with Grid partition of PowerGraph (Jul. 8, 2013. commit:fc3d6c6) by up to 5.53X (from 1.97X) and 3.26X (from 1.49X) for real-world (Twitter, UK-2005, Wiki, LiveJournal and WebGoogle) and synthetic (10-million vertex power-law graph ranging from 1.8 to 2.2) graphs accordingly, due to significantly reduced replication factor, less communication cost and improved load balance.

The website of PowerLyra: http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

The latest release has ported to GraphLab 2.2 (Oct. 22, 2013. commit:e8022e6), which aims to provide best compatibility with minimum changes to framework (Perhaps, only add a "type" field to vertex_record.). But this version has no locality-conscious graph layout optimisation now. You can check out the branch from IPADS's gitlab server: git clone http://ipads.se.sjtu.edu.cn:1312/opensource/powerlyra.git

I did not have time to try it out yet, but it definitely looks like an interesting research direction.

1 comment:

AnonymousNovember 19, 2013 at 1:57 PM
Here's my blog about pets: http://k9friend.blogspot.com/

Hope you like it
ReplyDelete
Replies

Add comment

Large Scale Machine Learning and Other Animals

Tuesday, November 12, 2013

PowerLyra

1 comment:

Labels

GraphLab Users Google Group

pagerank

google analytics

syntax