I got this reference from my collaborator Aapo Kyorla, author of GraphChi.
A Lightweight Infrastructure for Graph Analytics.
Donald Nguyen, Andrew Lenharth, Keshav Pingali (University of Texas at Austin), to appear in SOSP 2013.
It is an interesting paper which heavily compares to GraphLab, PowerGraph (GraphLab v2.1) and
One of the main claims is that dynamic and asynchronous scheduling can significantly speed up many graph algorithms (vs. bulk synchronous parallel model where all graph nodes are executed on each step).
Some concerns I have is regarding the focus on multicore settings, which makes everything much easier, and thus to comparison with PowerGraph less relevant.
Another relevant paper which improves on GraphLab is: Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC. Zhiyuan Lin, Duen Horng Chau, and U Kang, IEEE Big Data Workshop: Scalable Machine Learning: Theory and Applications, 2013. The basic idea is to speed graph loading using mmap() operation.