I have just implemented parallel ALS using coordinate descent, a.k.a. the
CCD++ algorithm. The algorithm is described in the following two papers:
H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon. Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems. IEEE International Conference on Data Mining (ICDM), December 2012.
Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR '11). ACM, New York, NY, USA, 635-644.
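In a nutshell, it speeds up ALS by avoiding the costly least-squares solve. Instead, each latent dimension (coordinate) is updated separately with a closed-form one-variable update, and within each dimension the per-user and per-item updates run in parallel.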
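To make the idea concrete, here is a minimal single-threaded sketch of CCD++-style rank-one coordinate updates in pure Python. It is not the GraphChi or GraphLab code: the function names (ccdpp, rmse), the parameters and the initialization (zero user factors, small random item factors) are assumptions of this sketch. For each feature t, the residual on the observed ratings has feature t added back, each user value u_it is set to sum_j(R_ij * v_jt) / (lambda + sum_j v_jt^2) over that user's ratings (and symmetrically for items), and then feature t is subtracted again. The per-user loop (and the per-item loop) has no dependencies between iterations, which is where the parallelism comes from.

```python
import random


def ccdpp(ratings, n_users, n_items, k=2, lam=0.1, outer_iters=10, inner_iters=3, seed=0):
    """Hypothetical CCD++-style matrix factorization (sketch, not the toolkit code).

    ratings: list of (user, item, value) triples for the observed entries.
    Returns user factors U (n_users x k), item factors V (n_items x k)
    and the residuals A_ij - u_i . v_j on the observed entries.
    """
    rng = random.Random(seed)
    # Assumed initialization: U = 0, V small random values.
    U = [[0.0] * k for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    # Residual on observed entries; equals the ratings because U = 0.
    resid = [r for (_, _, r) in ratings]
    # Index the observed entries by user and by item.
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for idx, (i, j, _) in enumerate(ratings):
        by_user[i].append(idx)
        by_item[j].append(idx)

    for _ in range(outer_iters):
        for t in range(k):
            # Add feature t back into the residual: R_hat = R + u_t v_t^T.
            for idx, (i, j, _) in enumerate(ratings):
                resid[idx] += U[i][t] * V[j][t]
            for _ in range(inner_iters):
                # Closed-form 1-D update per user; users are independent,
                # so this loop could be run in parallel.
                for i in range(n_users):
                    num = sum(resid[idx] * V[ratings[idx][1]][t] for idx in by_user[i])
                    den = lam + sum(V[ratings[idx][1]][t] ** 2 for idx in by_user[i])
                    U[i][t] = num / den
                # Same update for items, using the new user values.
                for j in range(n_items):
                    num = sum(resid[idx] * U[ratings[idx][0]][t] for idx in by_item[j])
                    den = lam + sum(U[ratings[idx][0]][t] ** 2 for idx in by_item[j])
                    V[j][t] = num / den
            # Remove the updated feature t from the residual.
            for idx, (i, j, _) in enumerate(ratings):
                resid[idx] -= U[i][t] * V[j][t]
    return U, V, resid


def rmse(resid):
    """Root mean squared error of a list of residuals."""
    return (sum(r * r for r in resid) / len(resid)) ** 0.5


if __name__ == "__main__":
    # Toy data: (user, item, rating) triples.
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
    U, V, resid = ccdpp(toy, n_users=4, n_items=3, k=2)
    print("training RMSE:", rmse(resid))
```

Because each coordinate update only needs the residuals of one user's (or one item's) ratings, no k x k linear system is formed or solved, which is the saving over classic ALS.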
Documentation of the method for GraphChi is here:
Documentation of the method for GraphLab is here:
Note: for GraphLab, the algorithm is implemented in version 2.2, which will be released this summer. You can already try it out by checking out this version with the Mercurial command "hg up v2.2".
Let me know if you try it out!