Monday, February 21, 2011

Large scale matrix factorization using alternating least squares: which is better - GraphLab or Mahout?

For the last couple of weeks I have been comparing the performance of GraphLab vs. Mahout on alternating least squares using the Netflix data. As a reminder, GraphLab is the parallel machine learning system we are building at CMU.

Initial results are encouraging. Mahout's alternating least squares implementation, written by Sebastian Schelter, was tested on Amazon EC2 using two m2.2xlarge nodes (2 x 13 virtual cores).

For 10 iterations with number of features = 20 and lambda = 0.065, Mahout takes 39,272 seconds, while the GraphLab implementation in C++ takes only 714 seconds (on a single machine with 8 cores).
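For readers unfamiliar with the algorithm both systems run: ALS alternates between fixing the item factors and solving a small regularized least-squares problem per user, then vice versa. Here is a minimal dense NumPy sketch of that loop (my own illustrative code, not the Mahout or GraphLab implementation), using the weighted-lambda regularization of the ALSWR variant and assuming every user and item has at least one rating:

```python
import numpy as np

def als(R, D=20, lam=0.065, iters=10, seed=0):
    """Alternating least squares on a dense ratings matrix R,
    where 0 marks a missing entry. Returns user/item factors U, V
    such that R[u, i] is approximated by U[u] @ V[i]."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, D))
    V = 0.1 * rng.standard_normal((n_items, D))
    rated = R > 0  # observation mask
    for _ in range(iters):
        # Fix V, solve one D x D ridge system per user
        # (lambda is scaled by the user's rating count, as in ALSWR).
        for u in range(n_users):
            Vu = V[rated[u]]
            A = Vu.T @ Vu + lam * len(Vu) * np.eye(D)
            U[u] = np.linalg.solve(A, Vu.T @ R[u, rated[u]])
        # Fix U, solve one D x D ridge system per item.
        for i in range(n_items):
            Ui = U[rated[:, i]]
            A = Ui.T @ Ui + lam * len(Ui) * np.eye(D)
            V[i] = np.linalg.solve(A, Ui.T @ R[rated[:, i], i])
    return U, V
```

The real implementations of course use sparse storage and parallelize the per-user/per-item solves, which are independent within each half-iteration - that independence is exactly what both Mahout and GraphLab exploit.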

The running times should be taken with a grain of salt, since I was not using the exact same machines, but the order-of-magnitude difference will certainly hold even if I run GraphLab on EC2 (which I plan to do soon).

Regarding accuracy, Mahout ALS reached a test RMSE of 0.9310, while GraphLab obtained a slightly better test RMSE of 0.9279.

Here is the final Mahout ALS output (of the RMSE computation):
ubuntu@ip-10-115-27-222:/mnt$ /usr/local/mahout-0.4/bin/mahout evaluateALS --probes /user/ubuntu/myout/probeSet/ --userFeatures /tmp/als/out/U/ --itemFeatures /tmp/als/out/M/ | grep RMSE
11/02/17 12:31:42 WARN driver.MahoutDriver: No evaluateALS.props found on classpath, will use command-line arguments only
11/02/17 12:31:42 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --itemFeatures=/tmp/als/out/M/, --probes=/user/ubuntu/myout/probeSet/, --startPhase=0, --tempDir=temp, --userFeatures=/tmp/als/out/U/}
RMSE: 0.9310729597725026, MAE: 0.7298745910296568
11/02/17 12:31:55 INFO driver.MahoutDriver: Program took 12437 ms

Here is the GraphLab output:
bickson@biggerbro:~/newgraphlab/graphlabapi/debug/apps/pmf$ ./PMF netflix-r 10 0 --D=20 --max_iter=10 --lambda=0.065 --ncpus=8
setting run mode 0
INFO   :pmf.cpp(main:1121): PMF starting

loading data file netflix-r
Loading netflix-r train
Creating 99072112 edges...
................
loading data file netflix-re
Loading netflix-re test
Creating 1408395 edges...
........setting regularization weight to 0.065
PTF_ALS for matrix (480189, 17770, 27):99072112.  D=20
pU=0.065, pV=0.065, pT=1, muT=1, D=20
nuAlpha=1, Walpha=1, mu=0, muT=1, nu=20, beta=1, W=1, WT=1 BURN_IN=10
complete. Obj=6.83664e+08, TEST RMSE=3.7946.
INFO   :asynchronous_engine.hpp(run:56): Worker 0 started.

...

INFO   :asynchronous_engine.hpp(run:56): Worker 7 started.

Entering last iter with 1
228.524) Iter ALS 1  Obj=2.60675e+08, TRAIN RMSE=2.2904 TEST RMSE=0.9948.
Entering last iter with 2
289.594) Iter ALS 2  Obj=6.48921e+07, TRAIN RMSE=1.1400 TEST RMSE=0.9573.
Entering last iter with 3
350.487) Iter ALS 3  Obj=4.75073e+07, TRAIN RMSE=0.9754 TEST RMSE=0.9444.
Entering last iter with 4
411.551) Iter ALS 4  Obj=4.09914e+07, TRAIN RMSE=0.9063 TEST RMSE=0.9381.
Entering last iter with 5
472.615) Iter ALS 5  Obj=3.79096e+07, TRAIN RMSE=0.8718 TEST RMSE=0.9348.
Entering last iter with 6
533.039) Iter ALS 6  Obj=3.61298e+07, TRAIN RMSE=0.8513 TEST RMSE=0.9324.
Entering last iter with 7
594.177) Iter ALS 7  Obj=3.50076e+07, TRAIN RMSE=0.8382 TEST RMSE=0.9305.
Entering last iter with 8
654.41) Iter ALS 8  Obj=3.42655e+07, TRAIN RMSE=0.8294 TEST RMSE=0.9290.
Entering last iter with 9
714.095) Iter ALS 9  Obj=3.37535e+07, TRAIN RMSE=0.8234 TEST RMSE=0.9279.
INFO   :asynchronous_engine.hpp(run:66): Worker 6 finished.

...

INFO   :asynchronous_engine.hpp(run:66): Worker 2 finished.

1 comment:

  1. A question I got from Alexandre Rodriguez (FEUP):

Being distributed, .. , I'm considering using GraphLab to write an ALSWR factorizer (distributed) and some other functions (I must plan how I should use the GraphLab paradigm to do so).

    Can you tell me if there's any kind of SVD implementation or similar approaches using GraphLab?

    My answer:
Of course! There is a GraphLab implementation of the exact ALSWR algorithm.
    Documentation is found here: http://www.graphlab.ml.cmu.edu/pmf.html

    Installation instructions for Linux: http://bickson.blogspot.com/2011/02/graphlab-large-scale-machine-learning.html
    Installation instructions for MAC OS: http://bickson.blogspot.com/2011/02/graphlab-large-scale-machine-learning_28.html
You will need to install itpp/lapack. Installation instructions are found here:
    http://bickson.blogspot.com/2011/02/installing-blaslapackitpp-on-amaon-ec2.html

This post discusses the performance of GraphLab compared to Mahout:
    http://bickson.blogspot.com/2011/02/large-scale-matrix-factorization-using.html

Anyone who is trying this out - let me know if you need any assistance with installation and setup. I am quite excited since the code was just released, and I am already in touch with several people who are trying it. Any feedback is appreciated.
