Tuesday, March 5, 2013

Intel Labs report on GraphLab vs. Mahout

I have some very interesting news to report: I got the following graph from Nezih Yigitbasi of Intel Labs.

[Figure: runtime of Mahout vs. GraphLab for ALS matrix factorization on the Netflix data; GraphLab is roughly 20x faster.]
It compares Mahout vs. Distributed GraphLab on the popular task of matrix factorization with the ALS (alternating least squares) algorithm on the Netflix data. The bottom line is that GraphLab is about 20x faster than Mahout.

And here is the exact experiment setup I got from Nezih:

  • N is the number of ALS iterations, D is the number of latent factors. The experiments were conducted on a 16-node cluster.
  • We start GraphLab as: mpirun -hostfile ~/hostfile -x CLASSPATH ./als --ncpus=16 --matrix hdfs://host001:19000/user/netflix --D=$LATENT_FACTOR_COUNT --max_iter=$ITER_COUNT --lambda=0.065 --minval=0 --maxval=5
  • To run Mahout ALS, we use the factorize-netflix.sh script under the examples directory. It should be run as: ./factorize-netflix.sh /path/to/training_set/ /path/to/qualifying.txt /path/to/judging.txt
  • Our test cluster has 16 machines, each with 64GB of memory, 2 CPUs (Intel(R) Xeon(R) E5-2670 @ 2.60GHz, 8 cores each), and 4 x 1TB HDDs. The machines communicate over a 10Gb Ethernet interconnect.
  • The Netflix dataset was split into 32 equally sized chunks and then put into HDFS (a sketch of this step follows below).
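For completeness, the split-and-load step could be done roughly like this (a minimal sketch only: the ratings file name is a placeholder, and I am assuming GNU coreutils and the HDFS path from the GraphLab command above):

    # Split the ~1.5GB ratings file into 32 equally sized, line-aligned chunks
    # (-n l/32 keeps lines intact; -d gives numeric suffixes).
    split -n l/32 -d netflix_ratings.txt chunk_

    # Load the chunks into HDFS where the GraphLab job reads them.
    hadoop fs -mkdir /user/netflix
    hadoop fs -put chunk_* /user/netflix/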

16 comments:

  1. Which version of Mahout?

    Mahout trunk has made substantial improvements in speed lately.

    The asynchronous style of GraphLab can still have substantial benefits, but it isn't as bad as this looks.

    In the area of other algorithms such as clustering, Mahout has made improvements of multiple orders of magnitude and may now be considerably faster than GraphLab.

    One other important point is the matter of goals. Mahout's primary goal is practical scalability. As such, running under a dominant scaled paradigm such as map-reduce is critical. This imposes substantial penalties in some algorithms such as ALS.

  2. What do you mean by "dominant scaled paradigm"?

  3. I see lots of problems with this benchmark, unfortunately.

    First, the Netflix dataset is about 1.5GB in text format. Splitting it into 32 chunks means that each chunk (~48MB) is smaller than the default 64MB block size used by Hadoop. This means that you will run only 32 mappers on a cluster with 16 * 16 = 256 cores. ALS is a CPU-intensive problem, and this setup uses only 1/8th of the available computing power (see the quick check after this comment).

    Second, you should use the latest version of ALS in Mahout, which is able to run multithreaded, and configure it optimally (e.g., enable JVM reuse for mappers).

    Third, the example bash script used for Mahout includes a command-line program to preprocess the data and starts a second job that computes the RMSE of the factorization. Neither of these has anything to do with the performance of ALS.

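    A quick way to verify the block-count claim is to ask HDFS directly (a sketch; the path is the one from the setup above):

      # fsck reports "Total blocks"; with the default 64MB block size,
      # a ~1.5GB input occupies about 24 blocks.
      hadoop fsck /user/netflix -files -blocks | grep "Total blocks"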
  4. To expand on Sebastian's comments, the trunk version of ALS is lately about 2-3 times faster, due to a nearly 10x improvement in QR decomposition speed. If you combine that with the fact that only 1/8th of the available cores were used, you get an estimate that the speed of Mahout is somewhere between 16/22 and 24/22 relative to GraphLab (a 2-3x software speedup times the 8x of unused cores gives a 16-24x correction against the roughly 22x gap measured here). That is, not much of a detectable difference.

    If you then include the fact that the pre-processing and error-estimation phases of Mahout should be removed from the timings, it may well be that Mahout dominates GraphLab on speed.

    In fact, of course, with a flawed benchmark the real result is that you know nothing new.

  5. Hey guys,
    Some clarifications here.
    - Data preprocessing and RMSE calculation are NOT included in the timings; I only timed the ALS computation.
    - The dataset is split into 32 chunks ONLY for GraphLab, not for Mahout.
    - I will definitely have a look at Mahout trunk to see how much it improves the results.

    Nezih

    Replies
    1. If you didn't split the dataset for Mahout, it's even worse. The dataset will live in 24 HDFS blocks, so you will be using only 24 mappers, less than 1/10th of the cluster's computing power...

    2. Can you recommend the optimal number of splits to maximize performance? (Does it depend on the number of machines and cores? If so, please provide some guidelines.)

    3. In order to get optimal performance, you should use the trunk of Mahout and set the block size so that one split per machine is created.

      Then you should use one mapper per machine, as the latest version of Mahout uses multithreaded mappers (-Dmapred.tasktracker.map.tasks.maximum=1), and set --numThreadsForSolver to the number of cores on the machine (or one less).

    4. Update: I missed one thing. You also have to adjust the number of reducers, as the number of mappers run for the factorization depends on the number of reducers used in the preprocessing stage. The number of mappers you get is then not the number of blocks, but the number of reducers. So for Mahout's trunk, set the number of reducers equal to the number of machines. (A combined sketch of these settings follows below.)

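      Putting these suggestions together, a tuned run might look roughly like the sketch below. This is only illustrative: the paths, feature count, and iteration count are placeholders, the flag names are taken from the replies above (check them against your Mahout version), and mapred.tasktracker.map.tasks.maximum may need to be set in mapred-site.xml on the tasktrackers rather than per job:

        # Preprocessing: run with -Dmapred.reduce.tasks=16 (one reducer per
        # machine), so the factorization later gets 16 mappers (see the update above).

        # Factorization: one multithreaded mapper per machine, JVM reuse enabled.
        bin/mahout parallelALS \
          -Dmapred.tasktracker.map.tasks.maximum=1 \
          -Dmapred.job.reuse.jvm.num.tasks=-1 \
          --input hdfs:///user/netflix/ratings \
          --output hdfs:///user/netflix/als-output \
          --numFeatures 20 \
          --numIterations 10 \
          --lambda 0.065 \
          --numThreadsForSolver 16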
  6. Hi guys,
    I am very excited to have this discussion; the reason I posted this on my blog is that I hoped for exactly such comments. Intel Labs is definitely objective and would love to get any tips from you about tuning the algorithm's performance; we would love to give Mahout as fair a chance as possible.

  7. Hi, Danny!

    Does the ALS-WR method based on Y. Koren et al.'s paper "Collaborative Filtering for Implicit Feedback Datasets" differ from the WALS presented here? I'm asking because GraphChi needs two parameters, rating AND weight, while Mahout's implementation of ALS-WR requires only the confidence ratings (r_ij according to the article). Probably I missed something, but I want to compare the results with implicit feedback and no numeric ratings.

    Replies
    1. Apologies for the slow reply.
      My guess is that Koren's paper is concerned with implicit datasets, meaning binary events (click / no click), while the GraphChi implementation is more general, since it allows both binary and weighted ratings. If your data is implicit, just put a rating of 1 when creating the GraphChi input (see the sketch after this reply).

      Best,

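      For illustration, converting an implicit click log into GraphChi input might look like this minimal sketch (it assumes a whitespace-separated "user item" log, GraphChi's matrix-market input convention, and placeholder counts):

        # Matrix-market header: number of users, items, and observed clicks
        # (replace the $NUM_* placeholders with the counts for your data),
        # followed by one "user item 1" line per click.
        { echo "%%MatrixMarket matrix coordinate real general"
          echo "$NUM_USERS $NUM_ITEMS $NUM_CLICKS"
          awk '{ print $1, $2, 1 }' clicks.txt
        } > implicit_ratings.mm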
    2. Thank you, that was exactly the clarification I wanted to hear! But isn't there an unwanted mix of notions here? Does it make sense to model a user's confidence when gathering explicit ratings, as in "the user is confident in rating A, but not very sure of rating B for the same item"? Or does the factorization method not care about the "physical meaning" of the numbers?

      Sorry if you find my questions unrelated to the GraphChi/Mahout discussion.

    3. Yes, definitely. Uncertainty adds an additional dimension to the input.
      For example, we know that Obama was born in the US with 99.99% certainty (not 100%, since some people still claim he was born in Kenya).
      But for some other people we are not sure (maybe because several people share the same name, some information is missing, or some records were lost), so we can only say they were born in the US with, say, 80% certainty.

  8. Any updates from Intel Labs using an optimised Mahout setup as suggested by Sebastian?
