Monday, July 22, 2013

Benchmarking Machine Learning Frameworks

I am often contacted by researchers (from both universities and companies) who are trying to benchmark and compare different machine learning frameworks. I try to introduce them to each other, since essentially everyone is compiling the same benchmark tests, and it would be a good idea to create uniform measures and practices for comparing systems. One such example is the Intel Labs report I wrote about a couple of months ago in my blog.

A few days ago I got a related paper from my collaborator Aapo Kyrola: Li, Kevin; Gibson, Charles; Ho, David; Zhou, Qi; Kim, Jason; Buhisi, Omar; Brown, Donald E.; Gerber, Matthew, "Assessment of machine learning algorithms in cloud computing frameworks," Systems and Information Engineering Design Symposium (SIEDS), 2013 IEEE, pp. 98-103, 26 April 2013. IEEExplore

The above paper performs comparison tests on Amazon EC2, using the same hardware and similar algorithms and datasets. And here is the bottom line:
As you can see, GraphLab is significantly faster on both tasks compared: collaborative filtering (ALS) and text analysis (LDA). The paper claims that Mahout is slightly more accurate.

Hopefully the setup described in the paper is detailed enough that people will be able to reproduce it.


  1. The recommender benchmark in this paper uses a dataset of only 87 MB (!!!).

    That means that Hadoop will never use more than 2 map slots for this dataset (since it splits input at 64 MB). That's why you don't see any speedup in these graphs when the number of machines is increased from 1 to 8: they never utilize more than one machine (and it is not even clear whether they fully utilize that one).

    What's the purpose of publishing such flawed experiments? No one should even consider using a distributed system for a dataset that fits on a smartphone...
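
    To see why an 87 MB input cannot keep 8 machines busy, here is a back-of-the-envelope sketch of the split arithmetic. The 64 MB figure is the default HDFS block size in Hadoop 1.x, which the comment above assumes; the variable names are mine.

    ```python
    # Sketch: how many map tasks Hadoop would launch for the paper's dataset,
    # assuming the default 64 MB input split size (Hadoop 1.x default).
    import math

    SPLIT_SIZE_MB = 64   # assumed default dfs.block.size / split size
    DATASET_MB = 87      # dataset size reported in the paper

    num_splits = math.ceil(DATASET_MB / SPLIT_SIZE_MB)
    print(num_splits)  # -> 2 map tasks: nowhere near enough to occupy 8 machines
    ```

    With only 2 map tasks, adding machines beyond the first cannot help, which matches the flat scaling curves in the paper's graphs.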

  2. If I'm interactively analyzing a problem, the difference between 6 minutes and a few seconds may be:
    * a coffee break, meaning a mental context switch and a big risk of losing the flow/line of thought
    * skipping that extra analysis/QC step, which might end up being the key to understanding the problem.
    Thus, speeding up any task that takes more than a few seconds to complete is highly advantageous. If the system doesn't support that, it's a big disadvantage, IMHO.

    I'm in no way affiliated with any of the systems involved.