Thursday, June 9, 2011

What are the most widely deployed machine learning algorithms?

An interesting question is: "What are the most useful machine learning algorithms?"
I did a little survey by looking at the Mahout user mailing list and counting occurrences of keywords. The results I got are shown in the plot above.
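For anyone who wants to repeat this kind of count, here is a minimal sketch in Python. It is not the script I used: the keyword list and the `count_keywords` helper are made up for illustration, and it assumes the mailing list messages have already been loaded as plain strings.

```python
import re
from collections import Counter

# Hypothetical keyword list, for illustration only.
KEYWORDS = ["svd", "k-means", "lda", "naive bayes"]


def count_keywords(messages, keywords=KEYWORDS):
    """Count case-insensitive occurrences of each keyword across all messages."""
    # Lookarounds keep "svd" from matching inside a longer word.
    patterns = {
        kw: re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", re.IGNORECASE)
        for kw in keywords
    }
    counts = Counter({kw: 0 for kw in keywords})
    for text in messages:
        for kw, pattern in patterns.items():
            counts[kw] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    sample = [
        "How do I run SVD on Mahout?",
        "K-means and SVD questions; svd again",
    ]
    for keyword, n in count_keywords(sample).most_common():
        print(keyword, n)
```

Note that this counts raw mentions, so one long thread about a single algorithm can inflate its total; counting the number of messages that mention each keyword would be a reasonable alternative.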

Judging by these keyword counts, matrix factorization (SVD) appears to be the most widely used algorithm, followed by K-means. We have just implemented SVD as part of the GraphLab Collaborative Filtering library. Anyone who wants to beta test it is welcome!


  1. Nice :) But maybe this is a plot of inverse quality of the documentation for each algorithm?

  2. I think you are wrong - for Mahout's SVD we would get division by zero.. :-)

  3. I think that this is definitely not a measure of usage. From my experience, the order in descending frequency would be recommendations, K-means, SGD, SVD and frequent itemsets. The first two or three dominate. The order of the latter ones is uncertain.

  4. nice, added to the list