I am often contacted by researchers (from both universities and companies) who are trying to benchmark and compare different machine learning frameworks. I try to introduce them to each other, since basically everyone is assembling the same benchmark tests, and it would be a good idea to create uniform measures and practices for comparing systems. One such example is the
Intel Labs report I wrote about a couple of months ago in my blog.
A few days ago my collaborator
Aapo Kyrola sent me a related paper: Li, Kevin; Gibson, Charles; Ho, David; Zhou, Qi; Kim, Jason; Buhisi, Omar; Brown, Donald E.; Gerber, Matthew, "Assessment of machine learning algorithms in cloud computing frameworks,"
Systems and Information Engineering Design Symposium (SIEDS), 2013 IEEE, pp. 98-103, 26 April 2013.
IEEE Xplore
The paper runs comparison tests on Amazon EC2, using the same hardware and similar algorithms and datasets. Here is the bottom line:
As you can see, GraphLab is significantly faster on both tasks compared: collaborative filtering (ALS) and text analysis (LDA). The paper claims that Mahout is slightly more accurate.
Hopefully the setup described in the paper is detailed enough that others will be able to reproduce it.
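For readers less familiar with the ALS task being timed, here is a minimal sketch of alternating least squares matrix factorization in plain Python/NumPy. It is illustrative only: it is not the GraphLab or Mahout implementation used in the paper, and the rank, regularization, and iteration counts below are arbitrary assumptions rather than the paper's settings.

```python
# Minimal ALS matrix factorization sketch (illustrative, not the benchmarked code).
import numpy as np

def als(R, mask, rank=10, reg=0.1, iters=10):
    """Factor R ~= U @ V.T by alternating least squares.
    R    : (num_users, num_items) rating matrix
    mask : boolean matrix, True where a rating is observed
    """
    num_users, num_items = R.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((num_users, rank)) * 0.1
    V = rng.standard_normal((num_items, rank)) * 0.1
    eye = reg * np.eye(rank)

    for _ in range(iters):
        # Fix V, solve a small ridge regression for each user's factors.
        for u in range(num_users):
            Vu = V[mask[u]]                      # items rated by user u
            U[u] = np.linalg.solve(Vu.T @ Vu + eye, Vu.T @ R[u, mask[u]])
        # Fix U, solve for each item's factors.
        for i in range(num_items):
            Ui = U[mask[:, i]]                   # users who rated item i
            V[i] = np.linalg.solve(Ui.T @ Ui + eye, Ui.T @ R[mask[:, i], i])
    return U, V

if __name__ == "__main__":
    # Tiny synthetic example, just to show how a timing/accuracy run could look.
    rng = np.random.default_rng(1)
    R = rng.integers(1, 6, size=(50, 40)).astype(float)
    mask = rng.random((50, 40)) < 0.3            # ~30% of ratings observed
    U, V = als(R, mask, rank=5)
    rmse = np.sqrt(np.mean((R[mask] - (U @ V.T)[mask]) ** 2))
    print("training RMSE:", round(rmse, 3))
```

The benchmarked systems distribute exactly this kind of alternating update across machines, which is why runtime differences between frameworks show up so clearly on the same EC2 hardware.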