Large Scale Machine Learning and Other Animals: Steffen Rendle

Wednesday, August 15, 2012

Steffen Rendle - libFM

News from the KDD CUP workshop. I was highly impressed by Steffen Rendle, the author of libFM collaborative filtering library. Steffen won the 2nd place in track1 and the 3rd place in track2. Unlike our team who had around 15 people, and the Taiwanese team who had around 20 people Steffen worked alone, and got a great rating in BOTH tracks.

What is nice about Steffen's work, is that he is using only a SINGLE algorithm, and not an ensemble of methods as typically deployed. The trick is the he does very smart feature engineering to create a very good feature matrix. Once he gets the representative feature matrix he uses the libFM algorithm.

I asked Steffen to explain the essense of the method:

A is the input (design) matrix where each row is a case and each column a (real-valued) predictor variable. I.e. the same way of feature engineering as in other standard ML algorithms such as linear/ polynomial regression, SVM, etc. Internally, the FM model works similarly as a polynomial regression, i.e. it contains all pairwise interactions between any two variables in A. The important difference to polynomial regression is that the model parameters for variable interactions are factorized. This is the reason why FMs perform well in problems such as recommender systems.

Some of the recent notable work of Steffen is a caching method for caching ALS computation, that according to Steffen, significantly speeds up ALS computation and makes it a lighter algorithm like SGD. The work is described in his recent SIGIR 2011 paper.

A second interesting work is an online matrix factorization computation described in the paper: Steffen Rendle, Lars Schmidt-Thieme (2008): Online-Updating Regularized Kernel Matrix Factorization Models forLarge-Scale Recommender Systems, in Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys 2008), ACM.
When new users/ items are added into the system, only an incremental update is computed.

Finally, Steffen gave a very detailed tutorial in KDD, about a whole bunch of matrix factorization methods and the relations between them. I find the tutorial a good overview of the connections between algorithms, however it is intended for intermediate user level who actually master some of the algorithms in this domain.

As you may know, we have a very crude and preliminary libFM implementation in GraphLab. libFM algorithm implementation contains a subset of the full libFM functionality with only three predictions: user, item and time. Users are encouraged to check the original libFM library for enhanced implementation. libFM library has a track record performance in KDD CUP and is highly recommended collaborative filtering package.

Large Scale Machine Learning and Other Animals

Wednesday, August 15, 2012

Steffen Rendle - libFM

No comments:

Post a Comment

Labels

GraphLab Users Google Group

pagerank

google analytics

syntax