Large Scale Machine Learning and Other Animals: CLiMF Algorithm in GraphChi

Wednesday, April 17, 2013

CLiMF Algorithm in GraphChi

I have some good news to report: last week we got a great contribution from Mark Levy (last.fm) for the GraphChi collaborative filtering toolkit. Mark has implemented the CLiMF algorithm, described in the paper: "CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering", Yue Shi, Martha Larson, Alexandros Karatzoglou, Nuria Oliver, Linas Baltrunas, Alan Hanjalic, Sixth ACM Conference on Recommender Systems (RecSys '12).

CLiMF is a ranking method that optimizes MRR (mean reciprocal rank), an information retrieval measure for top-K recommenders. It is a variant of latent factor CF that optimizes a significantly different objective function from most methods: instead of trying to predict ratings, CLiMF aims to maximize the MRR of relevant items. The reciprocal rank for a user is the reciprocal of the rank of the first relevant item found when unseen items are sorted by score, i.e. it is 1.0 if the item with the highest score is relevant, 0.5 if the first item is not relevant but the second is, and so on; MRR is the mean of this value over users. By optimizing MRR rather than RMSE or similar measures, CLiMF naturally promotes diversity as well as accuracy in the recommendations generated.

CLiMF uses stochastic gradient ascent to maximize a smoothed lower bound for the actual MRR. It assumes binary relevance, as in friendship or follow relationships, but the GraphChi implementation lets you specify a relevance threshold for ratings, so you can run the algorithm on standard CF datasets and have the ratings automatically interpreted as binary preferences.
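To make the metric concrete, here is a minimal Python sketch of reciprocal rank and MRR as described above. It is only an illustration, not the GraphChi code: the function names and the dictionary-based inputs are assumptions made for this example.

```python
def reciprocal_rank(scores, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is found.

    scores:   dict mapping item -> predicted score (hypothetical input format)
    relevant: set of items considered relevant for this user
    """
    # Sort items by descending predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(per_user_scores, per_user_relevant):
    """Average the reciprocal rank over all users that have scores."""
    users = list(per_user_scores)
    if not users:
        return 0.0
    total = sum(
        reciprocal_rank(per_user_scores[u], per_user_relevant.get(u, set()))
        for u in users
    )
    return total / len(users)
```

For example, a user whose top-scored item is relevant contributes 1.0, and a user whose first relevant item is in second place contributes 0.5, so the MRR over those two users is 0.75.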

CLiMF-related command-line options:
--binary_relevance_thresh=xx Consider the item liked/relevant if rating is at least this value [default: 0]
--halt_on_mrr_decrease Halt if the training set objective (smoothed MRR) decreases [default: false]
--num_ratings Consider this many top predicted items when computing actual MRR on validation set [default: 10000]
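As a rough illustration of what the relevance threshold means, the following Python sketch turns explicit ratings into binary preferences, treating a rating as relevant when it is at least the threshold. The function name and the (user, item, rating) triple format are assumptions for this example and are not taken from the GraphChi source.

```python
def binarize(ratings, thresh):
    """Map each user to the set of items rated at least `thresh`.

    ratings: iterable of (user, item, rating) triples (hypothetical format)
    thresh:  plays the role of --binary_relevance_thresh
    """
    relevant = {}
    for user, item, rating in ratings:
        if rating >= thresh:
            relevant.setdefault(user, set()).add(item)
    return relevant
```

With a threshold of 4 on a 1 to 5 rating scale, only items rated 4 or 5 count as liked; users with no rating at or above the threshold simply do not appear in the result.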

Here is an example of running CLiMF on Netflix data:

./toolkits/collaborative_filtering/climf --training=smallnetflix_mm --validation=smallnetflix_mme --binary_relevance_thresh=4 --sgd_gamma=1e-6 --max_iter=6 --quiet=1 --sgd_step_dec=0.9999 --sgd_lambda=1e-6

Training objective:-9.00068e+07
Validation MRR: 0.169322
Training objective:-9.00065e+07
Validation MRR: 0.171909
Training objective:-9.00062e+07
Validation MRR: 0.172372
Training objective:-9.0006e+07
Validation MRR: 0.172503
Training objective:-9.00057e+07
Validation MRR: 0.172544
Training objective:-9.00054e+07
Validation MRR: 0.172549
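The training objective above is negative and rises slowly towards zero, which is what one would expect from a sum of log-probabilities being maximized. As a hedged sketch, assuming the logged value corresponds to the smoothed lower bound given in the CLiMF paper, the contribution of a single user could be computed as below. Regularization is omitted, there are no numerical safeguards, and all names are hypothetical; this is not the GraphChi implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def user_objective(user_vec, item_vecs, relevant_items):
    """Smoothed lower bound on reciprocal rank for one user (no regularizer).

    user_vec:       latent factor vector of the user (list of floats)
    item_vecs:      dict mapping item -> latent factor vector
    relevant_items: items with binary relevance 1 for this user
    """
    # Predicted score f_j = <user_vec, item_vec_j> for each relevant item.
    f = {
        j: sum(a * b for a, b in zip(user_vec, item_vecs[j]))
        for j in relevant_items
    }
    total = 0.0
    for j in relevant_items:
        # Reward a high score for relevant item j ...
        total += math.log(sigmoid(f[j]))
        # ... and penalize other relevant items k scoring above j,
        # so that the model focuses on pushing a few relevant items to the top.
        for k in relevant_items:
            total += math.log(1.0 - sigmoid(f[k] - f[j]))
    return total
```

Every term is the logarithm of a value strictly between 0 and 1, so the sum is always negative, and summing over many users and relevant items gives large negative totals of the kind printed in the log.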

I am very excited about this development - and I hope many more users will follow with additional contributions to our growing code base! Thanks Mark!!!
