CLiMF is a ranking method that optimises MRR (mean reciprocal rank), an information retrieval measure for top-K recommenders. It is a variant of latent factor CF whose objective function differs significantly from that of most methods: instead of trying to predict ratings, CLiMF aims to maximise the MRR of relevant items. For each user, the reciprocal rank is 1/rank of the first relevant item found when unseen items are sorted by score, i.e. it is 1.0 if the item with the highest score is relevant, 0.5 if the first item is not relevant but the second is, and so on; the MRR is the mean of this value over all users. By optimising MRR rather than RMSE or similar measures, CLiMF naturally promotes diversity as well as accuracy in the recommendations generated.

CLiMF uses stochastic gradient ascent to maximise a smoothed lower bound for the actual MRR. It assumes binary relevance, as in friendship or follow relationships, but the GraphChi implementation lets you specify a relevance threshold for ratings, so you can run the algorithm on standard CF datasets and have the ratings automatically interpreted as binary preferences.
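To make the MRR definition concrete, here is a minimal Python sketch of the measure itself. It is only an illustration and is not taken from the GraphChi code; the function names and the toy data are hypothetical, and giving a reciprocal rank of 0 to a user with no relevant item is an assumption.

```python
from typing import Dict, List, Set, Tuple


def reciprocal_rank(scores: Dict[str, float], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, ranking items by descending score."""
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    # Assumption: a user with no relevant item in the list contributes 0.
    return 0.0


def mean_reciprocal_rank(users: List[Tuple[Dict[str, float], Set[str]]]) -> float:
    """Average the per-user reciprocal rank over all users."""
    if not users:
        return 0.0
    return sum(reciprocal_rank(s, r) for s, r in users) / len(users)


# Toy data (hypothetical): each entry is (predicted scores, relevant items).
users = [
    ({"a": 0.9, "b": 0.4, "c": 0.1}, {"a"}),       # top item is relevant: 1.0
    ({"a": 0.9, "b": 0.4, "c": 0.1}, {"b", "c"}),  # second item is relevant: 0.5
]
print(mean_reciprocal_rank(users))  # 0.75
```

Note that this computes the exact, non-smooth MRR used for evaluation; CLiMF trains on a smoothed lower bound of it, which is not shown here.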
CLiMF-related command-line options:
--binary_relevance_thresh=xx Consider the item liked/relevant if rating is at least this value [default: 0]
--halt_on_mrr_decrease Halt if the training set objective (smoothed MRR) decreases [default: false]
--num_ratings=xx Consider this many top predicted items when computing actual MRR on validation set [default: 10000]
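The effect of the threshold and top-N options can be sketched in a few lines of Python. This illustrates the behaviour described above and is not the GraphChi implementation: the helper names and the sample ratings are hypothetical, and returning 0 when no relevant item appears among the top N is an assumption.

```python
from typing import Dict, Set


def relevant_items(ratings: Dict[str, float], thresh: float) -> Set[str]:
    """Treat an item as liked/relevant if its rating is at least thresh."""
    return {item for item, rating in ratings.items() if rating >= thresh}


def reciprocal_rank_top_n(scores: Dict[str, float], relevant: Set[str], top_n: int) -> float:
    """Reciprocal rank of the first relevant item among the top_n predicted items."""
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)[:top_n]
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    # Assumption: no relevant item within the top_n predictions counts as 0.
    return 0.0


# Hypothetical star ratings and predicted scores for one user.
ratings = {"m1": 5, "m2": 3, "m3": 4}
scores = {"m1": 0.2, "m2": 0.9, "m3": 0.5}

liked = relevant_items(ratings, thresh=4)               # m1 and m3 are relevant
print(reciprocal_rank_top_n(scores, liked, top_n=3))   # 0.5 (m3 is ranked second)
print(reciprocal_rank_top_n(scores, liked, top_n=1))   # 0.0 (only m2 is considered)
```

With a threshold of 4, as in the Netflix example, ratings of 4 and 5 stars count as relevant and everything else as not relevant.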
Here is an example of running CLiMF on Netflix data:
./toolkits/collaborative_filtering/climf --training=smallnetflix_mm --validation=smallnetflix_mme --binary_relevance_thresh=4 --sgd_gamma=1e-6 --max_iter=6 --quiet=1 --sgd_step_dec=0.9999 --sgd_lambda=1e-6
Validation MRR: 0.169322
Validation MRR: 0.171909
Validation MRR: 0.172372
Validation MRR: 0.172503
Validation MRR: 0.172544
Validation MRR: 0.172549
I am very excited about this development - and I hope many more users will follow with additional contributions to our growing code base! Thanks Mark!!!