Large Scale Machine Learning and Other Animals: The 10 recommender system metrics you should know about

Wednesday, October 3, 2012

The 10 recommender system metrics you should know about

I got the following interesting email from Denis Parra, a PhD student @ University of Pittsburgh:

Danny,
Following the post on evaluation metrics in your blog, we would be glad to help you test new evaluation metrics for GraphChi. Not long ago (this year, actually), Sherry and I wrote a book chapter on recommender systems focusing on sources of knowledge and evaluation metrics. In section 7.4 we explain some of these evaluation metrics.

For instance, we describe a metric that has become popular for evaluating recommendations based on implicit feedback, called MPR (Mean Percentile Ranking), which some authors call Percentile Ranking. This is the method used by Hu et al. in "Collaborative filtering for implicit feedback datasets" (2008) and by Quercia et al. in "Recommending social events from mobile phone location data" (2010).

Cheers,
Denis

PS: In case you want to cite the book chapter, you can use
@incollection{Denis2012,
chapter = {7},
title = {Recommender Systems: Sources of Knowledge and Evaluation Metrics},
editor = { J.D. Vel{\' a}squez et al. (Eds.)},
author = {Parra, D. and Sahebi, S. },
booktitle = {Advanced Techniques in Web Intelligence-2: Web User Browsing Behaviour and Preference Analysis},
publisher = {Springer-Verlag},
address = {Berlin Heidelberg},
pages = {149--175},
year = {2013}
}

I think this book chapter is going to become a highly useful overview for anyone who is working on recommender systems. As a "teaser" until the book comes out, I asked Denis to briefly summarize the different metrics by giving a reference for each one. The book itself will include a much more detailed explanation of each metric and its usage.

Denis was kind enough to provide me with the following list to share on my blog:

Though many of these metrics are described in the seminal paper "Evaluating collaborative filtering recommender systems" by Herlocker et al., this is a subset of an updated list of metrics used to evaluate recommender systems in recent years:

For rating

MAE (Mean Absolute Error)
Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52 (1998)
MSE (Mean Squared Error)
Shardanand, U., Maes, P.: Social information filtering: algorithms for automating word of mouth. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1995, pp. 210–217. ACM Press/Addison-Wesley Publishing Co., New York (1995)
RMSE (Root Mean Squared Error)
Bennett, J., Lanning, S.: The Netflix Prize. In: KDD Cup and Workshop in Conjunction with KDD (2007)
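
To make the three rating metrics concrete, here is a minimal Python sketch. The function names and the sample ratings are my own illustrative assumptions, not taken from the book chapter or from GraphChi; each function compares held-out ratings with predictions for the same user-item pairs.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical held-out ratings and model predictions on a 1-5 scale.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]

print(mae(actual, predicted))   # 0.625
print(mse(actual, predicted))   # 0.5625
print(rmse(actual, predicted))  # 0.75
```

Because the errors are squared before averaging, MSE and RMSE penalize a few large mistakes more heavily than MAE does.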

Evaluating lists of recommendations (based on relevance levels)

Precision@n
Le, Q. V. & Smola, A. J. (2007), 'Direct Optimization of Ranking Measures', CoRR abs/0704.3359
Recall
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems (RecSys '10)
MAP: Mean Average Precision
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
nDCG: normalized Discounted Cumulative Gain
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 422–446 (2002)
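
The list metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and sample data are hypothetical, relevance for precision, recall and MAP is treated as binary, and nDCG uses the common log2(position + 1) discount, which differs slightly from the original formulation by Järvelin and Kekäläinen.

```python
import math

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n

def recall_at_n(ranked, relevant, n):
    """Fraction of all relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevants):
    """MAP: average precision averaged over users (two parallel lists)."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)

def ndcg_at_n(ranked, gains, n):
    """DCG of the top n divided by the DCG of an ideal ordering."""
    def dcg(values):
        return sum(g / math.log2(pos + 1) for pos, g in enumerate(values, start=1))
    actual = dcg([gains.get(item, 0.0) for item in ranked[:n]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:n])
    return actual / ideal if ideal > 0 else 0.0

# Hypothetical recommendation list and ground truth for one user.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
gains = {"a": 3.0, "c": 1.0, "f": 2.0}  # graded relevance, used by nDCG only

p3 = precision_at_n(ranked, relevant, 3)    # 2 of the top 3 are relevant
r3 = recall_at_n(ranked, relevant, 3)       # 2 of the 3 relevant items found
ap = average_precision(ranked, relevant)    # (1/1 + 2/3) / 3
ndcg5 = ndcg_at_n(ranked, gains, 5)
```

Precision and recall ignore the order inside the top n, while average precision and nDCG reward placing relevant items nearer the top.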

Diversity

Intra-list Similarity
Ziegler, C.-N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, WWW 2005, pp. 22–32. ACM, New York (2005)
Lathia's Diversity
Lathia, N., Hailes, S., Capra, L., Amatriain, X.: Temporal diversity in recommender systems. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, pp. 210–217. ACM, New York (2010)
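
As a rough sketch of the two diversity measures, assuming items are described by hypothetical tag sets compared with Jaccard similarity (my choice for illustration; any item-item similarity could be plugged in). Note that the intra-list similarity here is the averaged pairwise variant, whereas Ziegler et al. define a summed form, and the temporal diversity follows the idea in Lathia et al. of counting the share of the current top-n list that was absent from the previous one.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of item features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def intra_list_similarity(items, features, sim=jaccard):
    """Average pairwise similarity inside one list (lower means more diverse)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(sim(features[x], features[y]) for x, y in pairs) / len(pairs)

def temporal_diversity(previous, current, n):
    """Share of the current top-n items that were not in the previous top n."""
    prev_top = set(previous[:n])
    return sum(1 for item in current[:n] if item not in prev_top) / n

# Hypothetical item features and two consecutive recommendation lists.
features = {
    "m1": {"comedy", "romance"},
    "m2": {"comedy"},
    "m3": {"horror"},
}
ils = intra_list_similarity(["m1", "m2", "m3"], features)
div = temporal_diversity(["a", "b", "c", "d"], ["a", "e", "c", "f"], 4)

print(ils)  # only the m1-m2 pair overlaps (0.5), averaged over 3 pairs
print(div)  # 0.5, since "e" and "f" are new
```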

Implicit Feedback

MPR (Mean Percentile Ranking)
Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pp. 263–272. IEEE Computer Society, Washington, DC (2008)
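
Here is a small sketch of how MPR could be computed, following the description in Hu et al.: each held-out item gets a percentile position in the user's ranked list (0 for the top, 1 for the bottom), and the positions are averaged, weighted by the observed implicit feedback. The function names, data layout and sample values are assumptions made for illustration.

```python
def percentile_ranks(ranked):
    """Map each item to its percentile position: 0.0 at the top, 1.0 at the bottom."""
    last = max(len(ranked) - 1, 1)  # avoid dividing by zero for one-item lists
    return {item: pos / last for pos, item in enumerate(ranked)}

def mean_percentile_rank(rankings, observed):
    """Feedback-weighted average percentile rank of the held-out items.

    rankings: user -> list of all candidate items, best first
    observed: user -> {item: implicit feedback weight in the test period}
    """
    weighted, total = 0.0, 0.0
    for user, ranked in rankings.items():
        ranks = percentile_ranks(ranked)
        for item, weight in observed.get(user, {}).items():
            if item in ranks:
                weighted += weight * ranks[item]
                total += weight
    return weighted / total if total else 0.0

# Hypothetical rankings and test-period feedback for two users.
rankings = {
    "u1": ["a", "b", "c", "d", "e"],
    "u2": ["c", "a", "b", "e", "d"],
}
observed = {
    "u1": {"a": 2.0, "c": 1.0},
    "u2": {"d": 1.0},
}
mpr = mean_percentile_rank(rankings, observed)
print(mpr)  # (2*0.0 + 1*0.5 + 1*1.0) / 4 = 0.375
```

Lower values are better: a model that ranks items at random would be expected to score around 0.5.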

User-Centric Evaluation Frameworks
Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A pragmatic procedure to support the user-centric evaluation of recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys 2011, pp. 321–324. ACM, New York (2011)

Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys 2011, pp. 157–164. ACM, New York (2011)
