Wednesday, October 3, 2012

The 10 recommender system metrics you should know about

I got the following interesting email from Denis Parra, a PhD student @ University of Pittsburgh:

Following the post on evaluation metrics in your blog, we would be glad to help you test new evaluation metrics for GraphChi. Not long ago (this year, actually), Sherry and I wrote a book chapter on recommender systems focusing on sources of knowledge and evaluation metrics. In section 7.4 we explain some of these evaluation metrics.

For instance, we describe a metric that has become popular for evaluating recommendations based on implicit feedback, called MPR (Mean Percentile Ranking), which some authors call simply Percentile Ranking. This is the method used by Hu et al. in "Collaborative filtering for implicit feedback datasets" (2008) and by Quercia et al. in "Recommending social events from mobile phone location data" (2010).
PS: In case you want to cite the book chapter, you can use:

@incollection{ParraSahebi2013,
  chapter   = {7},
  title     = {Recommender Systems: Sources of Knowledge and Evaluation Metrics},
  editor    = {J.D. Vel{\'a}squez et al.},
  author    = {Parra, D. and Sahebi, S.},
  booktitle = {Advanced Techniques in Web Intelligence-2: Web User Browsing Behaviour and Preference Analysis},
  publisher = {Springer-Verlag},
  address   = {Berlin Heidelberg},
  pages     = {149--175},
  year      = {2013}
}

I think this book chapter is going to become a highly useful overview for anyone working on recommender systems. As a "teaser" until the book comes out, I asked Denis to briefly summarize the different metrics, giving a reference for each one. The book itself will include a much more detailed explanation of each metric and its usage.

Denis was kind enough to provide the following list to share on my blog:

Though many of these metrics are described in the seminal paper "Evaluating collaborative filtering recommender systems" by Herlocker et al., this is a subset of an updated list of metrics used to evaluate recommender systems in recent years:

For rating prediction

MAE (Mean Absolute Error)

Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52 (1998)

MSE (Mean Squared Error)

Shardanand, U., Maes, P.: Social information filtering: algorithms for automating word of mouth. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1995, pp. 210–217. ACM Press/Addison-Wesley Publishing Co., New York (1995)

RMSE (Root mean Squared Error)

Bennett, J., Lanning, S., Netflix, N.: The netflix prize. In: KDD Cup and Workshop in Conjunction with KDD (2007)
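As a quick illustration, the three rating-prediction metrics above differ only in how the prediction error is aggregated. A minimal sketch (the function and variable names are mine, not from the papers):

```python
import math

def rating_errors(predicted, actual):
    """Compute MAE, MSE and RMSE over paired lists of predicted and true ratings."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)   # mean absolute error
    mse = sum(d * d for d in diffs) / len(diffs)    # mean squared error
    return mae, mse, math.sqrt(mse)                 # RMSE is just sqrt(MSE)

# Example: predictions vs. held-out ratings
mae, mse, rmse = rating_errors([4.0, 3.5, 5.0], [3.0, 4.0, 5.0])
```

Note that MSE and RMSE penalize large errors more heavily than MAE, which is one reason the Netflix Prize settled on RMSE.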

Evaluating lists of recommendations (based on relevance levels)


Le, Q.V., Smola, A.J.: Direct Optimization of Ranking Measures. CoRR abs/0704.3359 (2007)


Cremonesi, P., Koren, Y., Turrin, R.: Performance of recommender algorithms on top-n recommendation tasks. In: Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys 2010. ACM, New York (2010)

MAP: Mean Average Precision

Manning, C.D., Raghavan, P., Schtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
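MAP averages, over all users, the precision at each rank where a relevant item appears. A minimal sketch of the standard definition (helper names are mine):

```python
def average_precision(ranked, relevant):
    """AP for one user: mean of precision@k over ranks k holding a relevant item."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k  # precision at this rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    """MAP: the mean of AP across all users' recommendation lists."""
    aps = [average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)
```

Because precision is accumulated at every relevant position, MAP rewards placing relevant items near the top of the list, not merely including them.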

nDCG: normalized Discounted Cumulative Gain

Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 422–446 (2002)
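Unlike MAP, nDCG handles graded relevance: each position's gain is discounted logarithmically by rank, then normalized against the ideal ordering. A minimal sketch using the log2 discount of Järvelin and Kekäläinen (function names are mine):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """Normalize by the DCG of the same gains sorted into the ideal order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# A list already in ideal order scores 1.0; any misordering scores less.
```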


Intra-list Similarity

Ziegler, C.-N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, WWW 2005, pp. 22–32. ACM, New York (2005)
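Ziegler et al.'s intra-list similarity aggregates the pairwise similarities of all items within one recommendation list; a lower value means a more diverse list. A minimal sketch (I average over pairs here, rather than summing as in the paper, so lists of different lengths are comparable; the similarity function is supplied by the caller):

```python
from itertools import combinations

def intra_list_similarity(items, sim):
    """Average pairwise similarity within one list; lower means more diverse.
    `sim` is any symmetric similarity function, e.g. cosine over item features."""
    pairs = list(combinations(items, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)
```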

Lathia's Diversity

Lathia, N., Hailes, S., Capra, L., Amatriain, X.: Temporal diversity in recommender systems. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, pp. 210–217. ACM, New York (2010)

Implicit Feedback

MPR (Mean Percentile Ranking)

Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pp. 263–272. IEEE Computer Society, Washington, DC (2008)
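MPR, as defined by Hu et al., is the average percentile position of each held-out item in the user's recommendation list, weighted by the observed feedback strength r_ui: 0% means every consumed item was ranked at the top, and 50% is what random ordering would achieve. A minimal sketch (data layout and names are my own illustration, not the paper's notation):

```python
def mean_percentile_ranking(observations, ranked_lists):
    """MPR over (user, item, r_ui) triples, where ranked_lists[user] is that
    user's full recommendation list (assumed length >= 2), ordered best-first."""
    num, den = 0.0, 0.0
    for user, item, r_ui in observations:
        ranked = ranked_lists[user]
        # Percentile rank of the item: 0.0 at the top of the list, 1.0 at the bottom.
        pct = ranked.index(item) / (len(ranked) - 1)
        num += r_ui * pct
        den += r_ui
    return num / den

# Lower MPR is better; values near 0.5 indicate essentially random rankings.
```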

User-Centric Evaluation Frameworks

Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A pragmatic procedure to support the user-centric evaluation of recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys 2011, pp. 321–324. ACM, New York (2011)

Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys 2011, pp. 157–164. ACM, New York (2011)
