Sunday, September 2, 2012

Harry Potter Effect on Recommendations

Just learned a new thing from my mega collaborator Justin Yan. The Harry Potter effect is
a term in psychology for a phenomenon that is very popular (everyone likes Harry Potter).

When computing recommendations (for example, for movies) we can compare movie similarity to recommend to the user movies similar to the one she saw. The problem is that when we have an item which is liked by everyone, it "covers" any other subset of the data, and thus it will have the greatest similarity to all other items. For example, suppose the movie Tarzan was watched 10 times and Harry Potter was watched 1,000,000 times. Every single time a viewer watched Tarzan, she also watched Harry Potter! So Harry Potter is the most similar movie to Tarzan. And to all the other movies as well...
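
To see the problem concretely, here is a minimal Python sketch that uses the raw overlap count (the number of users who watched both movies) as the similarity. The watch history, the user ids, the extra movie Bambi, and the function names are all made up for illustration; only Tarzan and Harry Potter come from the example above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy watch history: user -> set of movies watched.
# Everyone watched Harry Potter, as in the example above.
watches = {
    "u1": {"Harry Potter", "Tarzan"},
    "u2": {"Harry Potter", "Tarzan", "Bambi"},
    "u3": {"Harry Potter", "Bambi"},
    "u4": {"Harry Potter"},
    "u5": {"Harry Potter"},
}

def cooccurrence_counts(watches):
    """Count, for each ordered pair of movies, how many users watched both."""
    counts = defaultdict(int)
    for movies in watches.values():
        for a, b in combinations(sorted(movies), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def most_similar(movie, counts):
    """Return the movie with the largest raw overlap with `movie`."""
    candidates = {b: n for (a, b), n in counts.items() if a == movie}
    return max(candidates, key=candidates.get)

counts = cooccurrence_counts(watches)
print(most_similar("Tarzan", counts))  # Harry Potter
print(most_similar("Bambi", counts))   # Harry Potter
```

With raw overlap, Harry Potter comes out as the nearest neighbor of every other movie, simply because everybody watched it.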

To prevent this Harry Potter effect, we normalize by each item's total number of ratings in the data. Thus, if Tarzan had an overlap of 10 watches with Harry Potter, we divide that overlap by the 1,000,000 Harry Potter ratings, giving 10 / 1,000,000 = 0.00001, and so we diminish the "Harry Potter" effect.
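
The normalization can be sketched in a few lines of Python. This is one reading of the description above, not a definitive implementation: the overlap is divided by the candidate item's total number of ratings. The function name and the second candidate movie (with its counts) are hypothetical, added only for comparison.

```python
def normalized_similarity(overlap, candidate_total):
    """Overlap count divided by the candidate item's total number of ratings."""
    return overlap / candidate_total

# Tarzan vs. Harry Potter, using the numbers from the post.
harry_potter = normalized_similarity(overlap=10, candidate_total=1_000_000)

# Hypothetical niche movie: watched 20 times, 6 of them by Tarzan viewers.
niche_movie = normalized_similarity(overlap=6, candidate_total=20)

print(harry_potter)
print(niche_movie)
print(niche_movie > harry_potter)  # True
```

After normalization the niche movie outranks Harry Potter for Tarzan viewers, even though its raw overlap (6) is smaller than Harry Potter's (10).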

1 comment:

  1. Sounds like you need something like "Inverse document frequency" ?