Wednesday, August 15, 2012

80-20 rule - or collaborative filtering in practice

Many papers are written about new collaborative filtering methods and approaches. After reading tens of papers and talking to tens of companies, I have some insight I would like to share here. I call it the 80-20 collaborative filtering rule. My guess is that this rule is common knowledge for people who work on collaborative filtering, but I have not seen it written down or discussed anywhere, so I thought it would be useful to bring it to light.

I will start with an example, but I believe that the rule holds generally across many domains. I just got back from the KDD Cup workshop in Beijing. The first presentation was by Tianqi Chen from SJTU, whose team won first place in Track 1. In the summary of his team's results, note that the basic method, item-based collaborative filtering, got a score of 34.0%, while a combination of an additional 7 methods got a score of 42.7%. Now divide: 34.0 / 42.7 = 0.796. Namely, the basic method achieves 79.6% of the accuracy of the most sophisticated combination, and of course it is way simpler.
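To make the "basic method" concrete, here is a minimal sketch of item-based collaborative filtering using cosine similarity between item rating vectors. The toy ratings matrix and function names are my own illustration, not the winning team's code.

```python
# A minimal sketch of item-based collaborative filtering (hypothetical toy data).
import numpy as np

def item_similarities(R):
    """Cosine similarity between item columns of a user x item ratings matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unrated items
    X = R / norms                    # normalize each item column
    return X.T @ X                   # items x items similarity matrix

def predict(R, S, user, item):
    """Predict a rating as a similarity-weighted average of the user's ratings."""
    rated = R[user] > 0              # items this user has rated
    weights = S[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[user, rated] / weights.sum())

# Toy example: 4 users x 3 items, 0 means "not rated".
R = np.array([[5, 3, 0],
              [4, 0, 4],
              [1, 1, 5],
              [0, 2, 4]], dtype=float)
S = item_similarities(R)
print(predict(R, S, user=0, item=2))  # predicted score for user 0 on item 2
```

The prediction for user 0 on item 2 falls between that user's existing ratings, weighted toward the more similar item. This is the entire core of the baseline that, per the numbers above, already gets you most of the way.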

Now let's jump to the results of the second track. A team from NTU had the following result: the simplest model (Naive Bayes) got 0.776, while their best model, using all the fancy methods, scored 0.8069. Namely, the simple model achieves 96% of the accuracy of the best one!
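For readers unfamiliar with the simple baseline here, this is a minimal from-scratch Naive Bayes classifier over binary features. The toy click data and function names are my own illustration, not the team's actual model.

```python
# A minimal Bernoulli Naive Bayes sketch (hypothetical toy click data).
import math

def train_nb(X, y):
    """Estimate log-priors and smoothed P(feature=1 | class) per class."""
    n = len(y)
    model = {}
    for c in sorted(set(y)):
        Xc = [x for x, yi in zip(X, y) if yi == c]
        prior = math.log(len(Xc) / n)
        # Laplace-smoothed probability that each feature is 1 given class c
        probs = [(sum(x[j] for x in Xc) + 1) / (len(Xc) + 2)
                 for j in range(len(X[0]))]
        model[c] = (prior, probs)
    return model

def predict_nb(model, x):
    """Return the class with the highest posterior log-probability."""
    def score(c):
        prior, probs = model[c]
        return prior + sum(math.log(p if xi else 1 - p)
                           for xi, p in zip(x, probs))
    return max(model, key=score)

# Toy clicked (1) / not-clicked (0) data: each row is a binary feature vector.
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 0, 0, 0]
model = train_nb(X, y)
print(predict_nb(model, [1, 0, 1]))  # → 1
```

Training is a single counting pass and prediction is a handful of log-additions, which is exactly why a baseline like this is so cheap to get right compared with a blend of fancier models.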

Anyway, I could go on and on, checking other contest results like the Netflix Prize and KDD Cup 2011. The bottom line is that the simpler methods typically give 80% or more of the accuracy of the best ones.
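The two ratios quoted above can be sanity-checked in a couple of lines:

```python
# Sanity-check the "simple method vs. best method" ratios from both tracks.
track1 = 34.0 / 42.7      # item-based CF vs. 8-method combination
track2 = 0.776 / 0.8069   # Naive Bayes vs. best submission
print(round(track1, 3), round(track2, 3))  # → 0.796 0.962
```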
From talking to many companies working on collaborative filtering, my impression is that 95% of them are happy once they have the basic models working right.

Here is a quote from the Netflix technical blog:
If you followed the Prize competition, you might be wondering what happened with the final Grand Prize ensemble that won the $1M two years later. This is a truly impressive compilation and culmination of years of work, blending hundreds of predictive models to finally cross the finish line. We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.
In other words, some of the fancy methods that justify publishing a research-track paper, or winning a contest by an epsilon of accuracy, are simply not worth the effort of practical deployment.