Sunday, December 8, 2013

Microsoft AdPredictor (Click Through Rate Prediction) is now implemented in GraphLab!

A couple of years ago, we competed in KDD CUP 2012 and took 4th place. We used Microsoft's AdPredictor as one of three useful models in this competition, as described in our paper: Xingxing Wang, Shijie Lin, Dongying Kong, Liheng Xu, Qiang Yan, Siwei Lai, Liang Wu, Guibo Zhu, Heng Gao, Yang Wu, Danny Bickson, Yuanfeng Du, Neng Gong, Chengchun Shu, Shuang Wang, Fei Tan, Jun Zhao, Yuanchun Zhou, Kang Liu. Click-Through Prediction for Sponsored Search Advertising with Hybrid Models. In ACM KDD CUP Workshop 2012.

AdPredictor is described in the paper:
Graepel, Thore, et al. "Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.

I looked for an open source implementation of AdPredictor and could not find a single one. This is not surprising, considering that several companies use it in production for predicting ad CTR (click-through rate). So, as a fun weekend project, I decided to implement AdPredictor for GraphLab. The code is available here.

In a nutshell, AdPredictor fits a Bayesian linear model with a probit link function, keeping a Gaussian belief (a mean and a variance) over each feature weight.
The input to the algorithm is a set of observations of the form:
-1 3:1 4:1 6:1 9:1
1 4:1 5:1 18:1 19:1
where the first field is the label: -1 (did not click) or 1 (clicked). It is followed by index:value pairs of binary features.
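Such a line can be parsed in a few lines of Python (a sketch only; the actual GraphLab tool does its own parsing in C++):

```python
def parse_libsvm_line(line):
    """Parse one observation: a label followed by index:value feature pairs.

    Since the features are binary, we only need the indices of the
    features that are present (value 1).
    """
    parts = line.split()
    label = int(parts[0])
    features = [int(pair.split(":")[0]) for pair in parts[1:]]
    return label, features
```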
The output of the algorithm is a weight for each feature. When a new ad comes in, we simply sum the weights of its matching features. If the sum is negative, the prediction is -1; if it is positive, the prediction is 1.
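For intuition, here is a minimal Python sketch of AdPredictor's online update and prediction, following the message-passing equations in the Graepel et al. paper. The class and parameter names are my own and are not the GraphLab API:

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

class AdPredictorSketch:
    """Keeps a Gaussian belief (mean mu, variance sigma2) per feature weight."""

    def __init__(self, beta=1.0, prior_variance=1.0):
        self.beta = beta
        self.prior = prior_variance
        self.mu = {}       # feature id -> posterior mean
        self.sigma2 = {}   # feature id -> posterior variance

    def _get(self, f):
        return self.mu.get(f, 0.0), self.sigma2.get(f, self.prior)

    def predict(self, features):
        """Probability of a click: Phi(sum of means / sqrt of total variance)."""
        total_mean = sum(self._get(f)[0] for f in features)
        total_var = self.beta ** 2 + sum(self._get(f)[1] for f in features)
        return norm_cdf(total_mean / math.sqrt(total_var))

    def update(self, features, y):
        """Online Bayesian update for one observation with label y in {-1, +1}."""
        total_mean = sum(self._get(f)[0] for f in features)
        total_var = self.beta ** 2 + sum(self._get(f)[1] for f in features)
        s = math.sqrt(total_var)
        t = y * total_mean / s
        v = norm_pdf(t) / norm_cdf(t)   # truncated-Gaussian corrections
        w = v * (v + t)
        for f in features:
            mu, sigma2 = self._get(f)
            self.mu[f] = mu + y * (sigma2 / s) * v
            self.sigma2[f] = sigma2 * (1.0 - (sigma2 / total_var) * w)
```

Training on the two example lines above pushes the weights of features seen with clicks up and those seen without clicks down, so the predicted click probability separates the two patterns.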

You are welcome to download GraphLab and try it out!

AdPredictor takes its input files in LIBSVM format. You should prepare a subfolder containing the training file and a validation file (the validation file name needs to end with .validate).

You can run adpredictor using the command:
./adpredictor --matrix=folder/ --max_iter=10 --beta=1

As always let me know if you try it out!


  1. Hi,

    I was looking into this algorithm that you implemented, taking the AdPredictor algorithm into consideration. I have a similar implementation. What I understood from the algorithm is that each feature is given a weight and the features themselves are binary, so every feature in each of the rows is either present or not. I couldn't see the values of each feature coming into play anywhere in the algorithm. Have I misunderstood the logic, or is it supposed to be that way?


    1. You are correct. I assume the features are binary, namely either 1 or 0. If a feature is 0, there is no need to include it in the input, since zero values are ignored.

    2. Alright, I thought so. I have mostly binary features, but a few are non-binary, and hence I'm guessing I just need to change the code to calculate the mean and variance of the non-binary feature sets as well. Now I've actually come to a dilemma, as one of the features has 300 possible values. How do you suggest I go about it?

    3. The AdPredictor paper is also written with binary variables in mind. Any categorical variable should be translated to binary features. Take zip codes, for example: a larger zip code does not mean the event is more important than one with a smaller zip code, and vice versa. The trivial way to convert your variable with 300 values is to add 300 columns (one column per value). However, if this is too much, you can bin it into a small set of values and group items together. Alternatively, you can try binary encoding, which needs only log2(N) columns. It is hard to say how important your 300-value variable is, so my suggestion is to try a few approaches and see which works better.
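To illustrate the two encodings mentioned above, here is a small Python sketch (the function names are mine, for illustration only):

```python
import math

def one_hot(value, num_values):
    """One column per category: 300 possible values -> 300 binary features."""
    return [1 if i == value else 0 for i in range(num_values)]

def binary_encode(value, num_values):
    """log2(N) columns: 300 possible values -> 9 binary features (2**9 >= 300)."""
    width = math.ceil(math.log2(num_values))
    return [(value >> i) & 1 for i in range(width)]
```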