I met Troy Raeder at ACM KDD Cup in Beijing and asked him to tell me a little about m6d.
From a business perspective, we do targeted display advertising. Essentially, consumer brands contract with us to find strong targets for their advertising and to show display ads (banner or video) to these targets on the web and on mobile devices. Within display advertising, there are two common approaches: "retargeting", which means advertising to people who have already visited your brand's website, and "prospecting", which means finding people who have never been to your site but are good candidates for your brand. Retargeting is fairly popular. You've probably noticed it when you visit a commerce site and ads for that site then follow you elsewhere on the web. Prospecting, by contrast, can reach impressive scale (obviously there are far more non-visitors than visitors to your website), and that is the main focus of our business. More specifically, we use the retargeting population (site visitors) as a seed set from which to build a classification model that predicts brand affinity among potential prospects.
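To make the seed-set idea concrete, here is a toy sketch, not m6d's actual method. It assumes each browser's history is a set of URL strings and uses a hypothetical brand domain, `brand.example.com`. In place of whatever model m6d really trains, it uses a simple smoothed log-odds (naive-Bayes-style) scorer. The function names are hypothetical. Browsers who visited the brand site form the positive seed set, everyone else is treated as a negative, and the non-visitors are then ranked as prospects.

```python
import math
from collections import Counter

# Hypothetical brand domain that defines the seed set (an assumption for illustration).
BRAND_SITE = "brand.example.com"


def split_seed_and_prospects(histories, brand_site=BRAND_SITE):
    """Split browsers into seeds (visited the brand site) and prospects (never did)."""
    seeds, prospects = {}, {}
    for browser, urls in histories.items():
        if brand_site in urls:
            # Drop the brand URL itself so the model cannot simply learn the label.
            seeds[browser] = urls - {brand_site}
        else:
            prospects[browser] = urls
    return seeds, prospects


def train_url_log_odds(seeds, prospects, alpha=1.0):
    """Per-URL smoothed log ratio of P(url | seed) to P(url | non-seed)."""
    seed_counts = Counter(u for urls in seeds.values() for u in urls)
    other_counts = Counter(u for urls in prospects.values() for u in urls)
    n_seed, n_other = len(seeds), len(prospects)
    weights = {}
    for url in set(seed_counts) | set(other_counts):
        p_seed = (seed_counts[url] + alpha) / (n_seed + 2 * alpha)
        p_other = (other_counts[url] + alpha) / (n_other + 2 * alpha)
        weights[url] = math.log(p_seed / p_other)
    return weights


def rank_prospects(prospects, weights):
    """Score each non-visitor by summing its URL weights; highest score first."""
    scores = {b: sum(weights.get(u, 0.0) for u in urls) for b, urls in prospects.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy browsing histories (made up for illustration).
histories = {
    "b1": {"brand.example.com", "running-news.example", "shoes.example"},
    "b2": {"brand.example.com", "running-news.example"},
    "b3": {"running-news.example", "weather.example"},
    "b4": {"weather.example", "celebrity.example"},
    "b5": {"shoes.example"},
}
seeds, prospects = split_seed_and_prospects(histories)
weights = train_url_log_odds(seeds, prospects)
print(rank_prospects(prospects, weights))  # prints ['b5', 'b3', 'b4']
```

Treating every non-visitor as a negative is a simplification. Most of them are really unlabeled, which is exactly why the highest-scoring ones are worth targeting as prospects.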
From a machine learning perspective, we solve a huge, sparse classification problem, where the features are URLs that people have visited and the class (outcome) is some sort of brand action. Usually we use visits to the brand site as a proxy for brand affinity, because they are more common than purchase conversions but less random than clicks. The coolest thing about our system, at least to me, is that it is automatic and dynamic on several levels. We re-score browsers regularly as we get new data on them, so the set of good prospects for a particular brand stays accurate pretty much in real time. Everything we do (rescoring browsers, retraining models, and adding new brands) happens automatically.
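A sparse URL classifier with cheap rescoring can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the papers. It assumes plain logistic regression trained by stochastic gradient descent on binary "visited this URL" features, with weights stored in a dict so that only URLs actually seen cost memory. The classes `SparseURLModel` and `BrowserScorer` and their parameters are hypothetical. The scorer keeps a running logit per browser, so a new URL visit updates that browser's score in constant time.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SparseURLModel:
    """Logistic regression over binary 'visited URL' features, stored sparsely.

    Only URLs seen in training get a weight; unseen URLs contribute 0.
    """

    def __init__(self, lr=0.1, l2=1e-4):
        self.lr = lr
        self.l2 = l2
        self.bias = 0.0
        self.weights = {}  # url -> weight

    def logit(self, urls):
        return self.bias + sum(self.weights.get(u, 0.0) for u in urls)

    def predict(self, urls):
        return sigmoid(self.logit(urls))

    def sgd_step(self, urls, label):
        # The log-loss gradient w.r.t. the logit is (p - y); only features present
        # in this example are updated, and L2 shrinkage is applied lazily to them.
        err = self.predict(urls) - label
        self.bias -= self.lr * err
        for u in urls:
            w = self.weights.get(u, 0.0)
            self.weights[u] = w - self.lr * (err + self.l2 * w)

    def fit(self, examples, epochs=20):
        for _ in range(epochs):
            for urls, label in examples:
                self.sgd_step(urls, label)
        return self


class BrowserScorer:
    """Keeps a running logit per browser so each new URL visit rescores in O(1).

    If the model is retrained, the stored logits must be recomputed from scratch.
    """

    def __init__(self, model):
        self.model = model
        self.seen = {}    # browser -> set of URLs already counted
        self.logits = {}  # browser -> current logit

    def observe(self, browser, url):
        urls = self.seen.setdefault(browser, set())
        if browser not in self.logits:
            self.logits[browser] = self.model.bias
        if url not in urls:  # binary features: repeat visits do not change the score
            urls.add(url)
            self.logits[browser] += self.model.weights.get(url, 0.0)
        return sigmoid(self.logits[browser])


# Toy training data: label 1 = browser visited the brand site (made up for illustration).
examples = [
    ({"running-news.example", "shoes.example"}, 1),
    ({"running-news.example"}, 1),
    ({"weather.example", "celebrity.example"}, 0),
    ({"weather.example"}, 0),
]
model = SparseURLModel().fit(examples)
scorer = BrowserScorer(model)
p1 = scorer.observe("b9", "weather.example")
p2 = scorer.observe("b9", "running-news.example")
print(p2 > p1)  # prints True: a URL associated with the seed set raises the score
```

Keeping the per-browser logit is what makes frequent re-scoring cheap: a new event touches one weight lookup instead of the browser's whole history.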
We have a number of papers that describe our algorithms and systems in greater detail, including two that will be presented at KDD in China this year. If you read these, you'll know about as much as you'd care to know about our system.
This is the original paper describing our methods:
Our measurement methodology:
Our large-scale classification system:
Design Principles of Massive, Robust Prediction Systems
And our bid modeling strategy:
Bid Optimizing and Inventory Scoring in Targeted Online Advertising
While many companies are very hush-hush about their applied ML methods, I find it very useful that m6d actually publishes its ML approaches. It helps the rest of us keep up to date with what is happening in the industry.