Monday, April 23, 2012

KDD CUP Update

As the contest heats up, I am getting more and more interesting feedback. Thanks to JustinYan, we are now in 5th place in track 2.

I have significantly extended the parser library in GraphLab version 2 with tools that should be useful for anyone who wants to compete. Here are the tools currently available:

kdd_train_data_parser - cleans opposite signals out of the training data and splits it into meaningful training and validation sets.
kdd_test_splitter - splits the test data into two test files, as instructed by the contest.
kddcup_output_builder - takes two vector prediction files (output of GraphLab or any other software, such as Vowpal Wabbit), finds the highest predictions, and merges them back into a zip submission file containing the solution.
kdd_usr_itm_feature_parsers - parses user and item features and translates them into GraphLab or VW input formats.
kdd_linear_model_builder - builds a linear feature model for the training/validation data that can be used for classification.
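To make the cleaning and submission steps a bit more concrete, here is a minimal Python sketch of the same ideas. It is not the GraphLab code, and everything in it is an assumption for illustration: the record layout (user, item, label, timestamp) with labels +1/-1, the policy of dropping every (user, item) pair that has both labels, the timestamp cutoff used for the validation split, k=3 predictions per user, and the "user,item1 item2 item3" line format inside the zip are all hypothetical choices, not the contest's or the tools' actual formats.

```python
from collections import defaultdict
import heapq
import io
import zipfile


def remove_opposite_signals(records):
    """Drop every (user, item) pair that carries both a +1 and a -1 label.

    records: iterable of (user, item, label, timestamp) tuples (hypothetical layout).
    Dropping both rows is an assumed policy; the real parser may resolve conflicts differently.
    """
    records = list(records)
    labels = defaultdict(set)
    for user, item, label, _ in records:
        labels[(user, item)].add(label)
    # Keep only rows whose (user, item) pair saw a single, consistent label.
    return [r for r in records if len(labels[(r[0], r[1])]) == 1]


def split_by_time(records, cutoff):
    """Rows before `cutoff` go to training, the rest to validation (assumed split rule)."""
    train = [r for r in records if r[3] < cutoff]
    valid = [r for r in records if r[3] >= cutoff]
    return train, valid


def top_k_per_user(predictions, k=3):
    """Keep the k highest-scoring items per user, best first.

    predictions: iterable of (user, item, score) tuples.
    """
    by_user = defaultdict(list)
    for user, item, score in predictions:
        by_user[user].append((score, item))
    return {
        user: [item for _, item in heapq.nlargest(k, scored)]
        for user, scored in by_user.items()
    }


def write_submission_zip(top, csv_name="submission.csv"):
    """Return zip bytes with one 'user,item1 item2 ...' line per user (hypothetical format)."""
    lines = [f"{user},{' '.join(map(str, items))}" for user, items in sorted(top.items())]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(csv_name, "\n".join(lines) + "\n")
    return buf.getvalue()


if __name__ == "__main__":
    raw = [(1, 10, 1, 5), (1, 10, -1, 6), (1, 11, 1, 7), (2, 12, -1, 3)]
    clean = remove_opposite_signals(raw)       # both rows for (1, 10) are dropped
    train, valid = split_by_time(clean, cutoff=5)
    preds = [(1, 10, 0.9), (1, 11, 0.2), (1, 12, 0.5), (1, 13, 0.7), (2, 10, 0.1)]
    top = top_k_per_user(preds)                # {1: [10, 13, 12], 2: [10]}
    print(train, valid, top)
    print(len(write_submission_zip(top)), "bytes of zip data")
```

Selecting the top k with `heapq.nlargest` avoids fully sorting each user's candidate list, which matters when every user has many scored items.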

Since we don't have much time to perfect everything before the code release, I would love to get any help/feedback/suggestions from you!!

Some more detailed instructions:

GraphLab v2 setup
1) Install GraphLab v2 using the Mercurial option, as instructed here:
2) After checking out, run:
hg pull
hg update v2
cd release

Example runs:
More details can be found here.

Let me know how it went!
