Comments on Large Scale Machine Learning and Other Animals: KDD CUP 2012 Track 1 using GraphLab..

2012-05-18T15:01:56.704-07:00

Thank you both very much for the response!

2012-05-18T05:58:20.844-07:00

Also please ask your questions here: https://groups.google.com/group/graphlab-kdd
Thanks!!

2012-05-18T05:37:37.675-07:00

Hi,
Can you be more specific? Which of the programs mentioned above gives you the error?

2012-05-18T04:28:05.843-07:00

Hi Danny,

I tried to follow this blog post, but I keep getting an out-of-memory error. I am running on a machine with 12 GB of RAM.
How much memory is needed to run this script?

Thank you

Philips

2012-05-17T22:52:20.803-07:00

Here is the answer to your question that I got from Michael Jahrer of the Commendo team:

Wow, 0.38234 is really good for one model.
I assume that this is more than plain SVD.
Yes, the user overlap between train and test is very poor, so you have to use other sources of user information to address the issue.

My best factor model (SVD) has a leaderboard score of 0.38591, but it integrates additional parts: asymmetric info, user sns, user action, user keywords, user tags, user genre and user birthYear info. It has 20 features; more seems to hurt. I trained it to minimize the rank (as in our 2011 KDD papers); ranking improves the MAP by approximately 0.005. With all these parts the train/test user coverage is much better, hence the better score. Plain SVD gives me about 0.346 on the leaderboard. An ASVD gives me 0.352, and ASVD + the user sns part gives me 0.371.

Thanks Michael!
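
For readers who want to reproduce the MAP numbers quoted above locally, here is a minimal sketch of mean average precision at a cutoff. The cutoff `k=3`, the normalization by `min(len(relevant), k)`, and the function names are assumptions for illustration; the thread does not state how the leaderboard computes MAP, so check the official evaluation description before comparing scores.

```python
def average_precision_at_k(recommended, relevant, k=3):
    """AP@k for one user.

    recommended: ranked list of item ids (best first).
    relevant: set of item ids the user actually accepted.
    The cutoff k=3 and the normalization are assumptions, not taken from the thread.
    """
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(recommended[:k], start=1):
        # Count each relevant item once, even if it is recommended twice.
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(item)
    return score / min(len(relevant), k)


def mean_average_precision_at_k(recommendations, ground_truth, k=3):
    """Mean of AP@k over all users in ground_truth.

    recommendations: dict user -> ranked list of item ids.
    ground_truth: dict user -> set of accepted item ids.
    Users with no recommendations contribute an AP of 0.
    """
    users = list(ground_truth)
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(u, []), ground_truth[u], k)
        for u in users
    )
    return total / len(users)
```

Averaging over every user in the ground truth, including those without recommendations, is what makes poor train/test user coverage so costly under this kind of metric.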

2012-05-17T06:42:52.844-07:00

Hi. I'm using matrix factorization too (I have not tried GraphLab yet). My current MAP is 0.38234.
I have a question, and it would be very kind of you to give your opinion about it.
As ehtsham mentioned, users in the test file who have no data in the training file (81% of users in the test file!) are a big problem for MF models. But this is not the only problem: the most prevalent items in the test file are extremely rare in the training period.
Do you think there is a way to alleviate this problem?
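
To check the coverage numbers described above on your own split, something like the following sketch could be used. It assumes the user and item ids have already been read into plain Python lists; the function names are hypothetical and file parsing is left out.

```python
from collections import Counter


def cold_start_fraction(train_users, test_users):
    """Fraction of distinct test users that never appear in training."""
    train = set(train_users)
    test = set(test_users)
    if not test:
        return 0.0
    return len(test - train) / len(test)


def train_counts_for_test_items(train_items, test_items):
    """For each distinct test item, how many times it occurs in training.

    Items with a count of 0 (or a very small count) are the rare items
    that a plain factor model has little signal for.
    """
    counts = Counter(train_items)
    return {item: counts.get(item, 0) for item in set(test_items)}
```

For example, `cold_start_fraction([1, 2, 3], [3, 4, 5, 6, 7])` gives 0.8, since four of the five test users are unseen in training.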

2012-05-15T00:36:25.148-07:00

Hi!
I answered a week ago, but somehow the answer did not appear. :-(
Anyway, I suggest using bias-SGD: when a user is missing from the training data, the item's bias is used for the prediction. In ALS we don't have a good answer for how to predict cold-start users.
Maybe try predicting the average for that item.
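
A minimal sketch of that fallback, assuming a standard biased factor model (global mean plus user bias plus item bias plus the dot product of the factors). The dictionary-based layout and all names here are illustrative assumptions, not GraphLab's API.

```python
def predict(user, item, mu, user_bias, item_bias, user_factors, item_factors):
    """Biased matrix factorization prediction with a cold-start fallback.

    mu: global mean rating.
    user_bias / item_bias: dict id -> float.
    user_factors / item_factors: dict id -> list of floats (same length).
    """
    b_i = item_bias.get(item, 0.0)
    if user not in user_factors:
        # Cold-start user: fall back to the global mean plus the item's bias.
        return mu + b_i
    b_u = user_bias.get(user, 0.0)
    if item not in item_factors:
        # Unseen item: only the user's bias is available.
        return mu + b_u
    dot = sum(p * q for p, q in zip(user_factors[user], item_factors[item]))
    return mu + b_u + b_i + dot
```

With this layout, an unseen user simply gets the same score for an item as every other unseen user, which ranks items by their learned bias.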

2012-05-07T22:40:59.082-07:00

I meant specifically the alternating least squares based approach, which, as I understand it, requires for prediction that you take the dot product of the corresponding user and item vectors learned in the training phase.
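
To make the issue concrete, here is a small sketch of scoring test pairs with learned factor vectors. It assumes the factors are stored as dictionaries keyed by id (a hypothetical layout, not GraphLab's output format) and collects the pairs for which no dot product can be formed.

```python
def score_pairs(pairs, user_factors, item_factors):
    """Score (user, item) pairs as the dot product of their factor vectors.

    Returns (scored, missing): scored maps each pair to its score, and
    missing lists the pairs where the user or the item has no learned
    vector, so a plain factor model has nothing to predict with.
    """
    scored = {}
    missing = []
    for user, item in pairs:
        p = user_factors.get(user)
        q = item_factors.get(item)
        if p is None or q is None:
            missing.append((user, item))
            continue
        scored[(user, item)] = sum(a * b for a, b in zip(p, q))
    return scored, missing
```

Every pair that ends up in `missing` needs some other rule, which is exactly the gap being asked about.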

2012-05-07T22:27:37.780-07:00

Hi Danny, nice article. I have a question: how were you able to generate a prediction for each user-item pair in the test file? A lot of the users/items in the test file do not appear in the training file.