## Monday, March 21, 2011

### Large scale matrix factorization - Yahoo! KDD Cup

This is a great opportunity to get large-scale data for testing matrix factorization algorithms. (Specifically, I am using GraphLab's matrix factorization.)

In track1, the task is to predict users' music ratings. There are about 1M users, 600K music titles, and about 252M non-zero ratings in the training data. (By comparison, the Netflix dataset had about 480K users, 18K movies, and about 100M non-zero ratings.) Unlike Netflix, the ratings are between 0 and 100 (not 1-5), so in a matrix factorization context one needs to think about how to represent a rating of zero vs. a non-existing rating.
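To make the distinction concrete, here is a minimal Python sketch (with made-up data) of a sparse container in which an explicit zero rating stays separate from a missing entry:

```python
# Sparse ratings stored as {(user, item): rating}. An explicit 0 rating
# is a real observation; a missing (user, item) key means "not rated".
ratings = {(0, 0): 0, (0, 1): 85, (1, 1): 100}

def is_observed(user, item):
    # Membership test distinguishes a 0 rating from no rating at all.
    return (user, item) in ratings

print(is_observed(0, 0))   # True  -- rated, with score 0
print(is_observed(1, 0))   # False -- never rated
```

This is why a flag like `--zero=true` matters: a zero must be treated as an observed value, not dropped from the sparse structure.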

For a start, I have tested GraphLab's matrix factorization on the track1 dataset. On an 8-core machine (2x4-core Intel Xeon, 2.67 GHz), each alternating least squares iteration takes around 150 seconds. After 10 iterations I get a training RMSE of 21.7 and a validation RMSE of 22.5.
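For intuition about what each ALS iteration does, here is a tiny pure-Python sketch of rank-1 alternating least squares on a made-up ratings set. This is an illustration only, not GraphLab's implementation (which solves a D-dimensional regularized least-squares problem per node, in parallel):

```python
import math

# Toy observed ratings: {(user, item): value}; this matrix is exactly rank 1.
obs = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0}
n_users, n_items, lam = 2, 2, 0.01  # lam plays the role of --lambda

u = [1.0] * n_users  # user factors (rank 1)
v = [1.0] * n_items  # item factors (rank 1)

def rmse():
    se = sum((u[i] * v[j] - r) ** 2 for (i, j), r in obs.items())
    return math.sqrt(se / len(obs))

for it in range(10):
    # Fix v, solve each u[i] in closed form (regularized least squares
    # over that user's observed ratings only).
    for i in range(n_users):
        num = sum(r * v[j] for (ii, j), r in obs.items() if ii == i)
        den = sum(v[j] ** 2 for (ii, j), r in obs.items() if ii == i) + lam
        u[i] = num / den
    # Fix u, solve each v[j] the same way.
    for j in range(n_items):
        num = sum(r * u[i] for (i, jj), r in obs.items() if jj == j)
        den = sum(u[i] ** 2 for (i, jj), r in obs.items() if jj == j) + lam
        v[j] = num / den

print(rmse())  # close to 0, since the toy matrix is rank 1
```

The real problem is the same alternation, just with million-row factor matrices and a D=25 inner dimension.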

By readers' request, I have posted some more detailed instructions on how to run GraphLab on the KDD Cup data here

Here is the output of the GraphLab program:

```
bickson@bigbro6:~/newgraphlab/graphlabapi/release/demoapps/pmf$ ./pmf kddcup 0 --ncpus=8 --float=true --zero=true --scheduler="round_robin(max_iterations=10)" --lambda=2 --D=25
Setting run mode ALS_MATRIX
INFO   :pmf.cpp(main:1236): PMF starting

Matrix size is: 1000990 624961 1
Creating 252800275 edges...
Matrix size is: 1000990 624961 1
Creating 4003960 edges...
Matrix size is: 1000990 624961 6649
Creating 6005940 edges...
...............................setting regularization weight to 2
PTF_ALS for matrix (1000990, 624961, 6649):252800275.  D=25
pU=2, pV=2, pT=1, muT=1, D=25
nuAlpha=1, Walpha=1, mu=0, muT=1, nu=25, beta=1, W=1, WT=1 BURN_IN=10
complete. Obj=4.85306e+11, TRAIN RMSE=61.9632 TEST RMSE=75.7155.
INFO   :asynchronous_engine.hpp(run:79): Worker 0 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 1 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 2 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 3 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 4 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 5 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 6 started.
INFO   :asynchronous_engine.hpp(run:79): Worker 7 started.

Entering last iter with 1
161.488) Iter ALS 1  Obj=4.63853e+11, TRAIN RMSE=60.5707 TEST RMSE=26.6540.
Entering last iter with 2
301.86) Iter ALS 2  Obj=9.7782e+10, TRAIN RMSE=27.7971 TEST RMSE=26.0930.
Entering last iter with 3
445.925) Iter ALS 3  Obj=8.91077e+10, TRAIN RMSE=26.5305 TEST RMSE=24.1940.
Entering last iter with 4
585.759) Iter ALS 4  Obj=7.09672e+10, TRAIN RMSE=23.6696 TEST RMSE=23.0499.
Entering last iter with 5
727.565) Iter ALS 5  Obj=6.42765e+10, TRAIN RMSE=22.5229 TEST RMSE=22.6883.
Entering last iter with 6
870.943) Iter ALS 6  Obj=6.17103e+10, TRAIN RMSE=22.0672 TEST RMSE=22.5748.
Entering last iter with 7
1011.14) Iter ALS 7  Obj=6.06398e+10, TRAIN RMSE=21.8743 TEST RMSE=22.5319.
Entering last iter with 8
1153.29) Iter ALS 8  Obj=6.01234e+10, TRAIN RMSE=21.7807 TEST RMSE=22.5117.
Entering last iter with 9
1292.27) Iter ALS 9  Obj=5.98422e+10, TRAIN RMSE=21.7295 TEST RMSE=22.5004.
INFO   :asynchronous_engine.hpp(run:89): Worker 5 finished.
Entering last iter with 10
INFO   :asynchronous_engine.hpp(run:89): Worker 0 finished.
INFO   :asynchronous_engine.hpp(run:89): Worker 1 finished.
INFO   :asynchronous_engine.hpp(run:89): Worker 4 finished.
INFO   :asynchronous_engine.hpp(run:89): Worker 7 finished.
INFO   :asynchronous_engine.hpp(run:89): Worker 2 finished.
INFO   :asynchronous_engine.hpp(run:89): Worker 6 finished.
1430.98) Iter ALS 10  Obj=5.96712e+10, TRAIN RMSE=21.6983 TEST RMSE=22.4935.
INFO   :asynchronous_engine.hpp(run:89): Worker 3 finished.

Final result. Obj=5.96743e+10, TRAIN RMSE= 21.6989 TEST RMSE= 22.4935.
Exporting KDD cup test graph: kddcupt.kdd.out
Completed successfully (mean prediction: 64.919045)
Finished in 1437.142338
Counters are: 0) EDGE_TRAVERSAL, 6038.63
Counters are: 1) BPTF_SAMPLE_STEP, 0
Counters are: 2) CALC_RMSE_Q, 0.044609
Counters are: 3) ALS_LEAST_SQUARES, 3258.77
Counters are: 4) NA, 0
Counters are: 5) BPTF_TIME_EDGES, 0
Counters are: 6) BPTF_LEAST_SQUARES, 0
Counters are: 7) CALC_OBJ, 3.51506
Counters are: 8) NA, 0
Counters are: 9) BPTF_MVN_RNDEX, 0
Counters are: 10) BPTF_LEAST_SQUARES2, 0

=== REPORT FOR core ===
[Numeric]
ncpus:          8
[Other]
affinities:     false
compile_flags:  -O3 -Wall -g -mfpmath=sse -msse2 -funroll-loops -fprefetch-loop-arrays -fopenmp
engine: async
scheduler:      round_robin
schedyield:     true
scope:  edge

=== REPORT FOR engine ===
[Numeric]
num_edges:              2.528e+08
num_syncs:              0
num_vertices:           1.62595e+06
updatecount:            1.62595e+07     (count: 8, min: 1.97158e+06, max: 2.11913e+06, avg: 2.03244e+06)
[Timings]
runtime:                1412.8 s
[Other]
[Numeric]
updatecount_vector:             1.62595e+07     (count: 8, min: 1.97158e+06, max: 2.11913e+06, avg: 2.03244e+06)
updatecount_vector.values:              1.99981e+06,2.01133e+06,1.97158e+06,2.08794e+06,2.00821e+06,2.11913e+06,2.02736e+06,2.03415e+06,
```
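The TRAIN RMSE and TEST RMSE columns in the log above are the usual root-mean-square error, computed over observed ratings only. As a quick reference, a minimal sketch (the sample numbers below are made up):

```python
import math

def rmse(predictions, truth):
    # Root-mean-square error over observed (prediction, truth) pairs.
    se = sum((p - t) ** 2 for p, t in zip(predictions, truth))
    return math.sqrt(se / len(truth))

print(rmse([80.0, 60.0], [100.0, 60.0]))  # sqrt((20^2 + 0^2) / 2) ~ 14.142
```

On the 0-100 rating scale of track1, an RMSE of 22.5 is therefore much larger in absolute terms than the sub-1 RMSEs familiar from the 1-5 Netflix scale.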

I wrote a simple Matlab script to parse the text input files into a sparse Matlab matrix. Here it is:
```
% Written by Danny Bickson, CMU, March 2011
% This script reads KDD cup track1 format and saves
% it into a sparse Matlab matrix
nUsers=1000990;
nItems=624961;
nRatings=262810175;
nTrainRatings=252800275;
nProbeRatings=4003960;
nTestRatings=6005940;

runmode=1; % 1 for training, 2 for probe, 3 for test

switch runmode
    case 1
        filename='/path/to/trainIdx1.txt';
        ratings=nTrainRatings;
        outfile='/path/to/train1.mat';
    case 2
        filename='/path/to/validationIdx1.txt';
        ratings=nProbeRatings;
        outfile='/path/to/probe1.mat';
    case 3
        filename='/path/to/testIdx1.txt';
        ratings=nTestRatings;
        outfile='/path/to/test1.mat';
end

rows=zeros(ratings,1);
cols=rows;
vals=rows;
try
    ff=fopen(filename,'r');
    cnt=1;
    for j=1:nUsers
        % read user id and number of ratings
        [a,num]=fscanf(ff,'%d|%d',2);
        assert(num==2);
        user=a(1);
        numofratings=a(2);
        if (mod(j,1000)==0)
            disp(['user: ', num2str(user),' ratings: ', num2str(numofratings)]);
        end
        if (runmode==3)
            assert(numofratings==6); % 6 ratings per user in the test file
        elseif (runmode==2)
            assert(numofratings==4); % 4 ratings per user in the probe file
        end

        for i=1:numofratings % for each rating
            if (runmode<=2)
                [b,num]=fscanf(ff,'%d %d %d %d:%d:%d',6);
                assert(num==6);
            else
                [b,num]=fscanf(ff,'%d %d %d:%d:%d',5);
                assert(num==5);
            end
            rows(cnt)=user+1;  % user (converted to 1-based index)
            assert(rows(cnt)>=1 && rows(cnt)<=nUsers);
            cols(cnt)=b(1)+1; % item (converted to 1-based index)
            assert(cols(cnt)>=1 && cols(cnt)<=nItems);

            if (runmode<=2)
                vals(cnt)=b(2); % rating
                assert(vals(cnt)>=0 && vals(cnt)<=100);
            else
                vals(cnt)=1; % placeholder for missing rating
            end
            cnt=cnt+1;
        end
    end

    assert(cnt==ratings+1);

catch ex
    disp(['exception on ', num2str(i), ' ', num2str(j)]);
    ex
end

% construct and save a sparse Matlab matrix
A=sparse(double(rows), double(cols), double(vals));
save(outfile,'-v7.3','A');
```
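For readers without Matlab, here is a minimal Python sketch of the same parser. The line format is inferred from the `fscanf` patterns above (a `UserId|N` header line, then N rating lines of `ItemId Score Date hh:mm:ss`); the sample input and any paths are hypothetical:

```python
import io

def parse_track1(stream):
    # Returns a list of (user, item, rating) triples from a train/probe
    # style stream: "UserId|N" followed by N "ItemId Score Date Time" lines.
    triples = []
    line = stream.readline()
    while line:
        user, n = line.strip().split('|')
        for _ in range(int(n)):
            fields = stream.readline().split()
            item, score = int(fields[0]), int(fields[1])
            triples.append((int(user), item, score))
        line = stream.readline()
    return triples

# Hypothetical two-rating sample for user 7 (note the explicit 0 rating).
sample = io.StringIO("7|2\n12\t85\t3000\t10:30:00\n99\t0\t3001\t23:59:59\n")
print(parse_track1(sample))  # [(7, 12, 85), (7, 99, 0)]
```

For the full 252M-rating file you would accumulate into preallocated arrays (as the Matlab script does) rather than a growing list.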

1. I wanted to comment on your KDD Cup group post but it gave me an error :(
Thanks for the script ;) I'm going to try it as soon as I download the data at home. Can I ask if you are using Matlab for the whole contest, or are you also using mex files with C++, OpenMP, etc.?

2. Hi!
Actually I am using GraphLab, and I am getting an RMSE of 22.3 on the validation data using alternating least squares. Write me if you would like directions on how to set up GraphLab to run on the KDD Cup data.

Best,

DB

3. Hi,

Did you submit your prediction result for the Test data of KDD Cup 2011? Is the Test result much different from the Validation result?

Bests,
Thanh

4. Yes, the test RMSE is about 2 points higher.
For alternating least squares I got 23.97. I have now implemented SVD++ and will try it soon. I heard from others that SVD++ gets an RMSE of 23.5.

5. connat 23.9727
Idealists 23.9768
TEST 23.9768

There are 3 contestants with this score. Which one are you? Or is it BiXolVer? I'm also curious about the SVD++ score.

6. Hi,
Yes, it is TEST.
I tried SVD++ with 20 features, without any tuning of the parameters, for 15 iterations on an 8-core machine, and got 24.6,
but I am pretty sure this can be improved by tuning the learning rate, step size, etc.

One of the groups combined my ALS predictions with their SVM and improved the RMSE by 0.2 (total 23.2).