Large Scale Machine Learning and Other Animals: More on GraphLab SVD

Tuesday, November 8, 2011

More on GraphLab SVD

Since I keep getting questions about using GraphLab SVD, I thought I would elaborate some more.

Here is an email I got from Ehtsham Elahi, from a startup called http://change.org:

Hi Danny, thank you for your great help, it is working now on mac os :) and performance is much much better than Mahout, I'll send the detailed benchmark...
I have talked to my Manager and he agrees that it will be cool to have our Mahout vs Graphlab results published. The organization is change.org and the matrix over which we have been testing both Mahout and Graphlab is a 5 million row by 376 column matrix. Approximately 70 million out of the total possible values are non-zero.

Best,
Ehtsham

More nice feedback, this time from Nihar Sharma at Caltech:
I am using GraphLab for a project course in ML here at Caltech and I am excited to see what kind of results it has on our dataset. Separately, I am also working on classification problems with the Astronomy department here and will definitely test GraphLab and let you know how it does on those datasets.

-Nihar

Here are some common questions I am getting:
1) How to read GraphLab SVD output?
If you are using the matrix market input format, there will be 4 output files:
- datasetname.V
- datasetname.U
- datasetname.EigenValues_AAT
- datasetname.EigenValues_ATA

The matrix A ~= U * diag(EigenValues_AAT) * V'
Alternatively, the matrix A ~= U * diag(EigenValues_ATA) * V'

In theory the nonzero eigenvalues of AAT and ATA are identical, but because of numerical error the two files may differ somewhat,
especially for the smaller eigenvalues.
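
To get a feeling for how far the two eigenvalue files drift apart, one option is to look at the relative difference of each pair. Below is a minimal Python sketch (not part of GraphLab). It assumes you have already loaded the two files into plain lists of numbers. The helper name relative_gaps and the sample values are made up for illustration only.

```python
def relative_gaps(eig_aat, eig_ata):
    """Relative difference between paired eigenvalue estimates.

    eig_aat, eig_ata: lists of numbers, assumed to be already loaded from
    the EigenValues_AAT and EigenValues_ATA outputs, in the same order.
    """
    if len(eig_aat) != len(eig_ata):
        raise ValueError("eigenvalue lists must have the same length")
    gaps = []
    for a, b in zip(eig_aat, eig_ata):
        scale = max(abs(a), abs(b))
        # Avoid dividing by zero when both estimates are exactly zero.
        gaps.append(0.0 if scale == 0.0 else abs(a - b) / scale)
    return gaps


# Made-up values for illustration only (not real GraphLab output).
eig_aat = [12.5, 3.2, 0.010]
eig_ata = [12.5, 3.2, 0.012]
gaps = relative_gaps(eig_aat, eig_ata)
for k, gap in enumerate(gaps):
    print(k, gap)
```

A large relative gap on the trailing entries is the pattern described above: the smaller eigenvalues are the ones most affected by numerical error.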

If you are using the itpp output format, load your output file with itload('outputfilename') in matlab/octave.
You will then see the variables U, V, EigenValues_AAT and EigenValues_ATA. For example, after running 5 netflix
iterations you will see:
>> itload('netflix-20-1.out');
>> whos
  Name                    Size          Bytes  Class     Attributes

  EigenValues_AAT         6x1              48  double
  EigenValues_ATA         6x1              48  double
  V                    3561x6          170928  double
  U                   95526x6         4585248  double

2) How to compute predictions using GraphLab SVD?
To compute a prediction using SVD, take the matching row of U and the matching row of V (that is, the matching column of V'), multiply them
elementwise with the diagonal values, and sum the result. Here is a matlab example:
>> A=rand(3,4);
>> [u,d,v]=svd(A*A');
>> A*A'

ans =

1.0738 0.7018 0.8807
0.7018 1.1564 0.8954
0.8807 0.8954 1.0851

>> u*d*v'

ans =

1.0738 0.7018 0.8807
0.7018 1.1564 0.8954
0.8807 0.8954 1.0851

And now you compute predictions as follows. Assume you want the entry in the first row and first column of A*A'. In this case you do:
>> u(1,:)*(diag(d).*v(1,:)')

ans =

1.0738

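The same computation can be sketched outside of matlab. The Python snippet below is only an illustration, not GraphLab code. It assumes U and V are already loaded as lists of rows and d as the list of diagonal values. The function name predict and the small matrices are hypothetical, and the indices are zero-based (unlike matlab).

```python
def predict(U, d, V, i, j):
    """Entry (i, j) of U * diag(d) * V', using zero-based indices.

    U: list of rows, V: list of rows, d: list of diagonal values.
    Row i of U and row j of V are multiplied elementwise with d and summed.
    """
    return sum(u_ik * d_k * v_jk for u_ik, d_k, v_jk in zip(U[i], d, V[j]))


# Tiny hand-made factors for illustration only.
U = [[0.6, 0.8],
     [0.8, -0.6]]
d = [2.0, 1.0]
V = [[1.0, 0.0],
     [0.0, 1.0]]

# Here U * diag(d) * V' = [[1.2, 0.8], [1.6, -0.6]].
p00 = predict(U, d, V, 0, 0)
p11 = predict(U, d, V, 1, 1)
print(p00, p11)
```

Computing one entry this way costs only as many multiplications as there are retained diagonal values, so there is no need to form the full product matrix.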
3) What is your recommended matrix factorization method for collaborative filtering?
The algorithm that performed best for us on the KDD CUP data was time-SVD++. I am going to implement it soon in Graphlab so you can try it out as well. See our workshop paper, linked from here: http://bickson.blogspot.com/2011/08/efficient-multicore-collaborative.html
