Since I keep getting questions about using GraphLab SVD, I thought I would elaborate a bit more.
Here is an email I got from Ehtsham Elahi, of the startup change.org (http://change.org):
Hi Danny, thank you for your great help, it is working now on Mac OS :) and performance is much, much better than Mahout. I'll send the detailed benchmark...
I have talked to my manager and he agrees that it would be cool to have our Mahout vs. GraphLab results published. The organization is change.org, and the matrix on which we have been testing both Mahout and GraphLab is a 5 million by 376 matrix. Approximately 70 million of its entries are non-zero.
Best,
Ehtsham
Some more nice feedback, from Nihar Sharma at Caltech:
 I am using GraphLab for a project course in ML here at Caltech and I am excited to see what kind of results it has on our dataset. Separately, I am also working on classification problems with the Astronomy department here and will definitely test GraphLab and let you know how it does on those datasets.
-Nihar
Here are some common questions I am getting:
1) How do I read the GraphLab SVD output?
If you are using the Matrix Market input format, there will be 4 output files:
- datasetname.V
- datasetname.U
- datasetname.EigenValues_AAT
- datasetname.EigenValues_ATA
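The matrix A ~= U * diag(EigenValues_AAT) * V'
Alternatively, A ~= U * diag(EigenValues_ATA) * V'
In theory, the nonzero eigenvalues of AA' and A'A are identical, but because of numerical error there may be some differences, especially for the smaller eigenvalues.
Keep in mind that the eigenvalues of AA' are the squares of the singular values of A, so if the files hold eigenvalues rather than singular values, take their square roots before using them as the diagonal.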
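Because the reconstruction formula is easy to get wrong in code, here is a minimal sketch in Python (rather than the Matlab/Octave used elsewhere in this post). It assumes the factors are already loaded as lists of rows, with U of size m x k and V of size n x k (the same orientation as the U and V variables in the itpp example below), and that s holds whichever diagonal you choose. The function names are hypothetical, and the square-root helper only applies if the file stores eigenvalues of AA' rather than singular values.

```python
import math

def singular_values_from_eigenvalues(eigenvalues):
    """Singular values of A are the square roots of the eigenvalues of A*A'.

    Assumption: the input holds eigenvalues, not singular values.
    Tiny negative values caused by numerical error are clamped to zero.
    """
    return [math.sqrt(max(ev, 0.0)) for ev in eigenvalues]

def reconstruct(U, s, V):
    """Return U * diag(s) * V' as a list of lists (m x n).

    U: m rows of length k, V: n rows of length k, s: k diagonal values.
    """
    k = len(s)
    return [
        [sum(u_row[t] * s[t] * v_row[t] for t in range(k)) for v_row in V]
        for u_row in U
    ]

# Rank-1 example: eigenvalue 9 gives singular value 3.
U = [[1.0], [2.0]]
V = [[1.0], [1.0], [0.0]]
s = singular_values_from_eigenvalues([9.0])
print(reconstruct(U, s, V))  # [[3.0, 3.0, 0.0], [6.0, 6.0, 0.0]]
```

This builds the full m x n matrix, which is only practical for small examples; for a 5 million row matrix you would compute individual entries instead.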
If you are using the itpp output format, load your output file in Matlab/Octave using itload('outputfilename').
You will then see the variables U, V, EigenValues_AAT and EigenValues_ATA. For example, after running 5 Netflix
iterations you will see:
>> itload('netflix-20-1.out');
>> whos
  Name                     Size              Bytes  Class     Attributes
  EigenValues_AAT          6x1                  48  double              
  EigenValues_ATA          6x1                  48  double              
  V                     3561x6              170928  double              
  U                    95526x6             4585248  double              
2) How do I compute predictions using GraphLab SVD?
To compute a prediction with SVD, take the matching row of U and the matching row of V (that is, the matching column of V'),
multiply them elementwise by the diagonal, and sum the result. Here is a Matlab example:
>> A=rand(3,4);
>> [u,d,v]=svd(A*A');
>> A*A'
ans =
    1.0738    0.7018    0.8807
    0.7018    1.1564    0.8954
    0.8807    0.8954    1.0851
>> u*d*v'
ans =
    1.0738    0.7018    0.8807
    0.7018    1.1564    0.8954
    0.8807    0.8954    1.0851
Now you compute predictions as follows. Assume you want the entry in the first row and first column of A*A'. In this case you do:
>> u(1,:)*(diag(d).*v(1,:)')
ans =
    1.0738
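The Matlab one-liner above translates directly to other languages. Here is a minimal Python sketch, assuming U, V and the diagonal d are held as plain lists of rows (the function name predict is illustrative, not part of GraphLab). Python indices are 0-based, so Matlab's u(1,:) corresponds to U[0] here.

```python
def predict(U, d, V, i, j):
    """Predicted value of entry (i, j).

    Row i of U is multiplied elementwise by the diagonal d and by row j of V
    (which is column j of V'), and the products are summed, mirroring
    u(i,:)*(diag(d).*v(j,:)') in Matlab. Indices are 0-based.
    """
    return sum(u * s * v for u, s, v in zip(U[i], d, V[j]))

U = [[1, 2], [3, 4]]
d = [5, 6]
V = [[1, 0], [0, 1], [1, 1]]
print(predict(U, d, V, 0, 2))  # 1*5*1 + 2*6*1 = 17
```

Each prediction costs O(k) work, where k is the number of factors, so scoring a single user/item pair stays cheap even when U and V have millions of rows.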
3) What is your recommended matrix factorization method for collaborative filtering?
The best-performing algorithm we found on the KDD Cup data was time-SVD++. I am going to implement it in GraphLab soon so you can try it out as well. See our workshop paper, linked from here: http://bickson.blogspot.com/2011/08/efficient-multicore-collaborative.html