Implementation is based on the paper: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model by Yehuda Koren.
Note that unlike the original paper, our implementation is parallel, thus exploiting multiple cores whenever they are available.
Here are some timing results of the multicore implementation:
(I used an 8-core machine and ran 5 SVD++ iterations on a partial Netflix dataset with 3M ratings.)
Here are some accuracy results:
It seems that additional cores improve accuracy.
To run SVD++:
0) Install GraphLab following the instructions at http://graphlab.org/download.html
1) Run with run mode = 5
Example:
./pmf netflix 5 --ncpus=XX --scheduler="round_robin(max_iterations=10)"
Two other options are --minval=XX and --maxval=XX, which give the rating range of the dataset.
For kddcup, use --minval=0 and --maxval=100
(if the file name is kddcup, those values are set automatically).
For Netflix data, use --minval=1 and --maxval=5
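The options above can be assembled programmatically when scripting several runs. The following is a minimal Python sketch that only builds the command line from the flags shown in this post; the helper name `build_pmf_command` and the `RATING_RANGES` table are illustrative and not part of GraphLab.

```python
import shlex

# Rating ranges quoted in this post. The table and the helper below are
# illustrative names, not part of GraphLab.
RATING_RANGES = {"netflix": (1, 5), "kddcup": (0, 100)}
SVDPP_RUN_MODE = 5  # run mode that selects SVD++, as stated in this post


def build_pmf_command(dataset, ncpus, max_iterations=10, rating_range=None):
    """Return the argument list for an SVD++ run of ./pmf."""
    if rating_range is None:
        rating_range = RATING_RANGES[dataset]
    minval, maxval = rating_range
    return [
        "./pmf",
        dataset,
        str(SVDPP_RUN_MODE),
        f"--ncpus={ncpus}",
        f"--scheduler=round_robin(max_iterations={max_iterations})",
        f"--minval={minval}",
        f"--maxval={maxval}",
    ]


cmd = build_pmf_command("netflix", ncpus=8)
# shlex.join quotes the scheduler argument so the shell does not
# interpret the parentheses.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues with the parentheses in the scheduler argument.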
Example run on the full Netflix dataset (using 8 cores):
<55|0>bickson@biggerbro:~/newgraphlab/graphlabapi/release/demoapps/pmf$ ./pmf netflix-r 5 --ncpus=8 --scheduler=round_robin
INFO: pmf.cpp(main:1081): PMF/ALS Code written By Danny Bickson, CMU
Send bug reports and comments to danny.bickson@gmail.com
WARNING: pmf.cpp(main:1083): Code compiled with GL_NO_MULT_EDGES flag - this mode does not support multiple edges between user and movie in different times
WARNING: pmf.cpp(main:1086): Code compiled with GL_NO_MCMC flag - this mode does not support MCMC methods.
WARNING: pmf.cpp(main:1089): Code compiled with GL_SVD_PP flag - this mode only supports SVD++ run.
Setting run mode SVD_PLUS_PLUS
INFO: pmf.cpp(main:1126): SVD_PLUS_PLUS starting
loading data file netflix-r
Loading netflix-r TRAINING
Matrix size is: USERS 480189 MOVIES 17770 TIME BINS 27
Creating 99072112 edges (observed ratings)...
................................................................................
loading data file netflix-re
Loading netflix-re VALIDATION
Matrix size is: USERS 480189 MOVIES 17770 TIME BINS 27
Creating 1408395 edges (observed ratings)...
........
loading data file netflix-rt
Loading netflix-rt TEST
skipping file
setting regularization weight to 1
PTF_ALS for matrix (480189, 17770, 27):99072112. D=20
pU=1, pV=1, pT=1, muT=1, D=20
nuAlpha=1, Walpha=1, mu=0, muT=1, nu=20, beta=1, W=1, WT=1 BURN_IN=10
SVD++ 20 factors (rate=8.00e-03, reg=1.50e-02)
complete. Obj=6.8368e+08, TRAIN RMSE=3.7150 VALIDATION RMSE=3.7946.
max iterations = 0
step = 1
max_iterations = 0
INFO: asynchronous_engine.hpp(run:94): Worker 0 started.
INFO: asynchronous_engine.hpp(run:94): Worker 2 started.
INFO: asynchronous_engine.hpp(run:94): Worker 1 started.
INFO: asynchronous_engine.hpp(run:94): Worker 3 started.
INFO: asynchronous_engine.hpp(run:94): Worker 4 started.
INFO: asynchronous_engine.hpp(run:94): Worker 5 started.
INFO: asynchronous_engine.hpp(run:94): Worker 6 started.
INFO: asynchronous_engine.hpp(run:94): Worker 7 started.
Entering last iter with 1
92.7115) Iter SVD 1, TRAIN RMSE=1.0587 VALIDATION RMSE=0.9892.
Entering last iter with 2
174.441) Iter SVD 2, TRAIN RMSE=0.9096 VALIDATION RMSE=0.9536.
Entering last iter with 3
260.442) Iter SVD 3, TRAIN RMSE=0.8678 VALIDATION RMSE=0.9805.
Entering last iter with 4
321.652) Iter SVD 4, TRAIN RMSE=0.8480 VALIDATION RMSE=0.9603.
Entering last iter with 5
388.735) Iter SVD 5, TRAIN RMSE=0.8291 VALIDATION RMSE=0.9312.
Entering last iter with 6
470.291) Iter SVD 6, TRAIN RMSE=0.8106 VALIDATION RMSE=0.9264.
Entering last iter with 7
558.886) Iter SVD 7, TRAIN RMSE=0.8046 VALIDATION RMSE=0.9270.
Entering last iter with 8
628.846) Iter SVD 8, TRAIN RMSE=0.8007 VALIDATION RMSE=0.9242.
Entering last iter with 9
687.212) Iter SVD 9, TRAIN RMSE=0.7969 VALIDATION RMSE=0.9221.
Entering last iter with 10
775.021) Iter SVD 10, TRAIN RMSE=0.7926 VALIDATION RMSE=0.9215.
Entering last iter with 11
836.143) Iter SVD 11, TRAIN RMSE=0.7907 VALIDATION RMSE=0.9203.
Entering last iter with 12
919.416) Iter SVD 12, TRAIN RMSE=0.7874 VALIDATION RMSE=0.9195.
Entering last iter with 13
1000.87) Iter SVD 13, TRAIN RMSE=0.7852 VALIDATION RMSE=0.9191.
Entering last iter with 14
1081.9) Iter SVD 14, TRAIN RMSE=0.7834 VALIDATION RMSE=0.9186.
Entering last iter with 15
1169.46) Iter SVD 15, TRAIN RMSE=0.7817 VALIDATION RMSE=0.9182.
Entering last iter with 16
1236.61) Iter SVD 16, TRAIN RMSE=0.7808 VALIDATION RMSE=0.9179.
Entering last iter with 17
1304.72) Iter SVD 17, TRAIN RMSE=0.7795 VALIDATION RMSE=0.9176.
Entering last iter with 18
1366.15) Iter SVD 18, TRAIN RMSE=0.7783 VALIDATION RMSE=0.9173.
Entering last iter with 19
1453.8) Iter SVD 19, TRAIN RMSE=0.7768 VALIDATION RMSE=0.9172.
Entering last iter with 20
1521.15) Iter SVD 20, TRAIN RMSE=0.7763 VALIDATION RMSE=0.9171.
Entering last iter with 21
1588.85) Iter SVD 21, TRAIN RMSE=0.7754 VALIDATION RMSE=0.9175.
Entering last iter with 22
1654.52) Iter SVD 22, TRAIN RMSE=0.7757 VALIDATION RMSE=0.9170.
Entering last iter with 23
1722.88) Iter SVD 23, TRAIN RMSE=0.7740 VALIDATION RMSE=0.9171.
Entering last iter with 24
1783.94) Iter SVD 24, TRAIN RMSE=0.7739 VALIDATION RMSE=0.9163.
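To compare runs, it can help to pull the per-iteration RMSE values out of a log like the one above. Here is a small Python sketch that assumes the "Iter SVD N, TRAIN RMSE=... VALIDATION RMSE=..." format shown in this output; the function names are illustrative, and the sample text is copied from the first three iterations of the run above.

```python
import re

# Matches entries such as
# "Iter SVD 5, TRAIN RMSE=0.8291 VALIDATION RMSE=0.9312."
ITER_RE = re.compile(
    r"Iter SVD (\d+), TRAIN RMSE=(\d+\.\d+) VALIDATION RMSE=(\d+\.\d+)"
)


def parse_rmse(log_text):
    """Return a list of (iteration, train_rmse, validation_rmse) tuples."""
    return [
        (int(it), float(train), float(valid))
        for it, train, valid in ITER_RE.findall(log_text)
    ]


def best_validation(rows):
    """Return the row with the lowest validation RMSE."""
    return min(rows, key=lambda row: row[2])


sample = (
    "92.7115) Iter SVD 1, TRAIN RMSE=1.0587 VALIDATION RMSE=0.9892.\n"
    "174.441) Iter SVD 2, TRAIN RMSE=0.9096 VALIDATION RMSE=0.9536.\n"
    "260.442) Iter SVD 3, TRAIN RMSE=0.8678 VALIDATION RMSE=0.9805.\n"
)
rows = parse_rmse(sample)
print(best_validation(rows))  # (2, 0.9096, 0.9536)
```

Because the pattern is applied with `findall` over the whole text, it works whether or not each log entry sits on its own line.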
Hi Bickson,
The original SVD++ paper (Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model) does not take time into consideration. Does your implementation also include the paper Collaborative Filtering with Temporal Dynamics? I want to use this software for track2, which does not have time information.
Thanks,
Carlos
Hi Danny,
After running for a while and ending with a TRAIN RMSE=0.7235, over 99% of new predictions using the sum(User .* Movie) formula are negative. (minval and maxval were set to 1 and 5.)
Is this not the way to calculate predictions for SVDpp?
Can you send me your input file and I will look at it?
P.S. It is not the way to compute predictions for SVD++. I will send you more detailed instructions on how to do it.
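For reference, the prediction rule in Koren's paper is not a plain dot product of the user and movie factors. It also adds the global mean, the user and item biases, and an implicit-feedback term built from the items the user has rated. The Python sketch below illustrates that rule under stated assumptions: all argument names are hypothetical, how the pmf program stores or exports these quantities is not covered here, and clipping the result to [minval, maxval] is an assumption about how the rating range would be applied.

```python
import math


def svdpp_predict(mu, b_u, b_i, p_u, q_i, y_rated, minval=1.0, maxval=5.0):
    """SVD++ prediction following the rule in Koren's paper:

        r_ui = mu + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j)

    mu       global mean rating
    b_u, b_i user and item biases
    p_u, q_i user and item factor vectors (lists of equal length)
    y_rated  list of implicit factor vectors y_j, one per item in N(u),
             the set of items the user has rated
    The final clipping to [minval, maxval] is an assumption, not part of
    the formula itself.
    """
    n = len(y_rated)
    scale = 1.0 / math.sqrt(n) if n else 0.0
    # User representation: explicit factors plus the scaled implicit term.
    user_vec = [
        p + scale * sum(y[k] for y in y_rated) for k, p in enumerate(p_u)
    ]
    raw = mu + b_u + b_i + sum(q * u for q, u in zip(q_i, user_vec))
    return min(max(raw, minval), maxval)


# Toy numbers chosen only for illustration.
prediction = svdpp_predict(
    mu=3.6, b_u=0.1, b_i=-0.2,
    p_u=[0.5, -0.1], q_i=[0.4, 0.2],
    y_rated=[[0.1, 0.0], [0.3, 0.2], [0.0, 0.1], [0.0, 0.1]],
)
print(prediction)  # approximately 3.8
```

Leaving out the mean and bias terms, as the sum(User .* Movie) formula does, shifts every prediction by roughly the global mean, which is consistent with seeing mostly out-of-range values.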