Some explanations of the algorithm parameters:
1) The run mode should be set to 16.
2) RBM assumes binary inputs, so each rating is represented by the bin it falls into. For the Netflix data, ratings are between 1 and 5, so we have 6 bins (0,1,2,3,4,5). For the KDD CUP data, ratings are between 0 and 100. To save memory, we can scale them down by a factor of 10 to get 11 bins. --rbm_scaling tells the program how much to scale the ratings by, and --rbm_bins tells the program how many bins there are.
3) RBM is a gradient-descent-type algorithm. --rbm_alpha is the step size, and --rbm_beta is the regularization parameter. --rbm_mult_step_dec is the multiplicative factor by which the step size is decreased at each iteration.
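To make the arithmetic behind parameters 2) and 3) concrete, here is a minimal Python sketch. It is not the toolkit's code: the function names are hypothetical, and it assumes that scaling means integer division of the rating and that the step size is multiplied by --rbm_mult_step_dec once per iteration.

```python
def num_bins(maxval, scaling):
    """Number of bins needed for ratings in 0..maxval (assumes integer division)."""
    return int(maxval // scaling) + 1


def rating_to_bin(rating, scaling):
    """Bin index of a raw rating (assumed mapping, not taken from the toolkit)."""
    return int(rating // scaling)


def step_sizes(alpha, mult_step_dec, iterations):
    """Step size used at each iteration, assuming one multiplicative decay per iteration."""
    sizes = []
    step = alpha
    for _ in range(iterations):
        sizes.append(step)
        step *= mult_step_dec
    return sizes


if __name__ == "__main__":
    print(num_bins(5, 1))       # Netflix: ratings up to 5, scaling 1
    print(num_bins(100, 10))    # KDD CUP: ratings up to 100, scaling 10
    print(rating_to_bin(87, 10))
    print([round(s, 4) for s in step_sizes(0.06, 0.8, 3)])
```

Under these assumptions, the Netflix settings give 6 bins, the KDD CUP settings give 11 bins, and with --rbm_alpha=0.06 and --rbm_mult_step_dec=0.8 the step size shrinks to 0.048 and then 0.0384 over the first iterations.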
Example run:
./pmf smallnetflix_mm 16 --matrixmarket=true --scheduler="round_robin(max_iterations=10,block_size=1)" --rbm_scaling=1 --rbm_bins=6 --rbm_alpha=0.06 --rbm_beta=.1 --ncpus=8 --minval=1 --maxval=5 --rbm_mult_step_dec=0.8
INFO: pmf.cpp(do_main:430): PMF/BPTF/ALS/SVD++/time-SVD++/SGD/Lanczos/SVD Code written By Danny Bickson, CMU
Send bug reports and comments to danny.bickson@gmail.com
WARNING: pmf.cpp(do_main:434): Program compiled with Eigen Support
Setting run mode RBM (Restriced Bolzman Machines)
INFO: pmf.cpp(start:306): RBM (Restriced Bolzman Machines) starting
loading data file smallnetflix_mm
Loading Matrix Market file smallnetflix_mm TRAINING
Loading smallnetflix_mm TRAINING
Matrix size is: USERS 95526 MOVIES 3561 TIME BINS 1
INFO: read_matrix_market.hpp(load_matrix_market:131): Loaded total edges: 3298163
loading data file smallnetflix_mme
Loading Matrix Market file smallnetflix_mme VALIDATION
Loading smallnetflix_mme VALIDATION
Matrix size is: USERS 95526 MOVIES 3561 TIME BINS 1
INFO: read_matrix_market.hpp(load_matrix_market:131): Loaded total edges: 545177
loading data file smallnetflix_mmt
Loading Matrix Market file smallnetflix_mmt TEST
Loading smallnetflix_mmt TEST
skipping file
RBM (Restriced Bolzman Machines) for matrix (95526, 3561, 1):3298163. D=20
INFO: rbm.hpp(rbm_init:424): RBM initialization ok
complete. Objective=8.37956e-304, TRAIN RMSE=0.0000 VALIDATION RMSE=0.0000.
INFO: pmf.cpp(run_graphlab:251): starting with scheduler: round_robin
max iterations = 10
step = 1
Entering last iter with 1
5.99073) Iter RBM 1, TRAIN RMSE=0.9242 VALIDATION RMSE=0.9762.
Entering last iter with 2
11.0763) Iter RBM 2, TRAIN RMSE=0.9109 VALIDATION RMSE=0.9673.
Entering last iter with 3
16.1259) Iter RBM 3, TRAIN RMSE=0.9054 VALIDATION RMSE=0.9633.
Entering last iter with 4
21.2074) Iter RBM 4, TRAIN RMSE=0.9015 VALIDATION RMSE=0.9600.
Entering last iter with 5
26.3222) Iter RBM 5, TRAIN RMSE=0.8986 VALIDATION RMSE=0.9560.
Entering last iter with 6
31.409) Iter RBM 6, TRAIN RMSE=0.8960 VALIDATION RMSE=0.9540.
Entering last iter with 7
36.4693) Iter RBM 7, TRAIN RMSE=0.8941 VALIDATION RMSE=0.9508.
...

Let me know if you try it out!
Hi
Where can I find the code??
Hi Zara,
RBM code is available as part of the GraphChi collaborative filtering toolkit, see here: http://bickson.blogspot.co.il/2012/12/collaborative-filtering-with-graphchi.html