Friday, May 11, 2012

RBM (Restricted Boltzmann Machines) in GraphLab

I am glad to announce that I have added an efficient multicore implementation of the restricted Boltzmann machines (RBM) algorithm. The algorithm is described in Hinton's paper. The code is based on excellent C code by my collaborator JustinYan, who by the way is still looking for a US-based internship!

Some explanation about the algorithm parameters:

1) run mode should be set to 16

2) RBM assumes binary inputs, so ratings are discretized into bins. For Netflix data, ratings are between 1 and 5, so we have 6 bins (0,1,2,3,4,5). For KDD CUP data, ratings are between 0 and 100. To save memory, we can scale them down by a factor of 10 to get 11 bins.
--rbm_scaling - tells the program by how much to scale the ratings.
--rbm_bins - tells the program how many bins there are.

3) RBM is a gradient-descent-type algorithm. --rbm_alpha is the step size, and --rbm_beta is the regularization parameter. --rbm_mult_step_dec tells the program by how much to decrease the step size at each iteration.
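
To make the parameters above more concrete, here is a small Python sketch. It is not taken from the GraphLab code: the function names are hypothetical, and it assumes that a rating is mapped to a bin by integer division by --rbm_scaling, and that --rbm_mult_step_dec multiplies the step size once per iteration.

```python
def rating_to_bin(rating, scaling, bins):
    """Map a raw rating to a bin index.

    Assumption: bin index = rating divided by scaling, truncated.
    """
    idx = int(rating // scaling)
    if not 0 <= idx < bins:
        raise ValueError(f"rating {rating} falls outside {bins} bins")
    return idx


def one_hot(idx, bins):
    """Encode a bin index as a binary vector with a single 1."""
    return [1 if i == idx else 0 for i in range(bins)]


def step_sizes(alpha, mult_step_dec, iterations):
    """List the step size used at each iteration.

    Assumption: the step size starts at alpha and is multiplied by
    mult_step_dec after every iteration.
    """
    sizes = []
    step = alpha
    for _ in range(iterations):
        sizes.append(step)
        step *= mult_step_dec
    return sizes


# Netflix-style rating: scaling 1, 6 bins (0..5)
print(one_hot(rating_to_bin(4, scaling=1, bins=6), bins=6))

# KDD CUP-style rating: scaling 10, 11 bins (0..10)
print(rating_to_bin(100, scaling=10, bins=11))

# Step sizes for --rbm_alpha=0.06 and --rbm_mult_step_dec=0.8
print(step_sizes(0.06, 0.8, 3))
```

Under these assumptions, a Netflix rating of 4 lands in bin 4 of 6, and a KDD CUP rating of 100 lands in bin 10 of 11.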

 Example run:
./pmf smallnetflix_mm 16 --matrixmarket=true --scheduler="round_robin(max_iterations=10,block_size=1)" --rbm_scaling=1 --rbm_bins=6 --rbm_alpha=0.06 --rbm_beta=.1 --ncpus=8 --minval=1 --maxval=5 --rbm_mult_step_dec=0.8

INFO:     pmf.cpp(do_main:430): PMF/BPTF/ALS/SVD++/time-SVD++/SGD/Lanczos/SVD Code written By Danny Bickson, CMU
Send bug reports and comments to danny.bickson@gmail.com
WARNING:  pmf.cpp(do_main:434): Program compiled with Eigen Support
Setting run mode RBM (Restriced Bolzman Machines)
INFO:     pmf.cpp(start:306): RBM (Restriced Bolzman Machines) starting

loading data file smallnetflix_mm
Loading Matrix Market file smallnetflix_mm TRAINING
Loading smallnetflix_mm TRAINING
Matrix size is: USERS 95526 MOVIES 3561 TIME BINS 1
INFO:     read_matrix_market.hpp(load_matrix_market:131): Loaded total edges: 3298163
loading data file smallnetflix_mme
Loading Matrix Market file smallnetflix_mme VALIDATION
Loading smallnetflix_mme VALIDATION
Matrix size is: USERS 95526 MOVIES 3561 TIME BINS 1
INFO:     read_matrix_market.hpp(load_matrix_market:131): Loaded total edges: 545177
loading data file smallnetflix_mmt
Loading Matrix Market file smallnetflix_mmt TEST
Loading smallnetflix_mmt TEST
skipping file
RBM (Restriced Bolzman Machines) for matrix (95526, 3561, 1):3298163.  D=20
INFO:     rbm.hpp(rbm_init:424): RBM initialization ok
complete. Objective=8.37956e-304, TRAIN RMSE=0.0000 VALIDATION RMSE=0.0000.
INFO:     pmf.cpp(run_graphlab:251): starting with scheduler: round_robin
max iterations = 10
step = 1
Entering last iter with 1
5.99073) Iter RBM 1, TRAIN RMSE=0.9242 VALIDATION RMSE=0.9762.
Entering last iter with 2
11.0763) Iter RBM 2, TRAIN RMSE=0.9109 VALIDATION RMSE=0.9673.
Entering last iter with 3
16.1259) Iter RBM 3, TRAIN RMSE=0.9054 VALIDATION RMSE=0.9633.
Entering last iter with 4
21.2074) Iter RBM 4, TRAIN RMSE=0.9015 VALIDATION RMSE=0.9600.
Entering last iter with 5
26.3222) Iter RBM 5, TRAIN RMSE=0.8986 VALIDATION RMSE=0.9560.
Entering last iter with 6
31.409) Iter RBM 6, TRAIN RMSE=0.8960 VALIDATION RMSE=0.9540.
Entering last iter with 7
36.4693) Iter RBM 7, TRAIN RMSE=0.8941 VALIDATION RMSE=0.9508.
...
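
If you want to track convergence, the per-iteration lines in the output above are easy to parse. The following Python sketch is only an illustration: the function name is hypothetical, the regular expression is based solely on the line format shown in this log, and it assumes the leading number on each line is the elapsed time.

```python
import re

# Matches lines such as:
# 5.99073) Iter RBM 1, TRAIN RMSE=0.9242 VALIDATION RMSE=0.9762.
LINE_RE = re.compile(
    r"^(?P<elapsed>\d+(?:\.\d+)?)\) Iter RBM (?P<iter>\d+), "
    r"TRAIN RMSE=(?P<train>\d+\.\d+) VALIDATION RMSE=(?P<val>\d+\.\d+)\.$"
)


def parse_rmse(log_text):
    """Return a list of (iteration, elapsed, train_rmse, validation_rmse)."""
    rows = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append(
                (
                    int(m["iter"]),
                    float(m["elapsed"]),  # assumed to be elapsed time
                    float(m["train"]),
                    float(m["val"]),
                )
            )
    return rows


sample = """\
Entering last iter with 1
5.99073) Iter RBM 1, TRAIN RMSE=0.9242 VALIDATION RMSE=0.9762.
Entering last iter with 2
11.0763) Iter RBM 2, TRAIN RMSE=0.9109 VALIDATION RMSE=0.9673.
"""

for iteration, elapsed, train, val in parse_rmse(sample):
    print(iteration, elapsed, train, val)
```

Lines that do not match the pattern (for example the "Entering last iter" lines) are simply skipped.
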
Let me know if you try it out!

2 comments:

  1. Hi Zara,
     The RBM code is available as part of the GraphChi collaborative filtering toolkit; see here: http://bickson.blogspot.co.il/2012/12/collaborative-filtering-with-graphchi.html
