## Monday, December 3, 2012

### Collaborative filtering with GraphChi

A couple of weeks ago I covered GraphChi by Aapo Kyrola in my blog.
Here is a quick tutorial for trying out GraphChi collaborative filtering toolbox that I wrote. Currently it supports ALS (alternating least squares), SGD (stochastic gradient descent), bias-SGD (biased stochastic gradient descent) , SVD++ , NMF (non-negative matrix factorization), SVD (restarted lanczos, and one sided lanczos), RBM (restricted Bolzman machines), FM (factorization machines) and time-SVD++, CLiMF (collaborative less is more filtering).  I am soon going to implement several more algorithms.

## References

Here are papers which explain the algorithms in more detail:

• Alternating Least Squares (ALS)
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management. Shanghai, China pp. 337-348, 2008.

• Alternating Least Squares (ALS) - parallel coordinate descent (a.k.a. CCD++)
H.-F. Yu, C.-J. Hsieh, S. Si, I. S. Dhillon, Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems. IEEE International Conference on Data Mining(ICDM), December 2012.
Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR '11). ACM, New York, NY, USA, 635-644.

• Stochastic gradient descent (SGD)
 Matrix Factorization Techniques for Recommender Systems Yehuda Koren, Robert Bell, Chris Volinsky In IEEE Computer, Vol. 42, No. 8. (07 August 2009), pp. 30-37.
Takács, G, Pilászy, I., Németh, B. and Tikk, D. (2009). Scalable Collaborative Filtering Approaches for Large Recommender Systems. Journal of Machine Learning Research, 10, 623-656.

• Bias stochastic gradient descent (Bias-SGD)
Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. ACM SIGKDD 2008. Equation (5).
• SVD++
Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. ACM SIGKDD 2008.
• Weighted-ALS
Collaborative Filtering for Implicit Feedback Datasets Hu, Y.; Koren, Y.; Volinsky, C. IEEE International Conference on Data Mining (ICDM 2008), IEEE (2008).
• Sparse-ALS
Xi Chen, Yanjun Qi, Bing Bai, Qihang Lin and Jaime Carbonell. Sparse Latent Semantic Analysis. In SIAM International Conference on Data Mining (SDM), 2011.
D. Needell, J. A. Tropp CoSaMP: Iterative signal recovery from incomplete and inaccurate samples Applied and Computational Harmonic Analysis, Vol. 26, No. 3. (17 Apr 2008), pp. 301-321. 
• NMF
Lee, D..D., and Seung, H.S., (2001), 'Algorithms for Non-negative Matrix
Factorization', Adv. Neural Info. Proc. Syst. 13, 556-562.
• SVD (Restarted Lanczos & One sided Lanczos)
V. Hern´andez, J. E. Rom´an and A. Tom´as. STR-8: Restarted Lanczos Bidiagonalization for the SVD in SLEPc.

• tensor-ALS
Tensor Decompositions, Alternating Least Squares and other Tales. P. Comon, X. Luciani and A. L. F. de Almeida. Special issue, Journal of Chemometrics. In memory of R. Harshman.
August 16, 2009
• Restricted Bolzman Machines (RBM)
G. Hinton. A Practical Guide to Training Restricted Boltzmann Machines. University of Toronto Tech report UTML TR 2010-003.
• time-svd++
Yehuda Koren. 2009. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 447-456. DOI=10.1145/1557019.1557072
• libFM
Steffen Rendle (2010): Factorization Machines, in Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia.

• PMF
Salakhutdinov and Mnih, Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. in International Conference on Machine Learning, 2008.

• CLiMF
 CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering. Yue Shi, Martha Larson, Alexandros Karatzoglou, Nuria Oliver, Linas Baltrunas, Alan Hanjalic, Sixth ACM Conference on Recommender Systems, RecSys '12.


## Target

The benefit of using GraphChi is that it requires a single multicore machine and can scale up to very large models, since at no point the data is fully read into memory. In other words, GraphChi is very useful for machine with limited RAM since it streams over the dataset. It is also possible to configure how much RAM to use during the run.

Here are some performance numbers:

The above graph shows 6 iterations of SGD (stochastic gradient descent) on the full Netflix data.
Netflix has around 100M ratings so the matrix has 100M non-zeros. The size of the decomposed
matrix is about 480K users x 10K movies. I used a single multicore machine with 8 threads, where GraphChi memory consumption was limited to 800Mb, using 8 cores. The factorized matrix has a width of D=20. In total it takes around 80 seconds per 6 iterations, which is around 14 seconds per iteration.

Preprocessing the matrix is done once, and take around 35 seconds.

The input to GraphChi ALS/SGD/bias-SGD is the sparse matrix A in sparse matrix market format. The output are two matrices U and V s.t. A ~= U*V'  and both U and V have
a lower dimension D.

## Running and setup instructions

1) Download graphchi from Github using the instructions here.

2) Change directory to graphchi
> cd graphchi

3) Install graphchi
> bash install.sh

4a) For ALS/SGD/bias-SGD/SVD++/SVD Download Netflix synthetic sample file. The input is in sparse matrix market format.
wget http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm
wget http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mme

4b) For WALS Download netflix sample file including time:

wget http://www.select.cs.cmu.edu/code/graphlab/datasets/time_smallnetflix
wget http://www.select.cs.cmu.edu/code/graphlab/datasets/time_smallnetflixe

5) Run baseline methods on Netflix example:
./toolkits/collaborative_filtering/baseline --training=smallnetflix_mm --validation=smallnetflix_mm --minval=1 --maxval=5 --quiet=1  --algorithm=user_mean

5a) Run ALS on the Netflix example:
./toolkits/collaborative_filtering/als --training=smallnetflix_mm --validation=smallnetflix_mme --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1

At the first time, the input file will be preprocessed into an efficient binary representation on disk and then 6 ALS iterations will be run.

5b) Run CCD++ on the Netflix example:
./toolkits/collaborative_filtering/als_coord --training=smallnetflix_mm --validation=smallnetflix_mme --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1

5c) Run SGD on the Netflix example:
./toolkits/collaborative_filtering/sgd --training=smallnetflix_mm --validation=smallnetflix_mme --sgd_lambda=1e-4 --sgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1

5c) Run bias-SGD on the Netflix example:
./toolkits/collaborative_filtering/biassgd --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1

5d) Run SVD++ on Netflix example:
./toolkits/collaborative_filtering/svdpp --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1

5e) Run weighted-ALS on the Netflix time example:
./toolkits/collaborative_filtering/wals --training=time_smallnetflix --validation=time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1

5f) Run NMF on the reverse Netflix example:
./toolkits/collaborative_filtering/nmf --training=reverse_netflix.mm --minval=1 --maxval=5 --max_iter=20 --quiet=1

5g) Run SVD and one sided SVD on the Netflix example:
./toolkits/collaborative_filtering/svd --training=smallnetflix_mm --nsv=3 --nv=10 --max_iter=5 --quiet=1 --tol=1e-1
./toolkits/collaborative_filtering/svd_onesided --training=smallnetflix_mm --nsv=3 --nv=10 --max_iter=5 --quiet=1 --tol=1e-1

5h) Run tensor-ALS on Netflix time example
./toolkits/collaborative_filtering/als_tensor --training=time_smallnetflix --validation=time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --K=27 --quiet=1

5i) Run RBM on time_smallnetflix data using the command:
./toolkits/collaborative_filtering/rbm --training=smallnetflix_mm --validation=smallnetflix_mme --minval=1 --maxval=5 --max_iter=6 --quiet=1

5j) Run time-svd++ on  time_smallnetflix data:
./toolkits/collaborative_filtering/timesvdpp --training=time_smallnetflix --validation=time_smallnetflixe --minval=1 --maxval=5 --max_iter=6 --quiet=1

5k) Run libFM on time_smallnetflix
./toolkits/collaborative_filtering/libfm --training=time_smallnetflix --validation=time_smallnetflixe --minval=1 --maxval=5 --max_iter=6 --quiet=1

5l) Run PMF on smallnetflix_mm data:
./toolkits/collaborative_filtering/pmf --training=smallnetflix_mm --quiet=1 --minval=1 --maxval=5 --max_iter=10 --pmf_burn_in=5

5m) Run Bias-SGD2 on smallnetflix_mm data:

./toolkits/collaborative_filtering/biassgd2 --training=smallnetflix_mm --minval=1 --maxval=5 --validation=smallnetflix_mme --biassgd_gamma=1e-2 --biassgd_lambda=1e-2 --max_iter=10 --quiet=1 --loss=logistic --biassgd_step_dec=0.99999
./toolkits/collaborative_filtering/biassgd2 --training=smallnetflix_mm --minval=1 --maxval=5 --validation=smallnetflix_mme --biassgd_gamma=1e-2 --biassgd_lambda=1e-2 --max_iter=10 --quiet=1 --loss=abs --biassgd_step_dec=0.99999
./toolkits/collaborative_filtering/biassgd2 --training=smallnetflix_mm --minval=1 --maxval=5 --validation=smallnetflix_mme --biassgd_gamma=1e-2 --biassgd_lambda=1e-2 --max_iter=10 --quiet=1 --loss=square --biassgd_step_dec=0.99999

5n) Run CLiMF on the netflix data:
./toolkits/collaborative_filtering/climf --training=smallnetflix_mm --validation=smallnetflix_mme --binary_relevance_thresh=4 --sgd_gamma=1e-6 --max_iter=6 --quiet=1 --sgd_step_dec=0.9999 --sgd_lambda=1e-6
6) View the output.

#### For ALS, CCD++, SGD, bias-SGD, WALS, SVD++ , RBM, CLiMF and NMF

Two files are created: filename_U.mm and filename_V.mm
The files store the matrices U and V in dense matrix market format.

%%MatrixMarket matrix array real general
95526 5
0.693370
1.740420
0.947675
1.328987
1.150084
1.399164
1.292951
0.300416

#### For tensor-ALS, time-SVD++

Additional output file named filename_T.mm is created. Prediction is computed as the tensor product of U_i * V_j * T_k (namely r_ijk = sum_l( u_il * v_jl * t_kl )).

#### For bias-SGD, SVD++, time-SVD++

Additional three files are created: filename_U_bias.mm, filename_V_bias.mm and filename_global_mean.mm. Bias files include the bias for each user (U) and item (V).
The global mean file includes the global mean of the rating.

#### For SVD

For each singular vector a file named filename.U.XX is created where XX is the number of the singular vector. The same with filename.V.XX. Additionally a singular value files is also saved.

# Algorithms

Here is a table summarizing the properties of the different algorithms in the collaborative filtering library:
 ALGORITHM Method type Comments ALS ALS ALS_COORD/CCD++ ALS Using parallel coordinate descent Sparse-ALS ALS Sparse feature vectors (useful for classifying users/items together) SGD SGD bias-SGD SGD bias-SGD2 SGD Supports logistic loss and MAE SVD Lanczos One Sided SVD For skewed matrices (with one dimension larger than the other) NMF For positive matrices. RBM SGD MCMC method SVD++ SGD LIBFM SGD PMF ALS MCMC method time-SVD++ SGD Supports time BPTF (not implemented yet) MCMC method BaseLine X X WALS ALS Supports weights for each recommendation. TENSOR ALS Tensor factorization. GENSGD Supports arbitrary string format. Can be used for classification. SPARSE_GENSGD libsvm format. CLiMF SGD Minimizes MRR (mean reciprocal rank)

Note: for tensor algorithms, you need to verify you have both the rating and its time. Typically the exact time is binned into time bins (a few tens up to a few hundreds). Having too fine granularity over the time bins slows down computation and does not improve prediction.
Using matrix market format, you need to specify each rating using 4 fields:
[user] [item] [time bin] [rating]

#### Common command line options (for all algorithms)

 --training the training input file --validation the validation input file (optional). Validation is data with known ratings which not used for training. --test the test input file (optional). Test input file is used for computing predictions to a predefined list of user/item pairs. --minval min allowed rating (optional). It is highly recommended to set this value since it improves prediction accuracy. --maxval max allowed rating (optional). It is highly recommended to set this value since it improves prediction accuracy. --max_iter number of iterations to run --quiet run with less traces. (optional, default = 0). --halt_on_rmse_increase (optional, default = 0). Stops execution when validation error goes up. Runs at least the number of iterations specified in the flag. For example --halt_on_rmse_increase=10 will run at least 10 iterations, and then stop if validation RMSE increases. --load_factors_from_file (optional, default = 0). This options allows for two functionalities. Instead of starting with a random state, you can start from any predefined state for the algorithm. This also allows for running a few iterations, saving the results to disk for fault tolerance, and running later FROM THE SAME EXACT state. --D width of the factorized matrix. Default is 20.
--R_output_format              Save output in sparse matrix market format (compatible
with R)

Baseline method The baseline method is a simple and quick way of checking the accuracy of the predictions.
The baseline method support three operation modes:
--algorithm=global_mean // assigns all recommendations to be the global rating mean.
--algorithm=user_mean //assigns recommendations based on each user mean value.
--algorithm=item_mean //assigns recommendations based on each item mean value.

To summarize, the baseline method assigns one of the three possible means as the recommendation results and computes the prediction error. Any other algorithm should give a better result than the baseline method, and thus it can be used a sanity check for the deployed algorithms.
ALS (Alternating least squares) Pros: Simple to use, not many command line arguments
Cons: intermediate accuracy, higher computational overhead

ALS is a simple yet powerful algorithm. In this model the prediction is computed as:
r_ui = p_u * q_i
Where r_ui is a scalar rating of user u to item i, and p_u is the user feature vector of size D, q_i is the item feature vector of size D and the product is a vector product.
The output of ALS is two matrices: filename_U.mm and filename_V.mm The matrix U holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix V holds the feature vectors for each time (Each vector has again exactly D columns). In linear algebra notation the rating matrix R ~ UV

Below are ALS related command line options:

 Basic Confirmation --lambda=XX Set regularization. Regularization helps to prevent overfitting.

### CCD++ (Alternating least squares, parallel coordinate descent) Pros: Simple to use, not many command line arguments, faster than ALSCons: Slower convergence relative to ALSIn CCD++ the prediction is computed as:   r_ui = p_u * q_iWhere r_ui is a scalar rating of user u to item i, and p_u is the user feature vector of size D, q_i is the item feature vector of size D and the product is a vector product. The output of CCD++ are two matrices: filename_U.mm and filename_V.mm The matrix U holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix V holds the feature vectors for each time (Each vector has again exactly D columns). In linear algebra notation the rating matrix R ~ UVBelow are CCD++ related command line options: Basic Confirmation--lambda=XXSet regularization. Regularization helps to prevent overfitting. Stochastic gradient descent (SGD) Pros: fast methodCons: need to tune step size, more iterations are needed relative to ALS.SGD is a simple gradient descent algorithm. Prediction in SGD is done as in ALS:   r_ui = p_u * q_iWhere r_ui is a scalar rating of user u to item i, and p_u is the user feature vector of size D, q_i is the item feature vector of size D and the product is a vector product. The output of SGD is two matrices: filename.U and filename.V. The matrix U holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix V holds the feature vectors for each time (Each vector has again exactly D columns). In linear algebra notation the rating matrix R ~ UV

--lambda -  regularization (optional). Default value 1e-3.
--gamma - gradient step size (optional).Default value 1e-3.
--sgd_step_dec - multiplicative step decrement (optional). Default is 0.9.

### Pros: fast methodCons: need to tune step sizeBias-SGD is a simple gradient descent algorithm, where besides of the feature vector we also compute item and user biases (how much their average rating differs from the global average).Prediction in bias-SGD is done as follows:r_ui = global_mean_rating + b_u + b_i + p_u * q_iWhere global_mean_rating is the global mean rating, b_u is the bias of user u, b_i is the bias of item i and p_u and q_i are feature vectors as in ALS. You can read more about bias-SGD in reference [N]. The output of bias-SGD consists of two matrices: filename.U and filename.V. The matrix U holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix V holds the feature vectors for each time (Each vector has again exactly D columns). Additionally, the output consists of two vectors: bias for each user, bias for each item. Last, the global mean rating is also given as output.

bias-SGD command line arguments:
--biassgd_lambda -regularization (optional).  Default value 1e-3.
--biassgd_gamma -gradient step size (optional). Default value 1e-3.
--biassgd_step_dec - multiplicative gradient step decrement (optional). Default is 0.9.

### Where global_mean_rating is the global mean rating, b_u is the bias of user u, b_i is the bias of item i and p_u and q_i are feature vectors as in ALS. You can read more about bias-SGD in reference [N]. The output of bias-SGD2 consists of two matrices: filename.U and filename.V. The matrix U holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix V holds the feature vectors for each time (Each vector has again exactly D columns). Additionally, the output consists of two vectors: bias for each user, bias for each item. Last, the global mean rating is also given as output.

bias-SGD2 command line arguments:
--biassgd_lambda -regularization (optional).  Default value 1e-3.
--biassgd_gamma -gradient step size (optional). Default value 1e-3.
--biassgd_step_dec - multiplicative gradient step decrement (optional). Default is 0.9.
--loss=square/abs/logistic

#### Pros: more accurate method than SGD once tuned, relatively fast methodCons: a lot of parameters for tuning, subject to numerical errors when parameters are out of scope.Koren SVD++ is an algorithm which is slightly more fancy than bias-SGD and give somewhat better prediction results.

Prediction in Koren’s SVD++ algorithm is computed as follows:

r_ui = global_mean_rating + b_u + b_i + q_i * ( p_u + w_u )
Where r_ui is the scalar rating for user u to item i, global_mean_rating is the global mean rating, b_u is a scalar bias for user u, b_i is a scalar bias for item i, p_u is a feature vectors of length D for user u, q_i is a feature vector of length D for item i, and w_u is an additional feature vector of length D (the weight) for user u. The product is a vector product.

The output of Koren’s SVD++ is 5 output files:
Global mean ratings - include the scalar global mean rating.
user_bias  - includes a vector with bias for each user
movie_bias - includes a vector with bias for each movie
matrix U - includes in each row the feature vector p_u of size D and then the weight vector w_u of size D total width of 2D.
matrix V - includes in each row the item feature vector q_i of width D.

SVD++ command line arguments:
--svdpp_item_bias_step, --svdpp_user_bias_step, --svdpp_user_factor_step, --svdpp_user_factor2_step - gradient step size (optional). Default value 1e-4.
--svdpp_item_bias_reg, --svdpp_user_bias_reg, --svdpp_user_factor_reg, --svdpp_user_factor2_reg - regularization (optional). Default value 1e-4.
--svdpp_step_dec - multiplicative gradient step decrement (optional). Default is 0.9.

### Weighted Alternating Least Squares (WALS)

Pros: allows weighting of ratings (can be thought of as confidence in rating), almost the same computational cost like in ALS.
Cons: worse modeling error relative to ALS

Weighted ALS is a simple extension for ALS where each user/item pair has an additional weight. In this sense, WALS is a tensor algorithm since besides of the rating it also maintains a weight for each rating. Algorithm is described in references [I, J].

Prediction in WALS is computed as follows:
r_ui = w_ui * p_u * q_i

The scalar value r for user u and item i is computed by multiplying the weight of the rating w_ui by the vector product p_u * q_i. Both p and q are feature vectors of size D.

Note: for weighted-ALS, the input file has 4 columns:
[user] [item] [weight] [rating]. See example file in section 5e).

--lambda - regularization

Alternating least squares with sparse factors
Pros: excellent for spectral clustering
Cons: less accurate linear model because of the sparsification step

This algorithm is based on ALS, but an additional sparsifying step is performed on either the user feature vectors, the item feature vectors or both. This algorithm is useful for spectral clustering: first the rating matrix is factorized into a product of one or two sparse matrices, and then clustering can be computed on the feature matrices to detect similar users or items.

The underlying algorithm which is used for sparsifying is CoSaMP. See reference [K1].

Below are sparse-ALS related command line options:

 Basic configuration --user_sparsity=XX A number between 0.5 to 1 which defines how sparse is the resulting user feature factor matrix --movie_sparsity=XX A number between 0.5 to 1 which defines how sparse is the resulting movie feature factor matrix --algorithm=XX There are three run modes:  SPARSE_USR_FACTOR = 1  SPARSE_ITM_FACTOR = 2  SPARSE_BOTH_FACTORS = 3

Prediction in sparse-ALS is computing like in ALS.

bickson@thrust:~/graphchi$./bin/sparse_als.cpp --training=smallnetflix_mm --user_sparsity=0.8 --movie_sparsity=0.8 --algorithm=3 --quiet=1 --max_iter=15 WARNING: sparse_als.cpp(main:202): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [smallnetflix_mm] [user_sparsity] => [0.8] [movie_sparsity] => [0.8] [algorithm] => [3] [quiet] => [1] [max_iter] => [15] 0) Training RMSE: 1.11754 Validation RMSE: 3.82345 1) Training RMSE: 3.75712 Validation RMSE: 3.241 2) Training RMSE: 3.22943 Validation RMSE: 2.03961 3) Training RMSE: 2.10314 Validation RMSE: 2.88369 4) Training RMSE: 2.70826 Validation RMSE: 3.00748 5) Training RMSE: 2.70374 Validation RMSE: 3.16669 6) Training RMSE: 3.03717 Validation RMSE: 3.3131 7) Training RMSE: 3.18988 Validation RMSE: 2.83234 8) Training RMSE: 2.82192 Validation RMSE: 2.68066 9) Training RMSE: 2.29236 Validation RMSE: 1.94994 10) Training RMSE: 1.58655 Validation RMSE: 1.08408 11) Training RMSE: 1.0062 Validation RMSE: 1.22961 12) Training RMSE: 1.05143 Validation RMSE: 1.0448 13) Training RMSE: 0.929382 Validation RMSE: 1.00319 14) Training RMSE: 0.920154 Validation RMSE: 0.996426 ### tensor-ALS Note: for tensor-ALS, the input file has 4 columns: [user] [item] [time] [rating]. See example file in section 5b). --lambda - regularization ### Non-negative matrix factorization (NMF) Non-negative matrix factorization (NMF) is based on Lee and Seung [reference H]. Prediction is computed like in ALS: r_ui = p_u * q_iNamely the scalar prediction r of user u is composed of the vector product of the user feature vector p_u (of size D), with the item feature vector q_i (of size D). The only difference is that both p_u and q_i have all nonnegative values.The output of NMF is two matrices: filename.U and filename.V. The matrix U holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix V holds the feature vectors for each time (Each vector has again exactly D columns). In linear algebra notation the rating matrix R ~ UV, U>=0, V>=0. NMF Has no special command line arguments. ### SVD SVD is implemented using the restarted lanczos algorithm. The input is a sparse matrix market format input file. The output are 3 files: one file containing the singular values, and two dense matrix market format files containing the matrices U and V. Note: for larger models, it is advised to use svd_onesided since it significantly saved memory. Here is an example Matrix Market input file for the matrix A2: <235|0>bickson@bigbro6:~/ygraphlab/graphlabapi/debug/toolkits/parsers$ cat A2
%%MatrixMarket matrix coordinate real general
3 4 12
1 1  0.8147236863931789
1 2  0.9133758561390194
1 3  0.2784982188670484
1 4  0.9648885351992765
2 1  0.9057919370756192
2 2  0.6323592462254095
2 3  0.5468815192049838
2 4  0.1576130816775483
3 1  0.1269868162935061
3 2  0.09754040499940952
3 3  0.9575068354342976
3 4  0.9705927817606157

Here is an for running SVD :

### bickson@thrust:~/graphchi$./bin/svd --training=A2 --nsv=4 --nv=4 --max_iter=4 --quiet=1 WARNING: svd.cpp(main:329): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [A2] [nsv] => [4] [nv] => [4] [max_iter] => [4] [quiet] => [1] Load matrix set status to tol Number of computed signular values 4 Singular value 0 2.16097 Error estimate: 2.35435e-16 Singular value 1 0.97902 Error estimate: 1.06832e-15 Singular value 2 0.554159 Error estimate: 1.56173e-15 Singular value 3 9.2673e-65 Error estimate: 6.51074e-16 Lanczos finished 7.68793 ### Listing the output files: #> ls -lrt -rw-r--r-- 1 bickson bickson 2728 2012-09-20 01:57 graphchi_metrics.txt -rw-r--r-- 1 bickson bickson 2847 2012-09-20 01:57 graphchi_metrics.html -rw-r--r-- 1 bickson bickson 188 2012-09-20 01:57 A2.V.3 -rw-r--r-- 1 bickson bickson 179 2012-09-20 01:57 A2.V.2 -rw-r--r-- 1 bickson bickson 179 2012-09-20 01:57 A2.V.1 -rw-r--r-- 1 bickson bickson 177 2012-09-20 01:57 A2.V.0 -rw-r--r-- 1 bickson bickson 208 2012-09-20 01:57 A2.U.3 -rw-r--r-- 1 bickson bickson 195 2012-09-20 01:57 A2.U.2 -rw-r--r-- 1 bickson bickson 195 2012-09-20 01:57 A2.U.1 -rw-r--r-- 1 bickson bickson 194 2012-09-20 01:57 A2.U.0 -rw-r--r-- 1 bickson bickson 271 2012-09-20 01:57 A2.singular_values ### Verifying solution accuracy in matlab >>A2=mmread('A2'); >> full(A2)' ans = 0.8147 0.9058 0.1270 0.9134 0.6324 0.0975 0.2785 0.5469 0.9575 0.9649 0.1576 0.9706 Now we read graphchi output using: % read the top 3 singular values: >> sigma=mmread('A2.singular_values'); >> sigma=sigma(1:3); Read the top 3 vectors v: >> v0=mmread('A2.V.0'); >> v1=mmread('A2.V.1'); >> v2=mmread('A2.V.2'); Read the top 3 vectors u: >> u0=mmread('A2.U.0'); >> u1=mmread('A2.U.1'); >> u2=mmread('A2.U.2'); Compute an approximation to A2: >> [u0 u1 u2] * diag(sigma) * [v0 v1 v2]' ans = 0.8147 0.9058 0.1270 0.9134 0.6324 0.0975 0.2785 0.5469 0.9575 0.9649 0.1576 0.9706 As you can see we got an identical result to A2. where >> [u0 u1 u2] ans = 0.5047 0.5481 0.2737 0.4663 0.4726 -0.2139 0.4414 -0.4878 0.7115 0.5770 -0.4882 -0.6108 >> [v0 v1 v2]' ans = 0.7019 0.5018 0.5055 0.2772 0.4613 -0.8428 -0.6561 0.7317 0.1847 >> diag(sigma) ans = 2.1610 0 0 0 0.9790 0 0 0 0.5542 ### SVD Command line arguments  Basic configuration --training Input file name (in sparse matrix market format) --nv Number of inner steps of each iterations. Typically the number should be greater than the number of singular values you look for. --nsv Number of singular values requested. Should be typically less than --nv --ortho_repeats Number of repeats on the orthogonalization step. Default is 1 (no repeats). Increase this number for higher accuracy but slower execution. Maximal allowed values is 3. --max_iter Number of allowed restarts. The minimum is 2= no restart. --save_vectors=0 Disable saving the factorized matrices U and V to file. On default save_vectors=1. --tol Convergence threshold. For large matrices set this number set this number higher (for example 1e-1, while for small matrices you can set it to 1e-16). As smaller the convergence threshold execution is slower. ### Understanding the error measure Following Slepc, the error measure is computed by a combination of: sqrt( ||Av_i - sigma(i) u_i ||_2^2 + ||A^Tu_i - sigma(i) V_i ||_2^2 ) / sigma(i) Namely, the deviation of the approximation sigma(i) u_i from Av_i , and vice versa. ### Scalability Currently the code was tested with up to 3.5 billion non-zeros on a 24 core machine. Each Lanczos iteration takes about 30 seconds. ### Difference to Mahout Mahout SVD solver is implemented using the same Lanczos algorithm. However, there are several differences 1) In Mahout there are no restarts, so quality of the solution deteriorates very rapidly, after 5-10 iterations the solution is no longer accurate. Running without restarts can be done using our solution with the --max_iter=2 flag. 2) In Mahout there is a single orthonornalization step in each iteration while in our implementation there are two (after computation of u_i and after v_i ). 3) In Mahout there is no error estimation while we provide for each singular value the approximated error. 4) Our solution is typically x100 times faster than Mahout. #### Notes about parameter tuning (In case not enough singular vectors have converged): SVD have a few tunable parameters you need to play with. 1) --tol=XX, this is the tolerance. When not enough singular vectors converge to a desired tolerance you can increase it, for example from 1e-4 to 1e-2, etc. 2) --nv=XX this number should be larger than nsv. Typically you can try 20% more or even larger. 3) --nsv=XX this is the number of the desired singular vectors 4) --max_iter=XX - this is the number of restarts. When the algorithm does not converge you can increase the number of restarts. Restricted Bolzman Machines (RBM) RBM algorithm is detailed in [Hinton's paper]. It is a MCMC method that works on binary data. In other words, the ratings have to be binned into a discrete space. For example, for KDD CUP 2011 rating between 0 to 100 can be binned into 10 bins: 0-10, 10-20 etc. rbm_scaling defines the factor to divide the rating for binning (in the example it is 10). rbm_bins defines how many bins are there in total. In this example we have 11 bins: 0,1,..,10.  Basic configuration --rbm_mult_step_dec=XX Multiplicative step decrement (should be 0.1 to 1, default is 0.9) --rbm_alpha=XX Alpha parameter: gradient descent step size --rbm_beta=XX Beta parameter: regularization --rbm_scaling=XX Optional. Scale the rating by dividing it with the rbm_scaling constant. For example for KDD cup data rating of 0..100 can be scaled to the bins 0,1,2,3,.. 10 by setting the rbm_scaling=10 --rbm_bins=XX Total number of binary bins used. For example in Netflix data where we have 1,2,3,4,5 the number of bins is 6 #### Advanced topic: Understanding RBM output format and predicting values. #### In case you like to use GraphChi output file for computing RBM predictions you should compute the following: You should implement the RBM prediction function which is found here: https://github.com/GraphChi/graphchi-cpp/blob/master/toolkits/collaborative_filtering/rbm.cpp#L129-L154 Assume D is the feature vector length. The default D=20. Basically, each user node has 3 fields: h, h0 and h1, each of them is a vector of size 20. Those vectors are appended to get a single vector of default size 60. The U matrix has row as the number of users (M) x 60. Each movie node has 3 fields: ni (a double), bi is a vector of size rbm_bins (the default is 6). and w is a vector of size rbm_bins * D = 120 in default. In the output file first the bi vector is written (size = 6) and then w, total of 126. The V matrix has rows as the number of items (N) x 126. Note that the prediction involves bi, h, w but does not involve h0, h1, ni. Koren time-SVD++ Pros: more accurate than SVD++ Cons: many parameters to tune, prone to numerical errors. Koren’s time-SVD++ [Korens paper above] takes into account also the temporal aspect of the rating. Prediction in time-SVD++ algorithm is computed as follows: r_uik = global_mean_rating + b_u + b_i + ptemp_u * q_i + x_u * z_k + pu_u * pt_i * q_k The scalar rating r_uik (rating for intersection of user u, item i, and time bin k) equals the above sum. Like in Koren’s SVD++ the rating equals the sum of the global mean rating and biases for user and item. The following are feature vectors. For the user we have ptemp_i , x_u and pu_u. All of length D. For the item we have additional three feature vectors: ptemp_u, x_u and pu_u. For the time bins we have z_k and q_k, two feature vectors of size D.  Basic configuration --lrate=XX Learning rate --beta Beta parameter (bias regularization) --gamma Gamma parameter (feature vector regularization) --lrate_mult_dec Multiplicative step decrement (0.1 to 1, default 0.9) --D=X Feature vector width. Common values are 20 - 150. Special Note: This is a tensor factorization algorithm. Please don’t forget to prepare a 4 column matrix market format file, with [user] [ item ] [ time ] [ rating ] in each row. It is advised to delete intermediate files created by als_tensor, since they have a different format. ## Factorization Machines (FM) GraphChi's libFM algorithm implementation contains a subset of the full libFM functionality with only three predictions: user, item and time. Users are encouraged to check the original libFM library: http://www.libfm.org/ for enhanced implementation. libFM library by Steffen Rendle has a track record performance in KDD CUP and is highly recommended collaborative filtering package. Factorization machines is a SGD type algorithm. It has two differences relative to bias-SGD: 1) It handles time information by adding feature vectors for each time bin 2) It adds additional feature for the last item rated for each user. Those differences are supposed to make it more accurate than bias-SGD. Factorization machines is detailed in reference [P]. There are several variants, here the SGD variant is implemented (and not the ALS). Prediction in LIBFM is computed as follows: r_ui = global_mean_rating + b_u + b_i + b_t + b_li + 0.5*sum(p_u.^2 + q_i.^2 + w_t.^2 + s_li.^2 - (p_u + q_i + w_t + s_li ).^2) Where global_mean_rating is the global mean rating, b_u is the bias of user u, b_i is the bias of item i , b_t is the bias of time t, b_li is the bias of the last item li, and p_u is the feature vector of user u, and q_i is the feature vector of item i, w_t is the feature vector of time t, s_li is the feature vector of last item li. All feature vectors have size of D as in ALS. .^2 is the element by element power operation (as in matlab). The output of LIBFM consists of three matrices: filename.Users, filename.Movies and filename.Times. The matrix Users holds the user feature vectors in each row. (Each vector has exactly D columns). The matrix Movies holds the feature vectors for each item (Each vector has again exactly D columns). The matrix Times holds the feature vectors for each time. Additionally, the output consists of four vectors: bias for each user, bias for each item, bias for each time bin and bias for each last item. Last, the global mean rating is also given as output.  Basic configuration --libfm_rate=XX Gradient descent step size --libfm_regw=XX Gradient descent regularization for biases --libfm_regv=XX Gradient descent regularization for feature vectors --libfm_mult_dec=XX Multiplicative step decrease. Should be between 0.1 to 1. Default is 0.9 --D=X Feature vector width. Common values are 20 - 150. ## PMF Pros: once tuned, better accuracy than ALS, since it involves extra sampling step Cons: sensitive to numerical errors, needs fine tuning, does not work on every dataset, higher computational cost, higher prediction computational cost. PMF and BPTF are two Markov Chain Monte Carlo (MCMC) sampling methods. They are based on ALS, but on each step a sampling from the probability is perform for obtaining the next state. Prediction in PMF/BPTF is like in ALS, but instead of computing one vector product of the current feature vector, the whole chain of products is computed and the average is taken. More formally, the prediction rule of PMF is: r_ui = [ p_u(1) * q_i(1) + p_u(2) * q_i(2) + .. + p_u(l) * q_i(l) ] / l Where l is the length of the chain. Note: typically in MCMC methods, the first XX samples of the chain are thrown away, so p_u and q_i will start from XX and not from 1. The prediction rule of BPTF includes a feature vector for each time bin, denote w: r_uik = [ p_u(1) * q_i(1) * w_k(1) + p_u(2) * q_i(2) * w_k(2) + .. + p_u(l) * q_i(l) * w_k(l) ] / l Where the product is a tensor product, namely \sum_j p_uj * q_ij * w_kj  Basic configuration --pmf_burn_in=XX Throw away the first XX samples in the chain --pmf_additional_output=1 Save as output all the samples in the chain (after the burn in period). Each sample is composed of two feature vectors. Each will be saved on its own file. Example running PMF Here we run 10 iterations of PMF, where the first 5 are discarded (pmf_burn_in) and the rest are used for computing the prediction: bickson@thrust:~/graphchi$ ./toolkits/collaborative_filtering/pmf --training=smallnetflix_mm --quiet=1 --minval=1 --maxval=5 --max_iter=10 --pmf_burn_in=5
WARNING:  common.hpp(print_copyright:104): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any  comments or bug reports to danny.bickson@gmail.com
1.24716) Iteration:   0 Training RMSE:    1.56917  Validation RMSE:     2.4979 ratings_per_sec:          0
2.5872) Iteration:   1 Training RMSE:    2.44993  Validation RMSE:    2.03815 ratings_per_sec: 1.16359e+06
3.95615) Iteration:   2 Training RMSE:     1.7831  Validation RMSE:    1.26519 ratings_per_sec: 1.55628e+06
5.33609) Iteration:   3 Training RMSE:    1.08493  Validation RMSE:    1.05008 ratings_per_sec: 1.76283e+06
6.73702) Iteration:   4 Training RMSE:   0.939768  Validation RMSE:   0.993025 ratings_per_sec: 1.88536e+06
Finished burn-in period. starting to aggregate samples
8.16872) Iteration:   5 Training RMSE:    0.88499  Validation RMSE:   0.978547 ratings_per_sec: 1.95767e+06
9.54684) Iteration:   6 Training RMSE:   0.864345  Validation RMSE:   0.972835 ratings_per_sec: 2.01243e+06
10.9789) Iteration:   7 Training RMSE:   0.837162  Validation RMSE:   0.948436 ratings_per_sec: 2.04756e+06
12.43) Iteration:   8 Training RMSE:   0.823885  Validation RMSE:   0.939388 ratings_per_sec: 2.0749e+06
13.8361) Iteration:   9 Training RMSE:   0.814482  Validation RMSE:    0.93436 ratings_per_sec: 2.10232e+06

As a sanity check, now we run 10 iteration were all 10 are discarded (pmf_burn_in=10):

bickson@thrust:~/graphchi$./toolkits/collaborative_filtering/pmf --training=smallnetflix_mm --quiet=1 --minval=1 --maxval=5 --max_iter=10 --pmf_burn_in=10 WARNING: common.hpp(print_copyright:104): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com 1.18773) Iteration: 0 Training RMSE: 1.55811 Validation RMSE: 2.4997 ratings_per_sec: 0 2.56929) Iteration: 1 Training RMSE: 2.45047 Validation RMSE: 2.07669 ratings_per_sec: 1.16566e+06 3.94601) Iteration: 2 Training RMSE: 1.75645 Validation RMSE: 1.32239 ratings_per_sec: 1.56984e+06 5.27586) Iteration: 3 Training RMSE: 1.11416 Validation RMSE: 1.04864 ratings_per_sec: 1.78811e+06 6.69646) Iteration: 4 Training RMSE: 0.937365 Validation RMSE: 0.994412 ratings_per_sec: 1.89396e+06 8.05631) Iteration: 5 Training RMSE: 0.886551 Validation RMSE: 0.978154 ratings_per_sec: 1.97636e+06 9.44744) Iteration: 6 Training RMSE: 0.861688 Validation RMSE: 0.972389 ratings_per_sec: 2.03489e+06 10.8185) Iteration: 7 Training RMSE: 0.846078 Validation RMSE: 0.972082 ratings_per_sec: 2.07996e+06 12.2176) Iteration: 8 Training RMSE: 0.836964 Validation RMSE: 0.971611 ratings_per_sec: 2.1115e+06 13.6285) Iteration: 9 Training RMSE: 0.829531 Validation RMSE: 0.96975 ratings_per_sec: 2.13407e+06 As you can see, the sampling step improves validation prediction from 0.969 to 0.934. ### GenSGD It is recommended to read first the GenSGD detailed case studies here: http://bickson.blogspot.co.il/2012/12/collaborative-filtering-3rd-generation.html  Mandatory configuration --training Input file --from_pos Column number of the feature which is used as "users" in the matrix factorization case. Column number starts from zero. --to_pos Column number of the feature which is used as "items" in the matrix factorization case. Column number starts from zero. --val_pos Column number of the value (the target variable) we would like to predict. For example the rating in the matrix factorization case. Column number starts from zero. --file_columns Number of features in the input (training) file. (Note that from_pos, to_pos, val_pos should be smaller than the file_column number) Optional configuation --rehash=1 If some or all of the feature fields are strings, you should use rehash=1 to translate them into numeric ids. If all the fields are numbers, use --rehash=0. Default is 0. --D Latent feature vector width. Default is 20. --calc_error=1 When used for classification, calc_error treats the target is binary value and counts how many validation/training instances are wrong. See cuttoff. --cutoff When used for binary classification cutoff is the threshold value were prediction > cutoff is positive and position <= cutoff is negative. Default is 0. --user_file File name of additional user properties (optional). Each line should start with user id and then a list of features. --item file File name of additional item properties (optional). Each line should start with item id and then a list of features. --limit_rating=X For debugging: limit the number of rows in training file to X. SGD tunable parameters --gensgd_rate1 SGD step size for users (from_pos). Default 1e-2. --gensgd_rate2 SGD step size for items (to_pos). Default 1e-2. --gensgd_rate3 SGD step size of rating features in training file. Default 1e-2. --gensgd_rate4 SGD step size of user/item features in additional feature files. Default 1e-2. --gensgd_mult_dec SGD multiplicative step size decrement - default 0.9. --gensgd_regw SGD bias regularization. Default 1e-3. --gensgd_reg0 SGD global mean regularization. Default 1e-1. --gensgd_regv SGD features regularization. Default 1e-3. #### Prediction computation in gensgd: Prediction is computed as follows. rating = global_mean + sum_f (bias_f) + 1/2*(sum_f (pvec_f) - sum_f (pvec_f.^2)) Where f is an index going over all the factors involved, pvec_f is the feature vector of factor f, bias_f is the bias of factor f, and .^2 is elementwise square. See equation (5) in the libFM paper. (Note: that x_i and x_j are all equal 1 in our implementation). #### Output of gensgd The output of gensgd are the following files: 1) a matrix of size f x D, where f is the number of feature vectors used and D is the feature vectors width. Generated filename is training_file_name + "_U.mm". 2) a vector of size f x 1, where f is the number of feature vectors which holds the scalar bias for each feature vector. Generated filename is training_file_name + "_bias_U.mm". 3) the global mean. Generated filename is training_file_name + "_global_mean.mm" 4) Mapping file for each feature. For each feature (each column) there is a map between the feature string name, and the integer id of this feature, in the arrays (1) and (2) above. The mapping files are generated only when using the --rehash=1 option. Generated file names are training_file_name + ".map." + feature_id Sparse_GenSGD http://bickson.blogspot.co.il/2012/12/3rd-generation-collaborative-filtering.html ### CLiMF CLiMF was contributed by Mark Levy(last.fm). The CLiMF algorithm, described in the paper: CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering. Yue Shi, Martha Larson, Alexandros Karatzoglou, Nuria Oliver, Linas Baltrunas, Alan Hanjalic, Sixth ACM Conference on Recommender Systems, RecSys '12. CLiMF is a ranking method which optimizes MRR (mean reciprocal rank) which is an information retrieval measure for top-K recommenders. CLiMF is a variant of latent factor CF which optimises a significantly different objective function to most methods: instead of trying to predict ratings CLiMF aims to maximise MRR of relevant items. The MRR is the reciprocal rank of the first relevant item found when unseen items are sorted by score i.e. the MRR is 1.0 if the item with the highest score is a relevant prediction, 0.5 if the first item is not relevant but the second is, and so on. By optimising MRR rather than RMSE or similar measures CLiMF naturally promotes diversity as well as accuracy in the recommendations generated. CLiMF uses stochastic gradient ascent to maximise a smoothed lower bound for the actual MRR. It assumes binary relevance, as in friendship or follow relationships, but the graphchi implementation lets you specify a relevance threshold for ratings so you can run the algorithm on standard CF datasets and have the ratings automatically interpreted as binary preferences. CLiMF-related command-line options: --binary_relevance_thresh=xx Consider the item liked/relevant if rating is at least this value [default: 0] --halt_on_mrr_decrease Halt if the training set objective (smoothed MRR) decreases [default: false] --num_ratings Consider this many top predicted items when computing actual MRR on validation set [default:10000] Here is an example on running CLiMF on Netflix data: ./toolkits/collaborative_filtering/climf --training=smallnetflix_mm --validation=smallnetflix_mme --binary_relevance_thresh=4 --sgd_gamma=1e-6 --max_iter=6 --quiet=1 --sgd_step_dec=0.9999 --sgd_lambda=1e-6 Training objective:-9.00068e+07 Validation MRR: 0.169322 Training objective:-9.00065e+07 Validation MRR: 0.171909 Training objective:-9.00062e+07 Validation MRR: 0.172372 Training objective:-9.0006e+07 Validation MRR: 0.172503 Training objective:-9.00057e+07 Validation MRR: 0.172544 Training objective:-9.00054e+07 Validation MRR: 0.172549 Prediction is computed in CLiMF as follows reciproal_rank_ij = g( U_i ' * V_j ) where g() is the logistic function, and U_i is the feature vector of user i and V_j is the feature vector of user J. Both feature vectors are of size D. The output of CLiMF are two files training_file_name_U.mm and training_file_name_V.mm. ## Post processing of the output #### Example 1: Load output in Matlab, for computing recommendations for ALS/SGD/NMF a) run ALS ./toolkits/collaborative_filtering/als --training=smallnetflix_mm --validation=smallnetflix_mme --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1 b) download the script mmread.m. c) # matlab >> A=mmread('smallnetflix_mm'); >> U=mmread('smallnetflix_mm_U.mm'); >> V=mmread('smallnetflix_mm_V.mm'); >> whos Name Size Bytes Class Attributes A 95526x3561 52799104 double sparse U 95526X5 3821040 double V 3561X5 142480 double c) compute prediction for user 8 movie 12: >> U(8,:)*V(12,:)' d) compute approximation error >> norm(A-U*V') % may be slow... depending on problem size #### Example 2: Load output in Matlab, for verifying bias-SGD results a) run the command line: ./toolkits/collaborative_filtering/biassgd --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1 b) download the script mmread.m. c) # matlab >> V=mmread('smallnetflix_mm_V.mm'); % read item matrix V >> U=mmread('smallnetflix_mm_U.mm'); % read user matrix U >> m=mmread('smallnetflix_mm_global_mean.mm'); % read global mean >> bV=mmread('smallnetflix_mm_V_bias.mm'); % read user bias >> bU=mmread('smallnetflix_mm_U_bias.mm'); % read item bias >> pairs = load('pairs'); % read user/item pairs >> A=mmread('smallnetflix_mme'); % read rating matrix >> >> rmse = 0; >> for r=1:545177, % run over each rating % compute bias-SGD prediction % using the prediction rule: % prediction = global_mean + bias_user + bias_item + vector_user*vector_item pred = m + bU(pairs(r,1)) + bV(pairs(r,2)) + U(pairs(r,1),:)*V(pairs(r,2),:)'; pred = min( 5, pred ); % truncate prediction [1,5] pred = max( 1, pred ); obs = A( pairs(r,1), pairs(r,2) ); rmse = rmse + (pred - obs).^2; % compute RMSE by (observation- % prediction)^2 end >> >> sqrt( rmse/545177.0 ) % print RMSE ans = 1.1239 % compare the training RMSE to g % graphchi output # Computing top-K recommendations For computing top K recommendations out of the computed linear model, use the rating/rating2 commands. The following algorithms are supported: ALS, sparse-ALS, NMF, SGD, WALS, SVD++, bias-SGD, CLiMF, SVD. For ALS, Sparse-ALS, NMF, SGD,CliMF, SVD use rating application. For SVD++, bias-SGD, RBM use rating2 application. First you need to run one of the above methods (ALS, SGD, NMF etc.) . Next, compute recommended ratings as follows: ./toolkits/collaborative_filtering/rating --training=smallnetflix_mm --num_ratings=5 --quiet=1 --algorithm=als WARNING: common.hpp(print_copyright:128): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [smallnetflix_mm] [num_rating] => [5] [quiet] => [1] Computing recommendations for user 1 at time: 0.827547 Computing recommendations for user 1001 at time: 0.871366 Computing recommendations for user 2001 at time: 0.908017 ... Computing recommendations for user 95001 at time: 2.15397 Rating output files (in matrix market format): smallnetflix_mm.ratings, smallnetflix_mm.ids Distance statistics: min 0 max 42.1831 avg 9.51098 The output of the rating algorithm are two files. The first one is more useful. 1) filename.ids - includes recommended item ids for each user. 2) filename.ratings - includes scalar ratings of the top K items bickson@thrust:~/graphchi/toolkits/collaborative_filtering$ head smallnetflix_mm.ids %%MatrixMarket matrix array real general
%This file contains item ids matching the ratings. In each row i the top K item ids for user i. (First column is user id, next are top recommendations for this user).
95526 6
1 3424 1141 1477 2151 2012
2 2784 1900 516 1835 1098
3 1428 3450 2284 2328 58
4 209 1073 3285 60 1271
5 132 1702 2575 1816 2284
6 2787 1816 3024 2514 985
7 3078 375 168 2514 2460
...

### bickson@thrust:~/graphchi$head smallnetflix_mme.predict %%MatrixMarket matrix coordinate real general %This file contains user/item pair predictions. In each line one prediction. The first column is user id, second column is item id, third column is the computed prediction.95526 3561 545177135 1 3.6310739140 1 3.7827248141 1 3.5731169154 1 3.9835398162 1 3.9378759167 1 3.9865881169 1 3.6489052171 1 4.0544691 ... ### Speeding up execution 0) Verify that your program is compiled using the "-O3" compiler flag. (Should be enabled on default). This gives significant speedup (for example x5). Verify that your program is compiled using EIGEN_NDEBUG compiler flag. (Should be enabled on default). 1) If your system has enough memory, you can preload the problem into memory instead of reading them from disk on each iteration. This is done using the --nshards=1 command. This gives around x2 speedup. 2) If your system has enough memory, you can increase used memory size using the membudget_mb command. Example: ./toolkits/collaborative_filtering/als --training=smallnetflix_mm --validation=smallnetflix_mme --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1 membudget_mb 20000 3) You can tune the number of execution threads using execthreads command. Depends on your machine different number of threads may give better results. The thumb rule is one thread per physical core. Example for setting the number of threads: ./toolkits/collaborative_filtering/als --training=smallnetflix_mm --validation=smallnetflix_mme --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1 execthreads 4 4) You can disable compression by defining the following macro in your program code: #define GRAPHCHI_DISABLE_COMPRESSION and recompiling. This will require increased disk space but will speed up execution. ## K-Fold cross validation It is possible to apply K-fold cross validation to your dataset. This is done by applying the following two flags: --kfold_cross_validation=10, enables k-fold cross validation by setting K=10 and so on. --kfold_cross_validation_index=3, defines that we are working on the 4th fold (out of 10, indices start from zero). Notes: 1) Currently supported algorithms for k-fold cross validation are: als, wals, sparse_als, svdpp, nmf, pmf, sgd, biassgd, biassgd2, rbm, timesvdpp, baseline. 2) Selection is done by rows, so when using K=10, index=3 every 4th row in 10 rows will be excluded from the training set. Example run: ./toolkits/collaborative_filtering/als --training=smallnetflix_mm --kfold_cross_validation=10 --quiet=1 --kfold_cross_validation_index=3 -- WARNING: common.hpp(print_copyright:149): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [smallnetflix_mm] [kfold_cross_validation] => [10] [quiet] => [1] [kfold_cross_validation_index] => [3] [validation] => [smallnetflix_validation] ... 4.91313) Iteration: 0 Training RMSE: 2.03244 Validation RMSE: 1.19777 6.33828) Iteration: 1 Training RMSE: 0.748826 Validation RMSE: 1.15937 7.8193) Iteration: 2 Training RMSE: 0.690095 Validation RMSE: 1.14381 9.25151) Iteration: 3 Training RMSE: 0.665744 Validation RMSE: 1.13516 10.6588) Iteration: 4 Training RMSE: 0.649499 Validation RMSE: 1.13151 12.0984) Iteration: 5 Training RMSE: 0.638833 Validation RMSE: 1.13044 Finished writing 329816 predictions to file: smallnetflix_mm.predict Now run for a different fold: ./toolkits/collaborative_filtering/als --training=smallnetflix_mm --kfold_cross_validation=10 --quiet=1 --kfold_cross_validation_index=4 --validation=smallnetflix_validation WARNING: common.hpp(print_copyright:149): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [smallnetflix_mm] [kfold_cross_validation] => [10] [quiet] => [1] [kfold_cross_validation_index] => [4] [validation] => [smallnetflix_validation] ... 4.7885) Iteration: 0 Training RMSE: 1.97529 Validation RMSE: 1.19191 6.11275) Iteration: 1 Training RMSE: 0.745336 Validation RMSE: 1.15712 7.52895) Iteration: 2 Training RMSE: 0.685904 Validation RMSE: 1.14291 8.91787) Iteration: 3 Training RMSE: 0.661709 Validation RMSE: 1.13662 10.2528) Iteration: 4 Training RMSE: 0.646958 Validation RMSE: 1.13445 11.5109) Iteration: 5 Training RMSE: 0.637327 Validation RMSE: 1.13369 ### Other cost functions Most of the algorithms compute RMSE by default. We also support MAP@K metric. You can run it using the --calc_ap=XX flag. The --ap_number=XX flag defines K. Note: the assumption is that the dataset has binary values (0/1). Common errors and their meaning File not found error: bickson@thrust:~/graphchi$ ./bin/example_apps/matrix_factorization/als_vertices_inmem file smallnetflix_mm
INFO:     sharder.hpp(start_preprocessing:164): Started preprocessing: smallnetflix_mm --> smallnetflix_mm.4B.bin.tmp
ERROR:    als.hpp(convert_matrixmarket_for_ALS:153): Could not open file: smallnetflix_mm, error: No such file or directory

Solution:
Input file is not found, repeat step 5 and verify the file is in the right folder

Environment variable error:
bickson@thrust:~/graphchi/bin/example_apps/matrix_factorization$./als_vertices_inmem ERROR: Could not read configuration file: conf/graphchi.local.cnf Please define environment variable GRAPHCHI_ROOT or run the program from that directory. Solution: export GRAPHCHI_ROOT=/path/to/graphchi/folder/ Error: FATAL: io.hpp(convert_matrixmarket:169): Failed to read global mean from filesmallnetflix_mm.gm Solution: remove all temporary files created by the preprocessor, verify you have write permissions to your working folder and try again. ### Adding fault tolerance For adding fault tolerance, use the command line flag --load_factors_from_file=1 when continuing any previous run. The following algos are supported: ALS, WALS, sparse_ALS, tensor_ALS, NMF, SGD, bias-SGD and SVD++. Here is an example for bias-SGD. 1) Run a few rounds of the algo: bickson@thrust:~/graphchi$ ./toolkits/collaborative_filtering/biassgd --training=smallnetflix_mm --max_iter=3 --quiet=1
WARNING:  biassgd.cpp(main:210): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any  comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[max_iter] => [3]
[quiet] => [1]
1.5052) Iteration:   0 Training RMSE:    1.40926  Validation RMSE:     1.1636
3.30333) Iteration:   1 Training RMSE:    1.07647  Validation RMSE:    1.09299
5.28362) Iteration:   2 Training RMSE:    1.02413  Validation RMSE:    1.05944

=== REPORT FOR biassgd-inmemory-factors() ===
..

2) Now continue from the same run, after the 3 iterations:

bickson@thrust:~/graphchi$./toolkits/collaborative_filtering/biassgd --training=smallnetflix_mm --max_iter=3 --quiet=1 --load_factors_from_file=1 WARNING: biassgd.cpp(main:210): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [smallnetflix_mm] [max_iter] => [3] [quiet] => [1] [load_factors_from_file] => [1] 2.63894) Iteration: 0 Training RMSE: 0.996053 Validation RMSE: 1.03869 4.07894) Iteration: 1 Training RMSE: 0.977975 Validation RMSE: 1.02427 5.6297) Iteration: 2 Training RMSE: 0.965245 Validation RMSE: 1.01355 .. As you can see the second runs, starts from the state of the first run. ## Item based similarity methods Item based similarity methods documentation is found here. ## Case studies ACM KDD CUP 2012 - in this post I show how to utilize multiple feature information for predicting advertisement clicked by users, using KDD CUP 2012 data (we won 4th place out of 192 groups). Airline on time dataset + Hearst machine learning challenge - in this post I show how to predict airplane flight time using airline on time dataset, and how to predict user reaction to email campaign using the hearst machine learning challenge. ACM KDD CUP 2010 - in this post I explain how to predict student learning abilities using ACM KDD CUP 2010 dataset. Million songs dataset - in this post I explain how to obtain the winning solution in the millions songs dataset contest, using a computation of item based similarities and their derived recommendaitons. ### Acknowledgements/ Hall of Fame Deployment of GraphChi CF toolkit was not possible without the great help of data scientist around the world who contributed their efforts for improving my code! Here is a preliminary list, I hope I did not forget anyone... #### 125 comments: 1. Hi Danny, The link to mmread.m is broken, how ever I did a little snooping and found it over here: http://select.cs.cmu.edu/code/graphlab/mmread.m 1. Link fixed. Thanks!! 2. Hello again! :) Thanks for your previous replies to my answers I'm running examples with netflix data in .mm and .mme files. So I'd like to know - what method results in 7) relate to? 1) I've run all methods ALS, SGD, bias-SGD, SVD++ and NMF It executed successfully. But results in head of U_matrix doesn't match any results I've got. 2) Also I've tried to run WALS as it described in 6e) but it failed like this: FATAL: io.hpp(convert_matrixmarket:226): Col index larger than the matrix col size 3902 > 3561 in line; 701 terminate called after throwing an instance of 'char const*' Aborted (core dumped) -- Are results in 7) probably old ? And how can I start WALS? (BTW it's not essential for me to run WALS, i'm interesting mostly in working with SVD - but may be my notes will help you to fix some bugs :) ) 1. Hi Aleksandr, You feedback is highly valuable. In fact, I have just added you an acknowledgement (send me a link to your website if you have one and I will link to it). Regarding 7) - it explains how to read the output of the multiple methods. I am not sure what you mean about results of the U_matrix? please elaborate. Regarding WALS, please note that WALS takes 4 inputs in a row: [user] [item] [time/weight] [rating]. You need to download the files: http://www.select.cs.cmu.edu/code/graphlab/datasets/time_smallnetflix and http://www.select.cs.cmu.edu/code/graphlab/datasets/time_smallnetflixe Let me know if you have any additional questions! 2. I meant smallnetflix_mm_U.mm - when talked about U_matrix. So you made everything clear about it for me. About WALS: I have smallnetflix_mm smallnetflix_mme and time_smallnetflix time_smallnetflixe files in my /graphchi And I'm trying just to execute 6e) ./toolkits/collaborative_filtering/als --training=time_smallnetflix --validation=time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1 So I haven't understand details of inputs yet - I just try to follow instructions in 6e). And it throws message that something goes wrong with sizes of matrices. Am I doing smth wrong? 3. Now I got your source of confusion... 6e) was supposed to be wals and not als, I have no fixed documentation. Please try again. 4. That time everything passed well, thank you! 3. taras@ubuntu:~/graphchi$ bash install.sh
--2012-11-24 01:11:31-- http://bitbucket.org/eigen/eigen/get/3.1.1.tar.bz2
Resolving bitbucket.org... 207.223.240.182, 207.223.240.181
Connecting to bitbucket.org|207.223.240.182|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://bitbucket.org/eigen/eigen/get/3.1.1.tar.bz2 [following]
--2012-11-24 01:11:31-- https://bitbucket.org/eigen/eigen/get/3.1.1.tar.bz2
Connecting to bitbucket.org|207.223.240.182|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1049447 (1,0M) [application/x-bzip-compressed-tar]
Saving to: 3.1.1.tar.bz2.2'

100%[==================================================================================================>] 1 049 447 634K/s in 1,6s

2012-11-24 01:11:34 (634 KB/s) - 3.1.1.tar.bz2.2' saved [1049447/1049447]

mv: cannot move eigen-eigen-43d9075b23ef/Eigen' to ./src/Eigen': Directory not empty
g++ -g -O3 -I/usr/local/include/ -I../../src/ -I. -fopenmp -Wall -Wno-strict-aliasing als.cpp -o als
In file included from common.hpp:32,
from als.cpp:56:
util.hpp: In constructor ‘in_file::in_file(std::string)’:
util.hpp:24: error: ‘LOG_FATAL’ was not declared in this scope
util.hpp:24: error: ‘logstream’ was not declared in this scope
In file included from ../../src/graphchi_basic_includes.hpp:51,
from common.hpp:33,
from als.cpp:56:
../../src/metrics/reps/file_reporter.hpp: In member function ‘virtual void graphchi::file_reporter::do_report(std::string, std::string, std::map, std::allocator >, graphchi::metrics_entry, std::less, std::allocator > >, std::allocator, std::allocator >, graphchi::metrics_entry> > >&)’:
../../src/metrics/reps/file_reporter.hpp:74: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘size_t’
../../src/metrics/reps/file_reporter.hpp:82: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘size_t’
In file included from common.hpp:37,
from als.cpp:56:
../../example_apps/matrix_factorization/matrixmarket/mmio.c: In function ‘int mm_write_mtx_crd_size(FILE*, uint, uint, size_t)’:
../../example_apps/matrix_factorization/matrixmarket/mmio.c:185: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘size_t’
../../example_apps/matrix_factorization/matrixmarket/mmio.c: In function ‘int mm_read_mtx_crd_size(FILE*, uint*, uint*, size_t*)’:
../../example_apps/matrix_factorization/matrixmarket/mmio.c:207: warning: format ‘%ld’ expects type ‘long int*’, but argument 5 has type ‘size_t*’

..................

...............

/../src/engine/graphchi_engine.hpp: In member function ‘void graphchi::graphchi_engine::write_delta_log() [with VertexDataType = vertex_data, EdgeDataType = float, svertex_t = graphchi::graphchi_vertex]’:
../../src/engine/graphchi_engine.hpp:727: instantiated from ‘void graphchi::graphchi_engine::run(graphchi::GraphChiProgram&, int) [with VertexDataType = vertex_data, EdgeDataType = float, svertex_t = graphchi::graphchi_vertex]’
als.cpp:228: instantiated from here
../../src/engine/graphchi_engine.hpp:451: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘size_t’
../../src/engine/graphchi_engine.hpp:451: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘size_t’
make: *** [als] Error 1

1. Thanks Taras for your feedback! I have fixed the compilation error. Retake from mercurial using "hg pull; hg update" and recompile using "make clean; make cf".
Let me know if this works!

2. Thanks a lot! It works good now.

4. ~/graphchi$bash install.sh --2012-12-05 16:50:09-- http://bitbucket.org/eigen/eigen/get/3.1.1.tar.bz2 Resolving bitbucket.org (bitbucket.org)... 207.223.240.182, 207.223.240.181 Connecting to bitbucket.org (bitbucket.org)|207.223.240.182|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: https://bitbucket.org/eigen/eigen/get/3.1.1.tar.bz2 [following] --2012-12-05 16:50:29-- https://bitbucket.org/eigen/eigen/get/3.1.1.tar.bz2 Connecting to bitbucket.org (bitbucket.org)|207.223.240.182|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1049447 (1,0M) [application/x-bzip-compressed-tar] Saving to: 3.1.1.tar.bz2' 100%[======================================>] 1 049 447 781K/s in 1,3s 2012-12-05 16:50:31 (781 KB/s) - 3.1.1.tar.bz2' saved [1049447/1049447] g++ -g -I/usr/local/include/ -I../../src/ -I. -fopenmp -Wall -Wno-strict-aliasing als.cpp -o als In file included from als.cpp:91:0: rmse.hpp: In function ‘void test_predictions_N(float (*)(const vertex_data**, int, float, double&), int, int*, bool*, int, bool)’: rmse.hpp:241:47: error: ‘minarray’ was not declared in this scope rmse.hpp: In function ‘void validation_rmse_N(float (*)(const vertex_data**, int, float, double&), graphchi::graphchi_context&, int, int, int*, float*, bool*, int, bool)’: rmse.hpp:486:42: error: ‘struct vertex_data’ has no member named ‘last_item’ make: *** [als] Error 1 1. Sorry about that - I committed some experimental code. Please retake again from mercurial using (hg pull; hg update) and then recompile using (make clean; make). Best, 5. I am experiencing the same compilation errors that Taras reported earlier. However, updating has failed to resolve the issue. Are there any any additional resources which may be necessary (I'm using a virtual machine running a new install of Ubuntu 12.10). 1. Very strange.. I just compiled a few minutes ago and everything is smooth.. Are you checking out from mercurial? Please checkout using hg pull; hg update and then recompile using make clean; make cf. If you still have an error please send me the output.. 2. OOPS... Just found another error. Fixed it. Please retry. 3. The update you just posted seems to have fixed everything. A completely unrelated question, is it possible to chain the outputs of the algorithms in order to further refine the accuracy? 4. It is not possible to compute blending or some ensemble method to join together several solutions for higher accuracy. This is a feature we may add in the future. It is possible, to load factors from previous run of the same algorithms from disk and continue from the previous run results in order to refine them. This is done using the --load_factors_from_file=1 flag. See example in the section "adding fault tolerance". 5. I was hoping to use ensemble methods. I look forward to the capability. 6. I am expecting that "filename.ids - includes recommended item ids for each user." will contain k recommendations defined by --num_ratings. For example when --num_ratings = 3, then for each customer recommend 3 items. If this is the case then I am experiancing 10 recommendation everytime. Is that a bug? 1. It should be --num_ratings (and not num_rating). please try again! 2. Thanks, it is working now :) 7. 1. Note: for weighted-ALS, the input file has 4 columns: [user] [item] [weight] [rating]. See example file in section 5b). > I think it is 5e rather than 5b. 2. For weighted-ALS use the rating4 command > There is no rating4 command /toolkits/collaborative_filtering/ 1. Thanks for the update. I have fixed the documentation. 8. Hi Danny, I getting the following error while I am executing the recommendatation based on WALS. FATAL: chifilenames.hpp(get_shard_edata_filesize:115): Could not load /time_smallnetflix.edata.e8B.0_3.size. Preprocessing forgotten? terminate called after throwing an instance of 'char const*' Aborted As mentioned, I have followed the following steps step1: ./wals --training=/time_smallnetflix --validation=/time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=10 --quiet=1 step2: ./rating --training=/home/kamesh/input/reco/time_smallnetflix --tokens_per_row=4 --num_ratings=5 --quiet=1 While step1 was always sucessed, but was failing when I am trying to execute the step2. Please help me, where am I doing mistake? 1. Thanks for your bug report - I have now fixed this. Please retake from mercurial and let me know if this works for you. (using "hg pull; hg update ; make clean ; make cf") Best, 2. Hi Danny, Thanks, It is working now 9. Hi Danny, I am working on installing GraphChi on centos vm 6.3 and encountering following error. als.cpp:214: instantiated from here ../../src/util/ioutil.hpp:157: error: ‘deflateInit’ was not declared in this scope ../../src/util/ioutil.hpp:174: error: ‘deflate’ was not declared in this scope ../../src/util/ioutil.hpp:178: error: ‘deflateEnd’ was not declared in this scope ../../src/util/ioutil.hpp:188: error: ‘deflateEnd’ was not declared in this scope make[1]: *** [als] Error 1 make[1]: Leaving directory `/home/romit/graphchi/toolkits/collaborative_filtering' 1. 1) Please try to check out from mercurial - this error should be fixed now. (I assume you downloaded the tgz source file. 2) If it does not help, please send me the full compilation command line and the full output so I could look at it. But checking from mercurial should fix it. 10. Hi Danny, I have two questions. For weighted-ALS, the input file has 4 columns i.e, [user] [item] [weight] [rating] where [weight] is the frequency of click through on items? if that is true, suppose an item was never bought but it was viewed many times hence it will have no rating. in that case the data file will be like [user] [item] [weight] 0? regards, Burhan 1. There may be more than one correct way to map between your problem into a matrix factorization problem. weight may be the number of clicks or frequency of clicks - you should try both and see which works better. Regarding zeros - it is not recommended to use zero rating. I suggest trying 1 for viewed and 2 for purchased. 2. Danny I'm dealing with a one class problem too and i'm not sure i clearly understood your answer. From what i understand of "[Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering.", they add 0 in the matrix giving them a weight considering a specific measure. In the case of a product i would say that the inverse of click would be a good weight for these zero (meaning that the more a person click on a product the less the 0 is likely to be true). Without weight, i would put 2, 1 and 0 to the products bought, seen and nothing. I don't undesrtand why it's not recommended to use zero rating in this specific case (i think i saw you use -1/1 in a classification probleme in one of your example)? Would you use "WALS + 1 for viewed and 2 for purchased" ? Regards. 3. Hi Alex, You are of course right, they use zero in their paper. What confused me is that for the zero case you should specify --minval=0 and --maxval=1 namely allowed predictions should be truncated between [0,1]. (And not --minval=1 and --maxval=1 as appeared in the question). The problem with zeros, is that you can not differentiate between zero which is a missing value, to a known zero rating. That is why in many cases we use -1 as negative rating and 1 as positive rating. Let me know if this is clearer.. 11. Danny, "(And not --minval=1 and --maxval=1 as appeared in the question)." => do you talk about Burhan's question or Venkata siva's one ? "The problem with zeros, is that you can not differentiate between zero which is a missing value, to a known zero rating." => that part is still not clear, maybe there's something i've missed about the graph representation of data. If i add implicit ratings with 0 value to my one class data, i'll get a matrix filled with 1 (positive ratings), 0 (negative ratings) and missing values, in my mind i would not get 0 corresponding to missing values. Do you mean that graphchi will add edge with 0 value coresponding to missing value in addition to the implicit ratings ? To be clearer, in a one class problem, if i use this command : ./toolkits/collaborative_filtering/biassgd2 --training=xxx --implicitratingtype=1 --implicitratingvalue=-1 --implicitratingpercentage=0.00001 --minval=-1 --maxval=1 will i get a different result (i mean rank of product not ratings) than this one : ./toolkits/collaborative_filtering/biassgd2 --training=xxx --implicitratingtype=1 --implicitratingvalue=0 --implicitratingpercentage=0.00001 --minval=0 --maxval=1 In the graphlab user group you've answered to Venkata siva on the same subject : "I will add a sanity check that verifies you do not use implicit rating value of 0.", so i supposed there's a real problem of using 0 value implicit ratings ??! Thanks in advance for your answer. Regards. 1. Hi Alex, So many questions - I am starting to get confused.. :-) Some algorithm can support zero ratings, and some others can not. For example, when you solve a sparse linear system, there is no distinction between a zero coefficient or no coefficient. In ALS it is possible to have zero ratings, and the algorithm tries to minimize the dot product between the matching factors and the zero rating. The same with SGD - it is possible to have zero rating. My answer to Venkata about implicit rating value of zero is wrong - I will fix it - since zero is support in some of the algos. 12. Hi, I am trying run WALS, but following links which you mentioned are not working. http://select.cs.cmu.edu/code/graphlab/time_smallnetflix 1. This is the right link: http://select.cs.cmu.edu/code/graphlab/datasets/time_smallnetflix 2. Thanks Danny. It's working now. I have few queries, Can you please suggest me on these? 1) At present, Now I tuned lamda such that RMSE ~ 2. But how can I verify, whether reco's are correct are not? Is there any way to check correctness(like confusion matrix). 2) And also my requirement is to generate reco's based location. So can you please suggest me, which algorithm will fit for this? 3. There are many ways to evaluate the quality of recommendations. Take a look here for a detailed list: http://bickson.blogspot.co.il/2012/10/the-10-recommender-system-metrics-you.html If you have the location of the user as one of the features, I suggest trying out gensgd: http://bickson.blogspot.co.il/2012/12/collaborative-filtering-3rd-generation_14.html Best, 13. Hi Danny, Great work! I have a question about the --implicitratingtype option. Pan et al.'s paper suggest 3 implicit rating types: uniform random, user-oriented, and item-oriented. Am I right in assuming that only the first is implemented already? Also, Pan et al.'s paper uses weighting to indicate how credible the training data is. It seems like this is not implemented in ALS itself, but after reading the papers it seems that WALS (by Hu et al.) is a generalization of this weighting, so I can just use WALS instead? And one more thing: in the WALS paper, prediction is done by simply computing p_u * q_i (the latent user and item factors). Still, here it is stated that we should use r_ui = w_ui * p_u * q_i. Why is this? Is this a result of this specific implementation? Many thanks! Joaquin 1. Thanks Joaquin! You are right - the uniform random is currently implemented. You can use WALS if you want to have a confidence level/or weight for each rating. Prediction is simply p_u * q_i. But when RMSE is computed than the square error is weighted by w_ui. Best, 2. Thanks for the quick reply, Danny! 14. Hi Danny, Can you please provide some instructions about how to build this on windows. I have tried the Java version but it doesn't include this great package (is there a plan to port it to Java version?) I used Eclipse and MinGW to build on windows but got lots of errors . A brief description of the build process for windows (environment ,libraries etc) can save a lots of debugging time and make this great package available on windows . If I want to install linux on my PC (dual boot) what version you recommend? Sample erors from my build: Description Resource Path Location Type 'pread' was not declared in this scope ioutil.hpp /graphchi/src/util line 44 C/C++ Problem 'pwrite' was not declared in this scope ioutil.hpp /graphchi/src/util line 89 C/C++ Problem 'random' was not declared in this scope stripedio.hpp /graphchi/src/io line 383 C/C++ Problem 'random' was not declared in this scope stripedio.hpp /graphchi/src/io line 705 C/C++ Problem 'S_IROTH' was not declared in this scope binary_adjacency_list.hpp /graphchi/src/preprocessing/formats line 194 C/C++ Problem 'S_IROTH' was not declared in this scope conversions.hpp /graphchi/src/preprocessing line 692 C/C++ Problem 'S_IROTH' was not declared in this scope graphchi_engine.hpp /graphchi/src/engine line 988 C/C++ Problem 'S_IROTH' was not declared in this scope ioutil.hpp /graphchi/src/util line 107 C/C++ Problem 'S_IROTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 256 C/C++ Problem 'S_IROTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 524 C/C++ Problem 'S_IROTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 665 C/C++ Problem 'S_IROTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 795 C/C++ Problem 'S_IROTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 846 C/C++ Problem 'S_IWOTH' was not declared in this scope binary_adjacency_list.hpp /graphchi/src/preprocessing/formats line 194 C/C++ Problem 'S_IWOTH' was not declared in this scope conversions.hpp /graphchi/src/preprocessing line 692 C/C++ Problem 'S_IWOTH' was not declared in this scope graphchi_engine.hpp /graphchi/src/engine line 988 C/C++ Problem 'S_IWOTH' was not declared in this scope ioutil.hpp /graphchi/src/util line 107 C/C++ Problem 'S_IWOTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 256 C/C++ Problem 'S_IWOTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 524 C/C++ Problem 'S_IWOTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 665 C/C++ Problem 'S_IWOTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 795 C/C++ Problem 'S_IWOTH' was not declared in this scope sharder.hpp /graphchi/src/preprocessing line 846 C/C++ Problem make: *** [example_apps/connectedcomponents] Error 1 graphchi C/C++ Problem there are no arguments to 'pread' that depend on a template parameter, so a declaration of 'pread' must be available [-fpermissive] ioutil.hpp /graphchi/src/util line 44 C/C++ Problem there are no arguments to 'pread' that depend on a template parameter, so a declaration of 'pread' must be available [-fpermissive] ioutil.hpp /graphchi/src/util line 62 C/C++ Problem there are no arguments to 'pwrite' that depend on a template parameter, so a declaration of 'pwrite' must be available [-fpermissive] ioutil.hpp /graphchi/src/util line 89 C/C++ Problem there are no arguments to 'random' that depend on a template parameter, so a declaration of 'random' must be available [-fpermissive] graph_objects.hpp /graphchi/src/api line 292 C/C++ Problem too many arguments to function 'int mkdir(const char*)' sharder.hpp /graphchi/src/preprocessing line 654 C/C++ Problem 1. I definitely did not target WIndows as a potential platform for my code :-) I suggest installing https://www.virtualbox.org/ with the latest ubuntu image and you are ready to run on windows.. 15. Hello Danny, I am testing these collaborative filtering algorithms on my company data and the resulting predictions are pretty good. However, the computed recommendations that I get after running a particular algorithm say, SGD on my static dataset are dynamic and different than the previous execution. Can you tell me whether this is to be expected? Thanks Manu 1. Hi Manu, I am not sure I got your question. How do you compute the predictions? Is this using the "rating" application, or do you read the matrices U and V and compute the required dot products? If you read U and V using your own program from file, please be careful, since we save the matrices by row order and not by column order. 2. Hi Danny, I am performing the following steps: 1) I created the training file with user id, item id and rating on my dataset. 2) I ran SGD on the training file. 3) I then ran the command for computing predictions. ./toolkits/collaborative_filtering/rating --training=hit_mm --num_ratings=10 --quiet=1 Now, the file "hit_mm".ids that is being created has the 10 item recommendations corresponding to each user. When I repeat the steps 2 and 3 on the same training file(no changes are made in it), the "filename.ids" created this time has different item ids corresponding to each customer. Thus, the item recommendations for the customers generated are different each time even though I have not changed the training file data. I wanted to get the top 10 recommendations for each user but each time the items recommended are different for the user even though SGD algorithm followed by the computing recommendation command is being run on the same dataset. Thanks, Manu 3. This can only be the case if there are several items which gets exactly the same score - in that case you can get a different ordering of them each time. Can you check the ratings file to verify if this is the case? If you suspect some bug, please prepare a small training file were this error happens and send me the file along with the exact command line arguments you are using so I can look at it. And please send it to our user mailing list: graphlab-kdd@groups.google.com 4. Hi Danny, The recommendations are different even when I run the smallnetflix_mm example as explained above. I run the following steps: 1) ./toolkits/collaborative_filtering/baseline --training=smallnetflix_mm --validation=smallnetflix_mm --minval=1 --maxval=5 --quiet=1 --algorithm=user_mean 2) ./toolkits/collaborative_filtering/sgd --training=smallnetflix_mm --validation=smallnetflix_mme --sgd_lambda=1e-4 --sgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1 3) ./toolkits/collaborative_filtering/rating --training=smallnetflix_mm --num_ratings=5 --quiet=1 The smallnetflix_mm file is having different recommendations generated each time when i repeat the steps. The scalar ratings for the items in the smallnetflix.ratings is different for all items. It might be that i am missing some steps. Here, I am listing the item ids generated for first 3 customers for the smallnetflix file using abobe commands. Iteration 1: 95526 6 1 2385 1871 2153 512 2024 2 444 3285 512 3141 3557 3 3285 444 1872 1075 1871 Iteration 2: 95526 6 1 3298 3290 1704 3271 717 2 3290 8 3271 3215 931 3 3298 535 1996 646 2053 Thanks, Manu 5. Hi Manu, Thanks for our note. I now get your question. SGD starts at a random states and computes a gradient descent starting from this state. Thus each run starts from a different state. Furthermore, running 5 iterations does not get the algorithm into a local minima (you can learn about it that RMSE still goes down and does not converge), thus it makes sense that each different run results in a different recommendation. I assume that once you run the algorithm for enough iterations, hopefully the algorithm converge to some significant local minima, and in that case you will see some repeating items recommended that are more "dominent". From the other hand, if you will run the command rating twice from the same state, you will get the exact same recommendation. You can try it out. If you use the --load_factors_from_file=1 flag, you can force SGD to start from a specific initial state, and in this case, most chances you will get similar result (up to some randomness induced by parallelization of the computation). 6. Hi Danny, Thank You for the explanation. I tried it after increasing the iteration for the "smallnetflix" example. I am getting some repeating items for it. In the case of my company dataset, I am not getting repeating items after increasing the interations. I would try to force SGD to start from a initial state. Thanks, Manu 7. Hi Danny, I executed my dataset 4 differnt times with ALS algorithm having 100 iterations each. The data that is being returned now contains some commen items in each execution. However, the common item being recommended is not good recommendation. It is as I think because the ratings are only a handful and the matrix is very sparse. Should I try using WALS in highly sparse matrix cases with weightage based on whether item was viewed, put in cart or purchased. Thanks, Manu 8. It that case, I recommended not to merge the different features into a single scalar (viewed, put in cart, purchased) but to use gensgd: http://bickson.blogspot.co.il/2012/12/collaborative-filtering-3rd-generation_14.html with all the different features for getting a more accurate prediction. 16. Hi Danny, I'm getting the following error while running WALS: ERROR: chifilenames.hpp(load_vertex_intervals:364): Could not load intervals-file: /Users/joa/Documents/ProjectX/View_graphchie.2.intervals The strange thing is that if I run it a second time, it works fine. It seems that you are expecting a file that isn't (yet?) available, but the second time around it does find that file (generated by the previous run?). This is on a mac, the command used is wals with options --minval=0 --maxval=1 --max_iter=2 --quiet=1 --implicitratingtype=1 --D=20 --lambda=0.02 Thanks, Joaquin 1. It seems that you local binary cache file we use for GraphChi got garbled. Please remove all intermediate files using the command "rm -fR filename.*" where filename is your training input file, and the same for your validation input file. 2. Right, that solved the issue. Thanks Danny! 17. Hi Danny, Does the package raphchi_src_v0.2.1.tar.gz supposed to include zlib header files? If yes why zlib.h is missing when I try to install? I dowloaded the file graphchi_src_v0.2.1.tar.gz when trying to install on ubuntu (using bash install.sh) I get the folllowing error: ../../src/util/ioutil.hpp:35:18: fatal error: zlib.h: No such file or directory thanks for your help Al 1. Now GraphChi has zlib dependency you should install it in your ubuntu using: sudo apt-get install zlib1g zlib1g-dev 18. I did that and it worked. Thank you very much.Al 19. Hi,Danny I tried to apply SVD++ in GraphChi to my own dataset, but got training RMSE larger than baseline with user mean algorithm. And most predicted results are 1. So I tested it with smallnetflix_mm dataset. And I got this: ./toolkits/collaborative_filtering/svdpp --training=smallnetflix_mm --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=40 --quiet=1 [training] => [smallnetflix_mm] [biassgd_lambda] => [1e-4] [biassgd_gamma] => [1e-4] [minval] => [1] [maxval] => [5] [max_iter] => [40] [quiet] => [1] 5.76351) Iteration: 0 Training RMSE: 1.85505 11.39) Iteration: 1 Training RMSE: 1.88477 17.0008) Iteration: 2 Training RMSE: 1.88162 22.615) Iteration: 3 Training RMSE: 1.88363 28.2434) Iteration: 4 Training RMSE: 1.8801 33.8473) Iteration: 5 Training RMSE: 1.8806 39.4943) Iteration: 6 Training RMSE: 1.87717 45.1257) Iteration: 7 Training RMSE: 1.87193 50.735) Iteration: 8 Training RMSE: 1.87877 56.3461) Iteration: 9 Training RMSE: 1.86998 61.9751) Iteration: 10 Training RMSE: 1.87939 67.6264) Iteration: 11 Training RMSE: 1.87873 73.2331) Iteration: 12 Training RMSE: 1.88215 78.8441) Iteration: 13 Training RMSE: 1.88339 84.4545) Iteration: 14 Training RMSE: 1.89403 90.0751) Iteration: 15 Training RMSE: 1.89794 95.6995) Iteration: 16 Training RMSE: 1.90168 101.32) Iteration: 17 Training RMSE: 1.91386 106.934) Iteration: 18 Training RMSE: 1.92345 112.56) Iteration: 19 Training RMSE: 1.93437 118.195) Iteration: 20 Training RMSE: 1.94366 123.813) Iteration: 21 Training RMSE: 1.94866 129.443) Iteration: 22 Training RMSE: 1.9549 135.067) Iteration: 23 Training RMSE: 1.96029 140.674) Iteration: 24 Training RMSE: 1.96464 146.3) Iteration: 25 Training RMSE: 1.96941 151.912) Iteration: 26 Training RMSE: 1.97312 157.527) Iteration: 27 Training RMSE: 1.97665 163.143) Iteration: 28 Training RMSE: 1.97978 168.76) Iteration: 29 Training RMSE: 1.98246 174.379) Iteration: 30 Training RMSE: 1.98439 179.983) Iteration: 31 Training RMSE: 1.9872 185.603) Iteration: 32 Training RMSE: 1.98911 191.245) Iteration: 33 Training RMSE: 1.99011 196.871) Iteration: 34 Training RMSE: 1.99117 202.475) Iteration: 35 Training RMSE: 1.99125 208.098) Iteration: 36 Training RMSE: 1.99237 213.719) Iteration: 37 Training RMSE: 1.99255 219.332) Iteration: 38 Training RMSE: 1.99185 224.951) Iteration: 39 Training RMSE: 1.99234 Do you have any idea what is the problem? 1. Something is wrong.. I run the same command and I get: bickson@thrust:~/graphchi$ ./toolkits/collaborative_filtering/svdpp --training=smallnetflix_mm --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=40 --quiet=1
WARNING: common.hpp(print_copyright:139): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[biassgd_lambda] => [1e-4]
[biassgd_gamma] => [1e-4]
[minval] => [1]
[maxval] => [5]
[max_iter] => [40]
[quiet] => [1]
1.41124) Iteration: 0 Training RMSE: 1.18431
2.6689) Iteration: 1 Training RMSE: 1.05354
3.90823) Iteration: 2 Training RMSE: 1.00054
5.05744) Iteration: 3 Training RMSE: 0.975234
6.35913) Iteration: 4 Training RMSE: 0.959926
7.57246) Iteration: 5 Training RMSE: 0.949503

Can you please try:
1) verify you got the latest version from mercurial and recompile.
2) delete cache files using the command "rm -fR smallnetflix_mm.*"
Let me know if this works!

2. Thanks a lot! I deleted all the cache files and finally got correct outputs.

20. I run baseline of different means, results are showed as below:

User means:
0.246803) Iteration: 0 Training RMSE: 0.9728 Validation RMSE: 0.9728

Item means:
0.240668) Iteration: 0 Training RMSE: 1.0005 Validation RMSE: 1.0005

Global means:
0.22687) Iteration: 0 Training RMSE: 1.0781 Validation RMSE: 1.0781

ALS:
I run this command
./toolkits/collaborative_filtering/als --training=smallnetflix_mm --validation=smallnetflix_mme --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1

Get this result:
1.0407) Iteration: 0 Training RMSE: 1.57422 Validation RMSE: 1.26173
2.15975) Iteration: 1 Training RMSE: 0.757194 Validation RMSE: 1.21038
3.27556) Iteration: 2 Training RMSE: 0.697027 Validation RMSE: 1.18852
4.39368) Iteration: 3 Training RMSE: 0.672537 Validation RMSE: 1.17485
5.51733) Iteration: 4 Training RMSE: 0.656315 Validation RMSE: 1.16683
6.67326) Iteration: 5 Training RMSE: 0.645484 Validation RMSE: 1.16204
7.79483) Iteration: 6 Training RMSE: 0.637849 Validation RMSE: 1.15905
8.92256) Iteration: 7 Training RMSE: 0.632197 Validation RMSE: 1.15763
10.0482) Iteration: 8 Training RMSE: 0.627851 Validation RMSE: 1.1572
11.1715) Iteration: 9 Training RMSE: 0.624425 Validation RMSE: 1.15746

I also tried different parameters of max_iter and D, but the results(Validation RMSE) I got from als were worse than baseline.
I don't know how this happened? Is it happened because I have not correctly use this tool?

What's the result of als algorithm on small netflix dataset? Can you show me?

1. Hi,
First of all, when using the baseline method, it seems you are using the training dataset as the validation dataset and thus you get the same training and validation RMSE.
When I run it on the validation I am getting (for global means)
$./toolkits/collaborative_filtering/baseline --training=smallnetflix_mm --validation=smallnetflix_mme --quiet=1 ... 1.72134) Iteration: 0 Training RMSE: 1.0781 Validation RMSE: 1.09003 Second, you are right that ALS overfits the training and have very good error of 0.624 while the validation error is even worse than the baseline. I suggest trying SGD instead : bickson@thrust:~/graphchi$ ./toolkits/collaborative_filtering/sgd --training=smallnetflix_mm --validation=smallnetflix_mme --quiet=1 --minval=1 --sgd_lambda=1e-2 --maxval=5 --sgd_gamma=1e-2 --sgd_step_dec=0.9999
WARNING: common.hpp(print_copyright:139): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[validation] => [smallnetflix_mme]
[quiet] => [1]
[minval] => [1]
[sgd_lambda] => [1e-2]
[maxval] => [5]
[sgd_gamma] => [1e-2]
[sgd_step_dec] => [0.9999]

...
4.31135) Iteration: 0 Training RMSE: 1.01389 Validation RMSE: 1.03374
8.08941) Iteration: 1 Training RMSE: 0.94112 Validation RMSE: 1.00307
11.5534) Iteration: 2 Training RMSE: 0.926045 Validation RMSE: 0.993292
14.9594) Iteration: 3 Training RMSE: 0.917201 Validation RMSE: 0.987297

As you can see both training and validation error are better than the baseline.

2. I have read the codes of als algorithm.

/**
* Vertex update function - computes the least square step
*/
void update(graphchi_vertex &vertex, graphchi_context &gcontext) {
vertex_data & vdata = latent_factors_inmem[vertex.id()];
mat XtX = mat::Zero(D, D);
vec Xty = vec::Zero(D);

bool compute_rmse = (vertex.num_outedges() > 0);
// Compute XtX and Xty (NOTE: unweighted)
for(int e=0; e < vertex.num_edges(); e++) {
float observation = vertex.edge(e)->get_data();
vertex_data & nbr_latent = latent_factors_inmem[vertex.edge(e)->vertex_id()];
Xty += nbr_latent.pvec * observation;
XtX.triangularView() += nbr_latent.pvec * nbr_latent.pvec.transpose();
if (compute_rmse) {
double prediction;
rmse_vec[omp_get_thread_num()] += als_predict(vdata, nbr_latent, observation, prediction);
}
}

for(int i=0; i < D; i++) XtX(i,i) += (lambda); // * vertex.num_edges();

// Solve the least squares problem with eigen using Cholesky decomposition
}

I modified the codes of lambda regulations：
for(int i=0; i < D; i++) XtX(i,i) +=(lambda)*vertex.num_edges();

It works much better.I don't know why you comment "// * vertex.num_edges();" According to the paper "Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize." It indeed needs it.

3. Hi Weiwen,
Thanks for your comment, I have added an additional flag called --regnormal, where on default (1) it adds regularization as described by the paper by Zhou Wilkinson and Scheriber, and when set to zero, it does add lambda as regularization. This applies to the algorithms: als, sparse_als, als_tensor, pmf and wals.

21. Thanks a lot, It works now! I should choose different parameters to control overfitting on different datasets.

22. Hi Danny,

Thanks for your powerful toolkit!

There are two questions I hope you could help me.

The first one is about zero rating. My rating matrix is a complete one, but quite sparse with a lot of zero ratings. In this case, how can I differentiate the zero rating from non-rated ones? The zero ratings contribute for RMSE calculation and objective function minimization.

The second one is about the recommendation for a new user. In your examples, the users in the validation and test files are the same with the ones in the training files. How can I predict the rating of a new user when I have its probe rating and the trained U, V matrices?

Thanks a lot.

Best,
Rico

1. Most of the algorithms like SGD, ALS etc. can take into computation ratings which are zero, in that case they try to force the product of the matching user and item feature vectors to be close to zero. When a rating is not specified this user-item pair is not taken into account int he computation. So uknown ratings should be simply ignored and not entered into the input file.

An additional approach, is in WALS algorithm, where there is a weight for each rating. In that case you can put a confidence level in the rating correctness, where rating with larger weight will influence the computation more.

The SGD, ALS algorithms are not incremental, so if a new user with known ratings is added you will need to run the algorithm again. It is possible however to store the results of the previous computation, and load it using using --load_factors_from_file, that we the computation with the new user will start from the best computed previous position. Note that the new user ID will have to be in the matrix range of the previous computation (so you need to assume a bound on the number of new users).

In case there is no information at all about the new user, not much can be done, in that case you can recommend popular items for example.

23. Hi Danny,

Thank you very much for your help.

So assume my training rating matrix is

>> A=[1 2 ; 3 0]
A =
1 2
3 0
Namely, user 1 has rated item 1 with the rating 1, item 2 with the rating 2, and user 2 has rated item 1 with the rating of 3, item 2 with the rating 0.

And for testing, I have a new user 3, who has rated item 1 with the rating 5, but his rating for item 2 is unknown.

Then my training input file should be :

%%MatrixMarket matrix coordinate real general
% Generated 1-April-2013
3 2 5
1 1 1
2 1 3
3 1 5
1 2 2
2 2 0

If assume we know the groundtruth of the new user 3, whose rating for item 2 is 4. So Then my validation input file should be :

%%MatrixMarket matrix coordinate real general
% Generated 1-April-2013
3 2 2
3 1 5
3 2 4

Are these right forms?

Thank you very much!

Best,
Rico

1. Hi Rico,
The training and validation formats are fine. My only concern is that user 3 have rated item 1 in the training, and item 2 in validation, so there will be no more additional items to rate for her since there are only 2 items. Besides of that the syntax is fine.

Best,

24. Thank Danny!

The reason is that I want to get the validation RMSE as my evaluation metric to compare different algorithms. That is, I view the rating in the validation file as the groundtruth, and compare the groundtruth and the prediction by the algorithm.

Best,

25. Hi, Danny!

Does the ALS-WR method based on Y. Koren et al. paper "Collaborative Filtering for Implicit Feedback Datasets" differ from WALS, presented here? I'm asking, because GraphChi needs two parameters, rating AND weight, while Mahout impl of ALS-WR requires only confidence ratings (r_ij according to the article)? Probably I missed something, but I want to compare the results with implicit feedback and no numeric ratings (user 'watched' a TV show 5 times)

1. Hi,
It is the same method, but our implementation is more general since it allows for non binary ratings. Namely rating can be reals and not just 0/1.

26. Hi Dr. Bickson,

I am trying to run genSGD on some of my data with 43 features. The program gives me the following error: FATAL: gensgd.cpp(main:1136): file_columns exceeds the allowed storage limit - please increase FEATURE_WIDTH and recompile.

My command is: ./toolkits/collaborative_filtering/gensgd --training=GroupAGraphChi.csv --val_pos=1 --rehash=1 --max_iter=100 --gensgd_mult_dec=0.999999 --minval=0 --quiet=1 --calc_error=1 --file_columns=44

My question is, how may I increase Feature_Width? I searched this page and could not find any except for the "--D" option. I tried it as follow:
./toolkits/collaborative_filtering/gensgd --training=GroupAGraphChi.csv --val_pos=1 --rehash=1 --max_iter=100 --gensgd_mult_dec=0.999999 --minval=0 --quiet=1 --calc_error=1 --file_columns=44 --D 50
But it still does not work. (Moreover, Latent variable width is definitely a different creature from feature width).

Any suggestion? Thank you so much!
-Wendy

1. Hi Wendy,
This is rather simple. You need to change line 43 of gensgd.cpp to have your new data width + 1. You can view the code here: https://code.google.com/p/graphchi/source/browse/toolkits/collaborative_filtering/gensgd.cpp
After this change you must recompile (using make clean; make cf)

Best,

2. Got it! Many thanks!
Also, is there a rather conservative way to get the numbers for the Matrix Market header?
This is what I found from a quick search:
"""If format was specified as coordinate, then the size line has the form:

m n nonzeros

where
m is the number of rows in the matrix;
n is the number of columns in the matrix;
nonzeros is the number of nonzero entries in the matrix (for general symmetry), ."""

But it seems that your construction of the MM header does not follow this guideline.

Thanks again!

3. I am not sure I got your question: here is the explanation http://bickson.blogspot.com/2012/02/matrix-market-format.html

You statement is right.

27. Hi Danny,

I've been trying to get the climf algorithm to run but I'm having issues. I am running it on a matri market format with 629233 users and 2039744 items with 35950755 observations. Here is the MM file specification:

%%MatrixMarket matrix coordinate real general
%===============================================================================
629233 2039744 35950755

This is what I get from climf with extra configuration enabled. The machine has 40000m of free space to run this on. I've also tried running this with all of the default parameters.

worio@kona:~/collaborative_filtering/graphchi$./toolkits/collaborative_filtering/climf --training=/home/worio/zite_likes_dataset/train.tsv --max_iter=6 --nshards=1 membudget_mb 40000 execthreads 8 --dim=50 [training] => [/home/worio/zite_likes_dataset/train.tsv] [max_iter] => [6] [nshards] => [1] [dim] => [50] INFO: chifilenames.hpp(find_shards:258): Detected number of shards: 1 INFO: chifilenames.hpp(find_shards:259): To specify a different number of shards, use command-line parameter 'nshards' INFO: io.hpp(convert_matrixmarket:478): File /home/worio/zite_likes_dataset/train.tsv was already preprocessed, won't do it again. INFO: io.hpp(read_global_mean:109): Opened matrix size: 629233 x 2039744 edges: 35950755 Global mean is: 0.905809 time bins: 0 Now creating shards. [feature_width] => [20] [users] => [629233] [movies] => [2039744] [training_ratings] => [35950755] [number_of_threads] => [8] [membudget_Mb] => [40000] DEBUG: stripedio.hpp(stripedio:201): Start io-manager with 2 threads. INFO: graphchi_engine.hpp(graphchi_engine:150): Initializing graphchi_engine. This engine expects 4-byte edge data. INFO: chifilenames.hpp(load_vertex_intervals:378): shard: 0 - 2668975 INFO: graphchi_engine.hpp(run:673): GraphChi starting INFO: graphchi_engine.hpp(run:674): Licensed under the Apache License 2.0 INFO: graphchi_engine.hpp(run:675): Copyright Aapo Kyrola et al., Carnegie Mellon University (2012) DEBUG: slidingshard.hpp(sliding_shard:193): Total edge data size: 143803020, /home/worio/zite_likes_dataset/train.tsv.edata.e4B.0_1sizeof(ET): 4 INFO: graphchi_engine.hpp(print_config:125): Engine configuration: INFO: graphchi_engine.hpp(print_config:126): exec_threads = 8 INFO: graphchi_engine.hpp(print_config:127): load_threads = 4 INFO: graphchi_engine.hpp(print_config:128): membudget_mb = 40000 INFO: graphchi_engine.hpp(print_config:129): blocksize = 4194304 INFO: graphchi_engine.hpp(print_config:130): scheduler = 0 INFO: graphchi_engine.hpp(run:706): Start iteration: 0 INFO: graphchi_engine.hpp(run:760): 0.000497s: Starting: 0 -- 2668975 INFO: graphchi_engine.hpp(run:773): Iteration 0/5, subinterval: 0 - 2668975 DEBUG: memoryshard.hpp(load_edata:249): Compressed/full size: 0.0412882 number of blocks: 35 INFO: graphchi_engine.hpp(run:799): Start updates INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:450): In-memory mode: Iteration 0 starts. DEBUG: climf.cpp(before_iteration:59): before_iteration: resetting MRR DEBUG: mrr_engine.hpp(reset_mrr:118): Detected number of threads: 8 Training objective:-2.35614e+10 DEBUG: climf.cpp(after_iteration:77): after_iteration: running validation engine INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:450): In-memory mode: Iteration 1 starts. DEBUG: climf.cpp(before_iteration:59): before_iteration: resetting MRR DEBUG: mrr_engine.hpp(reset_mrr:118): Detected number of threads: 8 FATAL: climf.hpp(dg:108): overflow in dg() terminate called after throwing an instance of 'char const*' Aborted (core dumped) I haven't figured out how to get past this point and I was wondering if you could give me a few pointers on how to get this to run correctly. Thanks! 28. Need to decrease step sizes (sgd_gamma) and regularization (sgd_lambda) Best, 29. Hi Danny, I am comparing accuracy of different algorithms and i need a way to get total RMSE of an algorithm, is there any command in graphchi to compute total RMSE? what about getting accuracy results in MAE? plus i am getting segmentation fault error on MOVIELENS_MM and MOVIELENS_MME when trying to use SGD. any suggestions? 30. what i meant was test error RMSE. 1. Hi Sam, You should prepare an additional validation file and pass it into graphchi using the command line --validation=filename. The validation RMSE will be printed in each iteration. Please send us the full error you are getting for SGD. Verify you are taking the latest version from github, and send us the full command line you used. Best, 2. Thanks for your quick response, i'm new to this field so excuse my low level questions and spending your valuable time is highly appreciated in advance, i got familiar to MF and Graphchi Framework by advice of Tauqi chen: 1.in ALS the max Iter is 15 no matter what you choose for maxiter, 2.regarding RMSE can we refer to Validation as the results of accuracy of our experiment? 3. what kind of visualizations(rather than RMSE charts) we could use to show results of our experiments? what tools we could use? Excel charts? 4.In ALS by adjusting lamda from 1e-4 to something much bigger like 10 or 20 i get better results. is it ok? can we infer we increased accuracy by regularizing weights of parameter to a small range? 5. is there any way to tune regularization parameters for each q,p and their biases in SGD? 6. is there any reference to compare time and space complexity of Algos implemented in Graphchi/Graphlab? 7. In SGD i get best results on factor Width D=13 on Movielens 100k, we can infer that lower width is harder to distinguish reliable factors and for higher dimensions there are noise in U and V matrixes? 8.this is my problem that also happens with some other algos: sam@sam:~/graphchi$ ./toolkits/collaborative_filtering/baseline --training=smallnetflix_mm --validation=smallnetflix_mm --minval=1 --maxval=5 --quiet=1 --algorithm=user_mean membudget_mmb 20000
WARNING: common.hpp(print_copyright:183): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[validation] => [smallnetflix_mm]
[minval] => [1]
[maxval] => [5]
[quiet] => [1]
[algorithm] => [user_mean]
[feature_width] => [20]
[users] => [95526]
[movies] => [3561]
[training_ratings] => [3298163]
[membudget_Mb] => [800]
Segmentation fault (core dumped)

Best,
Sam

3. Hi Sam!
1) are you using the latest version of graphchi from github? max_iter is a variable and should not be always 15.
2) You can use both training and validation RMSE as measures of the success of your CF method
3) This is a good question - there is no standard way. I suggest registering to our beta at: http://beta.graphlab.com to learn more about visualization and applicability of graphlab
4) It depends on the matrix and the values inside it. Probably you have big values.
5) You need to do it manually. No automated method yet.
6) There are many related papers in this domain. See the the first part of this blog opst.
7) You should try different widths and you will get different results for each dataset.
8) THere is still not enough information to debug this. Please send also your OS type. Are yo working on a virtual box under windows? Note that validation and training can not use the same file name. Verify you get the latest from github. You can compile in debug using "make clean; make cfd"
and then run:
gdb ./toolkits/collaborative_filtering/baseline
run --training=smallnetflix_mm --validation=smallnetflix_mm --minval=1 --maxval=5 --quiet=1 --algorithm=user_mean membudget_mmb 20000
==> send me the full output of the failure, including the output of the command "where" when it fails.

best

4. Hi again, I have the same situation over SVDPP, i use Ubuntu and it's installed on a cori3 Intel with 2GB Ram although i have separately installed windows:
sam@sam:~/graphchi$./toolkits/collaborative_filtering/svdpp --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1 WARNING: common.hpp(print_copyright:183): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [smallnetflix_mm] [validation] => [smallnetflix_mme] [biassgd_lambda] => [1e-4] [biassgd_gamma] => [1e-4] [minval] => [1] [maxval] => [5] [max_iter] => [6] [quiet] => [1] Segmentation fault (core dumped) 5. Please do the following: compile in debug using "make clean; make cfd" and then run: gdb ./toolkits/collaborative_filtering/svdpp run --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1 ==> send me the full output of the failure, including the output of the command "where" when it fails. 6. Dear Danny This is what i get, please consider that i dont have this problem with ALS and SGD but with some other commands like basline and bias-SGD and SVDPP..: sam@sam:~/graphchi$ gdb ./toolkits/collaborative_filtering/svdpp
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /home/sam/graphchi/toolkits/collaborative_filtering/svdpp...done.
(gdb) run --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1
Starting program: /home/sam/graphchi/toolkits/collaborative_filtering/svdpp --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1
WARNING: common.hpp(print_copyright:183): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[validation] => [smallnetflix_mme]
[biassgd_lambda] => [1e-4]
[biassgd_gamma] => [1e-4]
[minval] => [1]
[maxval] => [5]
[max_iter] => [6]
[quiet] => [1]
[New Thread 0xb7ca3b40 (LWP 2301)]
[New Thread 0xb74a2b40 (LWP 2302)]
[New Thread 0xb6a5cb40 (LWP 2303)]
[New Thread 0xb625bb40 (LWP 2304)]
[New Thread 0xb5a5ab40 (LWP 2305)]
[New Thread 0xb4cffb40 (LWP 2306)]
[New Thread 0xb44feb40 (LWP 2307)]
[New Thread 0xafa94b40 (LWP 2308)]
[Thread 0xafa94b40 (LWP 2308) exited]
[New Thread 0xae0bab40 (LWP 2309)]
[Thread 0xae0bab40 (LWP 2309) exited]

Program received signal SIGSEGV, Segmentation fault.
0xb7f609bc in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
(gdb) where
#0 0xb7f609bc in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#1 0xb7f60a4e in std::basic_string, std::allocator >::~basic_string() () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#2 0x08053bdf in validation_rmse (
prediction_func=0x8054a2a , gcontext=..., tokens_per_row=3, avgprd=0x0,
pmf_burn_in=0) at rmse.hpp:206
#3 0x0805f433 in SVDPPVerticesInMemProgram::after_iteration (this=0xbffff118,
iteration=0, gcontext=...) at svdpp.cpp:154
#4 0x08072618 in graphchi::graphchi_engine >::exec_updates_inmemory_mode (this=0xbffff134,
userprogram=..., vertices=...) at ../../src/engine/graphchi_engine.hpp:481
#5 0x08067e84 in graphchi::graphchi_engine >::run (this=0xbffff134, userprogram=...,
_niters=6) at ../../src/engine/graphchi_engine.hpp:807
#6 0x080559ce in main (argc=9, argv=0xbffff334) at svdpp.cpp:302
(gdb)

i used MOVIELENS 10M from your provided datasets and tried SGD and ALS and tried to tuned both Algos parameters (Assuming that Lamda is independet from gamma and matrix width) so SGD converges at Iter 96 with Validation RMSE of 0.898646 with Lamda 0.1,Gamma 0.01 and With 13 and ALS validation RMSE of 0.8589 with Lamda 10 and width 14 at itter 14.
what is the problem? i thought that SGD should give better results.
as i said i considered parameters independent for example i kept Lamda and gamma constant and checked different widths and for lmda and gamma the same, kept one constant and found best lamda and gamma.

8. Hi Sam,
SGD should not give better results vs. ALS - all depends on dataset properties so it is always advised to try both.
Regarding the seg fault. I can't reproduce this error - especially rmse.hpp:206 does not exist in my code. are you using the latest version from github?

31. i have a question that has been in my mind for quiet some time:
GraphLab and Graphchi get the data and convert it to a graph structure. while this is the main idea for graph-based social network analysis as they treat users or items or both of them as vertices and their relations as edges (in RS Handbook 2 approaches are mentioned to overcome Neighborhood-based issues) but in Matrix factorization approaches we don't need to define any graphs.why we need to model data as graph for computing MF models?

1. Hi Sam,
Viewing a problem as a graph is only a way of thinking about the problem and it has sometimes some benefits. There is a correspondence between graph and sparse matrix, so both ways of presentations are interchangeable.

32. Dear Danny
I am thankful for your wonderful support. Could you please advise me if SGD implementation in Graphchi includes the confidence Koren mentions in his paper?
in the paper you mentioned for implementation of SGD, Koren mentions a section"Input with Varying Confidence levels" and accounts for Cui as Confidence.
my guess is that this implemetation does not have confidence in it and i should use W-ALS.
Also i have to impute the confidence separately and there is no option that Graphchi compute it based on number of occurance, right? any advise sir?

33. I think there's a typo in the section "6) View the output", it says
"The files store the matrices U and V in sparse matrix market format."
but it should be
"dense matrix market format"

34. Thanks! Typo fixed.

35. Will the 'rating --model=ALS' compute the top-N for CLiMF as well? I feel like the calculation should be the same as for the other matrix factorisation algorithms, i.e., you're still just trying to find for each row U the columns in V with the largest dot product (since sigmoid(x) is a strictly increasing function of x, the dot products correspond to the sigmoid of the predicted ranking, and we are trying to maximise this quantity).

1. Thats a great suggestion! I have just added climf support for the rating utility. Please pull from git, recompile and let me know if it works for you. (Note that climf computes MRR estimation and not rating between 1-5).

36. Is it possible to save the model produced by these algorithms, and then apply it to new data?

1. Yes. You can run once, and then when new data comes you can use the command line arguments --load_factors_from_file=1 which will load the saved model. You will also want to use the --test=filename to point to the new data you want to predict on.

37. Hi Dr. Bickson,

I tried it as follow for trying to run PMF.

./toolkits/collaborative_filtering/pmf --training=smallnetflix_mm --quiet=1 --minval=1 --max_val=5 --max_iter=10 --pmf_burn_in=5

But, I got the following error after 2 iterations.

assertion "!std:isnan" failed: file "pmf.cpp", line 120, function float pmf_predict
aborted

1. can I know the solution?

- Gates

2. Several words have been missed out in error message.

assertion "!std:isnan(err)" failed: file "pmf.cpp", line 120, function: float pmf_predict(const vertex_data&, const vertex_data&, float, double&, void*)
aborted

3. It seems you got into numerical errors. Try to use the --minval=XX and --maxval=XX command line arguments to limit the range of predicted values. If you like to send me a small dataset where this error happens I will be happy to take a look.

I used the netflix dataset "smallnetflix_mm" that you linked http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm).

And I already used the --minval=1 and --maxval=5 command line arguments.

I haven't the vaguest idea what to do.

- Gates

5. Are you working on MAC OS? For me on ubuntu it works perfectly:
> ./toolkits/collaborative_filtering/pmf --training=smallnetflix_mm --quiet=1 --max_iter=10 --pmf_burn_in=5
WARNING: common.hpp(print_copyright:195): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[quiet] => [1]
[max_iter] => [10]
[pmf_burn_in] => [5]

=== REPORT FOR sharder() ===
[Timings]
edata_flush: 0.22242s (count: 26, min: 0.001595s, max: 0.009056, avg: 0.00855462s)
execute_sharding: 0.690054 s
finish_shard.sort: 0.233932 s
preprocessing: 1.60072 s
shard_final: 0.52435 s
[Other]
app: sharder
3.56735) Iteration: 0 Training RMSE: 2.06394
4.19939) Iteration: 1 Training RMSE: 5.78686
4.82064) Iteration: 2 Training RMSE: 2.47062
5.46099) Iteration: 3 Training RMSE: 0.967806
6.0911) Iteration: 4 Training RMSE: 0.754852
Finished burn-in period. starting to aggregate samples
6.71075) Iteration: 5 Training RMSE: 0.710299
7.34047) Iteration: 6 Training RMSE: 0.692969
7.96921) Iteration: 7 Training RMSE: 0.68672
8.5696) Iteration: 8 Training RMSE: 0.681699
9.20056) Iteration: 9 Training RMSE: 0.677543

6. Thank you very much.
It works well on ubuntu.
I have one more question to get top k recommendations with PMF.
Doesn't the rating / ratings command support yet PMF?

7. Not yet. I advise giving a list of user/item pairs using the --test=filename command, ratings will computed for the list.

38. Hi Danny.
Thank you so much for your support.
But I have one question.
Could you please advise me how to make its speed fast?
You said that "for speedup, verify that your program is compiled using the "-O3" or EIGEN_NDEBUG compiler flag.”
I want to get more detailed information of it for executing your codes(ALS/wALS/SVD/RBM/PMF).
It would be very thankful if you respond it.
Best regards, Yuna.

1. I suggest verifying the macro
#define GRAPHCHI_DISABLE_COMPRESSION
is defined. It should be defined at the top of the .cpp file before all the includes (for example als.cpp). Then you need to make clean and make cf.
This will give a speedup of x2.

39. Hi Danny.
I first ran
./toolkits/collaborative_filtering/svdpp --training=smallnetflix_mm --validation=smallnetflix_mme --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=6 --quiet=1
and then I ran
./toolkits/collaborative_filtering/rating --training=smallnetflix_mm --num_ratings=5 --quiet=1 --algorithm=als
the output said
FATAL: io.hpp(load_matrix_market_matrix:849): Wrong matrix size detected, command line argument should be --D=20 instead of : 40
terminate called after throwing an instance of 'char const*'
Aborted
Can you tell me why? thanks!

1. The algorithm in step 2 should match the algorithm in step 1. Namely it should be --algorithm=svdpp and not --algorithm=als as you wrote.

2. thanks for your help, everything goes well now. I plan to write prediction code using rbm algorithm, can you give give some materials about graphchi, for example, the documentation,etc.

3. Hi,
Per your request, I have added support for RBM top K computation. You should use the rating2 application with --algorithm=rbm. See documentation above for details.

4. Hi Danny,
Thank you very much for your support. I have run the prediction of the rbm algorithm, but it comes out that almost all of the user, the algorithm have recommend the item 1801 and 2964. I don't think it is a satisfying result.

5. Yes, I have noticed the same problem. It may be related to the algorithm, you can try with different parameters or running more iterations.

6. Thank you for your advice, I will try it.

40. Hi Danny.

I am trying to run sgd,svdpp,timesvdpp on movielens data but I can't achieve performance mentioned in "Matrix Factorization Techniques for Recommender Systems". I have tried a lot to initial parameter. For SGD I am doing ./toolkits/collaborative_filtering/sgd --training=userMovielens --kfold_cross_validation=10 --kfold_cross_validation_index=3 --sgd_lambda=0.001 --sgd_gamma=0.03 --minval=1 --maxval=5 --max_iter=600 --quiet=1 --sgd_step_dec=0.9 --D=65. Can you please point out what am I doing wrong.

41. Hi Danny,

I have a beginners question, can GrapChi be used for text classification and regression? I would like to use GrapChi instead of Mahout to predict values and also classify text in categories.

Thank you!

Regards,
Robert

1. Hi Robert,
I recommend taking a look at GraphLab create: http://graphlab.com/products/create/index.html who will soon have regression and classification capabilities.

42. Dear Danny,
Is there a way to run neighborhood recommender method in graphchi?
thanks,
xw

1. Hi,
In graphchi we have item based methods. In GraphLab Create we are working on k-NN methods. Once it is ready we will announce on our website.

43. This comment has been removed by the author.

44. Dear Danny,
About rbm, what is the parameter D? how to give a value to D? what does the rbm_mult_step_dec mean? Thanks!

1. D is always the latent feature vector width (as in all methods).
multiplicative step decrement is how much you decrease the SGD step size. The default is 0.9, namely you multiply by 0.9 the step size after each iteration.

45. Thanks. In order to low RMSE for the test sets, I am tuning rbm_alpha, rbm_beta, D (not sure if it is necessary to tune D). Is there any other parameters I need to tune? Thanks.