Monday, March 12, 2012

GraphLab svd v. 2 tutorial

This blog post is outdated. The new SVD instructions have moved here:
http://docs.graphlab.org/collaborative_filtering.html

18 comments:

  1. Danny, is only the shared-memory implementation of Lanczos available so far? Am I right or not?

    Replies
    1. Hi!
      In version 1, we have around 17 algorithms implemented: http://select.cs.cmu.edu/code/graphlab/pmf.html

      In version 2, only lanczos is implemented.

      In version 2.1, we have SGD, ALS and bias-SGD. I plan to soon move Lanczos to version 2.1 and thus deprecate version 2.

      Best,

    2. OK, but is it possible to run Lanczos on a multi-node cluster rather than on a single machine?

    3. Yes, stay tuned for Lanczos in version 2.1.
      By the way, the current Lanczos was tested on matrices with up to 3.5 billion non-zeros on a single multicore machine (with 200GB RAM), so depending on the size of your problem it may fit on a single multicore node.

    4. Do you plan to include any distributed SVD++ recommender in 2.1, or just SGD and ALS? I thought timeSVD++ was the most accurate one. If it is not a secret, why did you start with SGD and ALS?

    5. SGD and ALS are relatively easier to implement.

    6. Danny, what if the matrix is dense? I guess your Lanczos implementation can handle this and will hopefully work correctly, but inefficiently, because the matrix is stored in a graph format and the optimized BLAS matrix-vector multiplication routines are not applicable. Am I right?

      Another case is an implicitly defined matrix: we don't know its values and structure, but we can compute the matrix-vector product and the product with its transpose. That should be enough for Lanczos. I am not sure your implementation permits this.

    7. Hi,
      The current implementation is designed for a fixed sparse matrix.
      A dense matrix will perform slower, and I guess other tools will handle it better.
      A matrix operator is not supported, since we assume the matrix is known explicitly.
      There are many other SVD implementations; the goal here was to show that we can use GraphLab to get quite an efficient implementation under the above assumptions. (For comparison, a generic matrix-free sketch follows this thread.)

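
    For readers curious what the matrix-free interface discussed above looks like, here is a minimal sketch. It is not GraphLab code and not its API; it uses SciPy's Lanczos/ARPACK-based svds on a LinearOperator, so the solver only ever calls matvec (A x) and rmatvec (A^T y). The diagonal operator is a made-up stand-in for an implicitly defined matrix.

      # Minimal sketch (SciPy, not GraphLab): a Lanczos-based SVD needs only
      # matrix-vector products with A and A^T; the matrix itself is never formed.
      import numpy as np
      from scipy.sparse.linalg import LinearOperator, svds

      n = 1000
      d = np.arange(1, n + 1, dtype=float)   # hidden "matrix": diag(d), never stored

      A = LinearOperator(
          shape=(n, n),
          matvec=lambda x: d * x,            # y = A x
          rmatvec=lambda y: d * y,           # x = A^T y
          dtype=np.float64,
      )

      # top 5 singular triplets, computed from matvec/rmatvec alone
      u, s, vt = svds(A, k=5)
      print(np.sort(s)[::-1])                # approx. [1000. 999. 998. 997. 996.]

    GraphLab's Lanczos, as noted in the reply above, assumes an explicit sparse matrix stored as a graph, so this snippet only illustrates why matrix-vector products are sufficient in principle.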
  2. Hi, can GraphLab be used for SVD of sparse non-symmetric matrices in parallel?

  3. CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files:
    JAVA_JVM_LIBRARY (ADVANCED)

    This happens while running ./configure. What must I do?

    Replies
    1. Hi,
      Please verify you got the latest code from GitHub. If you plan to use a single multicore machine, please configure using
      ./configure --no_jvm --no_mpi

      Best

    2. It configured successfully with ./configure --no_jvm,

      but can I still use it on a multi-node, multicore cluster?

  4. Hi, this webpage had details about how to use SVD with Matlab matrices and about running in parallel... can you tell me where I can find that info?

  5. Hello, I have run my first SVD of a matrix in GraphLab,
    but is it possible to find the smallest singular value and the corresponding vector from the command line?

  6. Can you please tell me whether timeSVD++ is implemented in GraphLab v2 or not?

    Replies
    1. Hi, it is implemented in GraphChi: https://github.com/GraphChi/graphchi-cpp/blob/master/toolkits/collaborative_filtering/timesvdpp.cpp
      Instructions are here: http://bickson.blogspot.co.il/2012/12/collaborative-filtering-with-graphchi.html
