Large Scale Machine Learning and Other Animals: Big Math Library on top of GraphLab/GraphChi

Wednesday, August 29, 2012

Big Math Library on top of GraphLab/GraphChi

Recently I am toying with a big math library around GraphLab / GraphChi with a "matlab" like interface that can scale to very large problems.

Today I finished implementing SVD (Lanczos) algorithm for both GraphLab/GraphChi.

Why do we need a big math library? If your problem is small and you can compute it quickly in Matlab than you are done. However, if the data is big, and Matlab can not even load the problem we are in trouble. That is why a scalable implementation is useful.

Let's start with a very simple example - Jacobi method for solving a linear system of equations. The code looks like this:

for (int i=0; i < max_iter; i++){

    x = (b - A*x)/A_ii;

}

A second example, is the alternating least squares (ALS) algorithm:

for (int i=0; i < max_iter; i++){

     V = A.backslash(U);

     U = A.backslash(V);

}

Another example, a little more fancy, is Lanczos iteration main loop:

    U[k] = V[k]*A._transpose();

    orthogonalize_vs_all(U, k, alpha(0));

    for (int i=k+1; i<n; i++){

      V[i]=U[i-1]*A;

      orthogonalize_vs_all(V, i, beta(i-k-1));

      U[i] = V[i]*A._transpose();

      orthogonalize_vs_all(U, i, alpha(i-k));

}

    V[n]= U[n-1]*A;

    orthogonalize_vs_all(V, n, beta(n-k-1));

In each iteration of Lanczos, the vector U[i] is left multiplied with the matrix A and then left multiplied with the matrix A'. Namely, U[i+1] = U[i]*A*A'.

What is nice about it, is the underlying implementation.
In GraphChi: the matrix A is stored on disk and read in parts to perform the multiplication. It is never read fully into memory. And thus we can scale to very large models.

In GraphLab: the matrix A is distributed across multiple machines so the product is a distributed operation. Also the vectors V[i] and U[i] are spanned across multiple machines.

The user who writes the high level code never needs to bother where is the data (on disk? in the network? ) and can write high level math primitives like in Matlab.

So where is the catch? I must admit there is a lot of ugly code which is needed to implement the high level functionality. However, the user does not have to actually worry about it and can enjoy the improved functionality.