## Monday, October 10, 2011

### Lanczos Algorithm for SVD (Singular Value Decomposition) in GraphLab

A while ago, I examined Mahout's parallel machine learning toolbox mailing list, and found out that a majority of the questions where about their Lanczos SVD solver. This is a bit surprising since Lanczos is one of the simplest algorithms out there. I thought about writing a post that will explain Lanczos once and for all.

The Lanczos method, is a simple method for finding the eigenvalues and eigenvectors of a square matrix. Here is matlab code that implements Lanczos:

```% Written by Danny Bickson, CMU
% Matlab code for running Lanczos algorithm (finding eigenvalues of a
% PSD matrix)

% Code available from: http://www.cs.cmu.edu/~bickson/gabp/
function [] = lanczos_ortho(A, m, debug)

[n,k] = size(A);
V = zeros(k,m+1);
if (nargin == 2)
V(:,2) = rand(k,1);
else
V(:,2)=  0.5*ones(k,1);
end
V(:,2)=V(:,2)/norm(V(:,2),2);
beta(2)=0;

for j=2:m+2

w = A*V(:,j);
w = A'*w;
w = w - beta(j)*V(:,j-1);
alpha(j) = w'*V(:,j);
w = w - alpha(j)*V(:,j);

%orthogonalize
for k=2:j-1
tmpalpha = w'*V(:,k);
w = w -tmpalpha*V(:,k);
end

beta(j+1) = norm(w,2);
V(:,j+1) = w/beta(j+1);
end
```
```% prepare the tridiagonal matrix
T = diag(alpha(2:end)) + diag(beta(3:end-1),1) + diag(beta(3:end-1),-1);
V = V(:,2:end-1);
disp(['approximation quality is: ', num2str(norm(V*T*V'-A'*A))]);
disp(['approximating eigenvalues are: ', num2str(sqrt(eig(full(T))'))]);
eigens = eig(full(T));
```
```[u,d]=eig(full(T));
V
disp(['ritz eigenvectors are: ']);
s=V*u
disp(['and u is: ']);
u
disp(['d is: ']);
d
for i=m+1:-1:1
disp(['residual for ', num2str(i), ' is: ', num2str(norm(A'*A*s(:,i) - eigens(i)*s(:,i)))]);
end
```

And here is an example run on a 3x3 matrix. Note that the number of requested iterations is 2, since we get 2+1 eigenvalues returned. (Number of iterations plus one).
```>> A=[12,6,4;6,167,24;4,24,-41];
>> lanczos_ortho(A,2)
approximation quality is: 5.5727e-12
approximating eigenvalues are: 11.93436      43.92815      169.9938

V =

0.9301   -0.1250    0.3453
0.0419    0.9703    0.2383
0.3648    0.2071   -0.9077

ritz eigenvectors are:

s =

0.9974    0.0590    0.0406
-0.0470    0.1112    0.9927
0.0541   -0.9920    0.1137

and u is:

u =

0.9455   -0.3024    0.1208
-0.1590   -0.1050    0.9817
0.2841    0.9474    0.1473

d is:

d =

1.0e+04 *

0.0142         0         0
0    0.1930         0
0         0    2.8898

residual for 3 is: 4.2901e-12
residual for 2 is: 3.2512e-11
residual for 1 is: 8.7777e-12
```

Handling non-square matrices
Now what happens when the matrix A is not square? This is very easy. Instead of decomposing A we will decompose A'A. In terms of efficiency, there is no need for computing A'A explicitly. We
can compute it iteratively by changing:
w = A*V(:,j) - beta(j)*V(:,j-1);
to
w = A*V(:,j); w = A'*w - beta(j)*V(:,j-1);

Computing SVD
A nice trick for computing SVD I learned from Abhay Harpale uses the following identity.
Assume [U,D,V] = SVD(A). Then
U = eigenvectors(A'A), V = eigenvectors(AA'), and D = sqrt(eigenvalues(A));

Here is a small example to show it (in Matlab/Octave):
```>> a = rand(3,2)

a =

0.1487    0.4926
0.2340    0.3961
0.8277    0.8400

>> [u,d,v] = svd(a)

u =

-0.3531   -0.8388   -0.4144
-0.3378   -0.2988    0.8925
-0.8725    0.4551   -0.1779

d =

1.3459         0
0    0.2354
0         0

v =

-0.6343    0.7731
-0.7731   -0.6343

>> [u1,i] = eig(a*a')

u1 =

0.4144    0.8388    0.3531
-0.8925    0.2988    0.3378
0.1779   -0.4551    0.8725

i =

0.0000         0         0
0    0.0554         0
0         0    1.8116

>> [v1,i] = eig(a'*a)

v1 =

-0.7731    0.6343
0.6343    0.7731

i =

0.0554         0
0    1.8116
```

GraphLab Implementation
SVD is now implemented in Graphlab as part of the collaborative filtering library.
You are welcome to run first the unit test:
./pmf --unittest=131
For performing SVD on a 5x12 matrix.

After you verify it runs fine, you are welcome to experiment with a larger matrix. See
http://graphlab.org/datasets.html for a larger NLP example using SVD.

The input to the algorithm is sparse matrix in GraphLab (or sprase matrix market) format as explained here. The output are three matrices: U, V, and T.
U and V as above, and T is a matrix of size (number of iterations +1 x 2) where the eigenvalues
of A'A are in the first column and the eigenvalues of AA' are in the second column. (Due to numerical errors, and the fact we run a limited number of iteration, there may be variations between two lists of Eigenvalues, although in theory they should be the same).

Implementation details
We implement Lanczos algorithm as described above, but on each iteration we compute in parallel both A'A and AA'.  The matrices are sliced into rows/columns and each row and column is computed in parallel using Graphlab threads.

Notes:
1) The number of received eigenvalues, is always number of iterations + 1.
2) Currently we do not support square matrices. (When inputing a square matrix we still compute A'A).
3) Unlike some common belief, algorithm works well for both small example and large matrices. So it is also desirable to try it out using small examples first.
4) It is recommended to run using itpp. Eigen version is still experimental at this point.
5) The output matrix T contains the square of the singular value. (And not the singular value itself).

1. Hi Danny,
The lanczos mathlab code you posted decomposes a given matrix A into 2 matrices V and T, where the matrix T is a tri-diagonal matrix whose eigenvalues are the same as the eigenvalues of the original matrix A. If [m,k]=size(A), then the tri-diagonal matrix T has a dimension m+1, and consequently it has one eigenvalue more than the original matrix A. How do you figure out which eigenvalue form the eig(T) to drop to ensure that eig(T)=eig(A)?

2. Hi,
The matrix T is a tridiagonal matrix with eigenvalues matching to eig(A'*A) and not eig(A).
If you like to compute the singular values used in SVD, you need to compute the square root of the eigenvalues. Note that the number of eigenvalues is one plus the number of iterations, so you should constrain your iterations depending on the number of non zero eigenvalues (namely min(m,k) - 1).
In terms of numerical accuracy, it is also desired to orthogonilize all vectors V vs. previous rounds. The code given in my blog post is simplified for educational purposes. I am soon releasing in GraphLab v2 an improved Lanczos implementation. See discussions about numerical accuracy in a mail I got from Joel Tropp:

In short, the Lanczos method is very unstable because it tries to "cheat" when it computes an orthogonal basis for the Krylov subspace. This procedure only works well in exact arithmetic; it breaks very quickly in the presence of floating-point errors. Drastic remedies are required to fix this problem. See the book by Golub and Van Loan for a discussion and references. A more introductory presentation appears in Trefethen and Bai.

Another possible approach to computing a large-scale SVD is to use column-sampling approaches, perhaps followed by a randomized SVD as advocated by Kwok and Li. See Chiu and Demanet (arXiv) or Gittens (arXiv) for some related methods.