## Monday, October 10, 2011

### Lanczos Algorithm for SVD (Singular Value Decomposition) in GraphLab

A while ago, I examined Mahout's parallel machine learning toolbox mailing list, and found out that a majority of the questions where about their Lanczos SVD solver. This is a bit surprising since Lanczos is one of the simplest algorithms out there. I thought about writing a post that will explain Lanczos once and for all.

The Lanczos method, is a simple method for finding the eigenvalues and eigenvectors of a square matrix. Here is matlab code that implements Lanczos:

```% Written by Danny Bickson, CMU
% Matlab code for running Lanczos algorithm (finding eigenvalues of a
% PSD matrix)

% Code available from: http://www.cs.cmu.edu/~bickson/gabp/
function [] = lanczos_ortho(A, m, debug)

[n,k] = size(A);
V = zeros(k,m+1);
if (nargin == 2)
V(:,2) = rand(k,1);
else
V(:,2)=  0.5*ones(k,1);
end
V(:,2)=V(:,2)/norm(V(:,2),2);
beta(2)=0;

for j=2:m+2

w = A*V(:,j);
w = A'*w;
w = w - beta(j)*V(:,j-1);
alpha(j) = w'*V(:,j);
w = w - alpha(j)*V(:,j);

%orthogonalize
for k=2:j-1
tmpalpha = w'*V(:,k);
w = w -tmpalpha*V(:,k);
end

beta(j+1) = norm(w,2);
V(:,j+1) = w/beta(j+1);
end
```
```% prepare the tridiagonal matrix
T = diag(alpha(2:end)) + diag(beta(3:end-1),1) + diag(beta(3:end-1),-1);
V = V(:,2:end-1);
disp(['approximation quality is: ', num2str(norm(V*T*V'-A'*A))]);
disp(['approximating eigenvalues are: ', num2str(sqrt(eig(full(T))'))]);
eigens = eig(full(T));
```
```[u,d]=eig(full(T));
V
disp(['ritz eigenvectors are: ']);
s=V*u
disp(['and u is: ']);
u
disp(['d is: ']);
d
for i=m+1:-1:1
disp(['residual for ', num2str(i), ' is: ', num2str(norm(A'*A*s(:,i) - eigens(i)*s(:,i)))]);
end
```

And here is an example run on a 3x3 matrix. Note that the number of requested iterations is 2, since we get 2+1 eigenvalues returned. (Number of iterations plus one).
```>> A=[12,6,4;6,167,24;4,24,-41];
>> lanczos_ortho(A,2)
approximation quality is: 5.5727e-12
approximating eigenvalues are: 11.93436      43.92815      169.9938

V =

0.9301   -0.1250    0.3453
0.0419    0.9703    0.2383
0.3648    0.2071   -0.9077

ritz eigenvectors are:

s =

0.9974    0.0590    0.0406
-0.0470    0.1112    0.9927
0.0541   -0.9920    0.1137

and u is:

u =

0.9455   -0.3024    0.1208
-0.1590   -0.1050    0.9817
0.2841    0.9474    0.1473

d is:

d =

1.0e+04 *

0.0142         0         0
0    0.1930         0
0         0    2.8898

residual for 3 is: 4.2901e-12
residual for 2 is: 3.2512e-11
residual for 1 is: 8.7777e-12
```

Handling non-square matrices
Now what happens when the matrix A is not square? This is very easy. Instead of decomposing A we will decompose A'A. In terms of efficiency, there is no need for computing A'A explicitly. We
can compute it iteratively by changing:
w = A*V(:,j) - beta(j)*V(:,j-1);
to
w = A*V(:,j); w = A'*w - beta(j)*V(:,j-1);

Computing SVD
A nice trick for computing SVD I learned from Abhay Harpale uses the following identity.
Assume [U,D,V] = SVD(A). Then
U = eigenvectors(A'A), V = eigenvectors(AA'), and D = sqrt(eigenvalues(A));

Here is a small example to show it (in Matlab/Octave):
```>> a = rand(3,2)

a =

0.1487    0.4926
0.2340    0.3961
0.8277    0.8400

>> [u,d,v] = svd(a)

u =

-0.3531   -0.8388   -0.4144
-0.3378   -0.2988    0.8925
-0.8725    0.4551   -0.1779

d =

1.3459         0
0    0.2354
0         0

v =

-0.6343    0.7731
-0.7731   -0.6343

>> [u1,i] = eig(a*a')

u1 =

0.4144    0.8388    0.3531
-0.8925    0.2988    0.3378
0.1779   -0.4551    0.8725

i =

0.0000         0         0
0    0.0554         0
0         0    1.8116

>> [v1,i] = eig(a'*a)

v1 =

-0.7731    0.6343
0.6343    0.7731

i =

0.0554         0
0    1.8116
```

GraphLab Implementation
SVD is now implemented in Graphlab as part of the collaborative filtering library.
You are welcome to run first the unit test:
./pmf --unittest=131
For performing SVD on a 5x12 matrix.

After you verify it runs fine, you are welcome to experiment with a larger matrix. See
http://graphlab.org/datasets.html for a larger NLP example using SVD.

The input to the algorithm is sparse matrix in GraphLab (or sprase matrix market) format as explained here. The output are three matrices: U, V, and T.
U and V as above, and T is a matrix of size (number of iterations +1 x 2) where the eigenvalues
of A'A are in the first column and the eigenvalues of AA' are in the second column. (Due to numerical errors, and the fact we run a limited number of iteration, there may be variations between two lists of Eigenvalues, although in theory they should be the same).

Implementation details
We implement Lanczos algorithm as described above, but on each iteration we compute in parallel both A'A and AA'.  The matrices are sliced into rows/columns and each row and column is computed in parallel using Graphlab threads.

Notes:
1) The number of received eigenvalues, is always number of iterations + 1.
2) Currently we do not support square matrices. (When inputing a square matrix we still compute A'A).
3) Unlike some common belief, algorithm works well for both small example and large matrices. So it is also desirable to try it out using small examples first.
4) It is recommended to run using itpp. Eigen version is still experimental at this point.
5) The output matrix T contains the square of the singular value. (And not the singular value itself).