Large Scale Machine Learning and Other Animals: Incremental SVD

Saturday, April 27, 2013

Incremental SVD

Here is an email I got from Prof. Magnasco from Rockefeller University, NY:

Hi Danny, I've seen some of your posts regarding lanczos and thought I'd ask you.

I have a problem where I need to compute the first few hundred eigenvectors/values in the PCA of a large, dense dataset, about ten thousand times five million. (It's an avi of a fluorescence microscopy experiment). For the larger datasets I might not even be able to hold the entire set in memory. The normal approach would be to compute the 10000^2 A*A' matrix and then diagonalize it, the problem being that this matrix entails 12 million dot products of 5M element vectors. So I was hoping to find a method that iteratively computes the eigenvectors without such explicit evaluations. Is Lanczos numerically stable enough for such sizes?

Thanks,

Marcelo

__________________________________________________________

Marcelo Magnasco Box 212, 1230 York Avenue, NY NY10065

Professor and Head, v +1 212 3278542 f +1 212 3277422

Mathematical Physics Lab http://sur.rockefeller.edu/Plone

The Rockefeller University magnasco@rockefeller.edu

As a great coincidence, I just heard about the same problem solution from Byron Boots, a postdoc in UW:

The incremental SVD papers that I am thinking of have been written by Matthew Brand. He has several papers on this topic, but the one that I have been using the most is this one: http://www.merl.com/papers/docs/TR2006-059.pdf

Unfortunately I am not aware of software package which implements the Brand method. You will probably have to implement it yourself.

Additional resource I found is another paper by Brand:

Brand, M.E., “Incremental Singular Value Decomposition of Uncertain Data with Missing Values”, European Conference on Computer Vision (ECCV), Vol 2350, pps 707-720, May 2002 (Lecture Notes in Computer Science

)

Which has a better explanation of the setup.

3 comments:

IgorApril 29, 2013 at 4:12 PM
Or ask Matthew directly ?
ReplyDelete
Replies
Danny BicksonApril 29, 2013 at 11:10 PM
I don't know him in person - do you know him?
:-)
ReplyDelete
Replies
UnknownMay 3, 2013 at 4:56 PM
There is work by Gu and Eisenstat on incremental SVD.

Now regarding his problem:

1) He is really interested in computing the SVD of A, so no need to form A'*A (which is actually not the suggested way to compute the SVD of A). One can use dense SVD methods, and compute all singular values. I would suggest trying the package Elemental.

2) Since he wants only a few hundred singular values, iterative methods might work better. I would suggest taking a look at ARPACK (it computes eigenvalues, but you can compute the eigenvalus of [0 A'; A 0]).

3) If low accuracy is enough, one can try the randomized PCA methods. Just google those.
ReplyDelete
Replies

Add comment

Large Scale Machine Learning and Other Animals

Saturday, April 27, 2013

Incremental SVD

3 comments:

Labels

GraphLab Users Google Group

pagerank

google analytics

syntax