Hi Danny, I've seen some of your posts regarding lanczos and thought I'd ask you.
I have a problem where I need to compute the first few hundred eigenvectors/values in the PCA of a large, dense dataset, about ten thousand times five million. (It's an avi of a fluorescence microscopy experiment). For the larger datasets I might not even be able to hold the entire set in memory. The normal approach would be to compute the 10000^2 A*A' matrix and then diagonalize it, the problem being that this matrix entails 12 million dot products of 5M element vectors. So I was hoping to find a method that iteratively computes the eigenvectors without such explicit evaluations. Is Lanczos numerically stable enough for such sizes?
Thanks,
Marcelo
______________________________ ____________________________
Marcelo Magnasco Box 212, 1230 York Avenue, NY NY10065
Professor and Head, v +1 212 3278542 f +1 212 3277422
The incremental SVD papers that I am thinking of have been written by Matthew Brand. He has several papers on this topic, but the one that I have been using the most is this one: http://www.merl.com/
Unfortunately I am not aware of software package which implements the Brand method. You will probably have to implement it yourself.
Additional resource I found is another paper by Brand:
Brand, M.E., “Incremental Singular Value Decomposition of Uncertain Data with Missing Values”, European Conference on Computer Vision (ECCV), Vol 2350, pps 707-720, May 2002 (Lecture Notes in Computer Science) |
Or ask Matthew directly ?
ReplyDeleteI don't know him in person - do you know him?
ReplyDelete:-)
There is work by Gu and Eisenstat on incremental SVD.
ReplyDeleteNow regarding his problem:
1) He is really interested in computing the SVD of A, so no need to form A'*A (which is actually not the suggested way to compute the SVD of A). One can use dense SVD methods, and compute all singular values. I would suggest trying the package Elemental.
2) Since he wants only a few hundred singular values, iterative methods might work better. I would suggest taking a look at ARPACK (it computes eigenvalues, but you can compute the eigenvalus of [0 A'; A 0]).
3) If low accuracy is enough, one can try the randomized PCA methods. Just google those.