Michael has done a lot of work in the field of random projections, and I asked him some questions to learn more about this field:

1) Can you provide some resources for novices (like me) who would like to learn more about the technique of random projections?

Here are the links to the two reviews on randomized matrix algorithms that we discussed.

Randomized Algorithms for Matrices and Data, M. W. Mahoney, NOW Publishers, Foundations and Trends in Machine Learning, Volume 3, Issue 2, 2011; Preprint: arXiv:1104.5557 (2011).

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, N. Halko, P.-G. Martinsson, and J. A. Tropp, SIAM Review, Volume 53, Issue 2, pp. 217-288, 2011; Preprint: arXiv:0909.4061 (2009).

The first of these reviews (mine) adopts a more data/TCS/ML approach to the least-squares and low-rank approximation problems and tries to highlight "why" the algorithms work, with an eye toward developing future methods more generally. The latter review (HMT) focuses on coupling the novel randomized algorithms for low-rank matrix approximation with classical numerical methods, the goal being to get high-quality numerical code, typically for "scientific" applications. Thus, some of the assumptions that are made are more appropriate for scientific applications than for "data analytics" applications: for example, in scientific applications sparse matrices are typically structured, e.g., arising from PDEs, and thus can be applied quickly as an operator to unstructured Gaussian random projections.
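
To make the basic idea concrete, here is a minimal NumPy sketch of the Gaussian random projection ("randomized range finder") approach to low-rank approximation discussed in the HMT review; the function name and the oversampling parameter p are illustrative choices of mine, not code from either review.

```python
import numpy as np

def randomized_low_rank(A, k, p=10, rng=None):
    """Approximate rank-k SVD of A via a Gaussian random projection:
    sketch the range of A, orthonormalize the sketch, then take an
    exact SVD of the small projected matrix."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + p))  # Gaussian random projection
    Q, _ = np.linalg.qr(A @ Omega)                    # orthonormal basis for the sketched range
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]
```

Note that A enters only through the products A @ Omega and Q.T @ A, which is why sparse or structured matrices and fast operators, as mentioned above, can be sketched quickly.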

2) Are you familiar with the divide and conquer paper from last NIPS? ("Divide and Conquer Matrix Factorization" by Lester Mackey, Ameet Talwalkar and Michael Jordan)

I'm familiar with that paper on "divide and conquer," and I've talked a lot with Ameet Talwalkar about these issues. Their splitting of the data makes sense; but from the perspective of what we were discussing about analytics, this and other work of Ameet's (and of many others in ML) assumes that the matrix has small "coherence," which is basically the largest "statistical leverage score." See my review monograph above, our PNAS paper on "CUR matrix decompositions for improved data analysis," or the paper below for details. That assumption is fine in some cases, and Ameet has shown that it holds for at least a few ML data sets. But it is badly violated in many, many other data sets, as the PNAS paper, the review monograph, and several other papers I can point you to discuss. More importantly, to develop robust large-scale analytic tools, you simply can't assume small coherence, since it will often be false. A good discussion of these issues can also be found in our paper:

Fast approximation of matrix coherence and statistical leverage, P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff, Preprint: arXiv:1109.3843 (2011),

which also describes how to compute the coherence and statistical leverage scores "fast." Coupled with LSRN, the algorithm in this paper yields fast code to compute the statistical leverage scores and matrix coherence: very fast in the case of "rectangular" matrices, and "medium fast" in the case of large sparse matrices with a low-rank parameter, partly due to the issue mentioned above, partly due to numerical issues with iterative methods such as Lanczos-based methods, and partly due to the densification issue. Densification is going to be a problem in general, and there is no simple solution; we should talk in more detail about this if you like.
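
For concreteness, here is a minimal NumPy sketch of the exact (non-randomized) definition of leverage scores and coherence; the function name is mine, and this direct SVD-based computation has the O(mn^2) cost that the randomized algorithm in the paper above is designed to beat.

```python
import numpy as np

def leverage_scores(A):
    """Exact statistical leverage scores of an m x n matrix A (m >= n):
    the squared Euclidean row norms of an orthonormal basis U for the
    column space of A. The coherence is the largest leverage score."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    r = np.sum(s > s[0] * max(A.shape) * np.finfo(U.dtype).eps)  # numerical rank
    scores = np.sum(U[:, :r] ** 2, axis=1)
    return scores, scores.max()

# Example: a tall Gaussian matrix typically has small (uniform) leverage scores,
# which is exactly the "incoherence" assumption discussed above.
scores, coherence = leverage_scores(np.random.default_rng(0).standard_normal((1000, 10)))
```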

3) I heard about LSRN, your efficient parallel iterative least-squares solver. Can you give us the essence of this construction?

LSRN is a parallel iterative least-squares solver developed by Meng, Saunders, and Mahoney that is based on random projections and that takes advantage of modern computer architectures in a novel way. The solver computes the unique minimum-length solution to strongly over-determined or under-determined least-squares problems of the form $\min_{x \in \mathbb{R}^n} \|A x - b\|_2$. Thus, LSRN is similar to recent implementations of randomized projection algorithms for the least-squares problem, but it is more general and more robust, and it is designed for larger-scale parallel or distributed environments. In particular, all existing prior work assumes that $A$ has full rank; and, for the iterative solvers, none of it comes with guarantees on the number of iterations that are independent of the spectrum of $A$. For LSRN, the matrix $A \in \mathbb{R}^{m \times n}$ can be either full-rank or rank-deficient; since $A$ is only involved in matrix-matrix and matrix-vector multiplications, $A$ can be a dense matrix, a sparse matrix, or a linear operator; and LSRN automatically speeds up on sparse matrices and with fast linear operators.

The preconditioning phase of LSRN consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of a "small" matrix. Importantly, the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence the number of iterations is fully predictable (just as with direct solvers) when LSQR or the Chebyshev semi-iterative method is applied to the preconditioned system. The latter method is particularly efficient for solving large-scale problems on distributed clusters with high communication cost. In addition, LSRN is easy to implement using threads or with MPI, and it scales well in distributed clusters and parallel environments. Numerical results demonstrate that, on a local shared-memory machine, LSRN outperforms LAPACK's DGELSD on large-scale dense problems and MATLAB's backslash (SuiteSparseQR) on large-scale sparse problems; and empirical results also demonstrate that LSRN scales well on Amazon Elastic Compute Cloud (EC2) clusters. LSRN is available for download.
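
To illustrate the construction just described, here is a minimal NumPy/SciPy sketch of the over-determined ($m \gg n$) case with a dense $A$; the oversampling factor gamma and the function name are illustrative choices of mine, and this is not the released LSRN code (which also handles the under-determined case, parallelism, and the Chebyshev semi-iterative option).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def lsrn_like_solve(A, b, gamma=2.0, rng=None):
    """Sketch of LSRN-style preconditioning for min ||Ax - b||_2 with m >> n.

    1. Random normal projection: form the "small" s x n matrix G A.
    2. SVD of G A gives a right preconditioner N = V * Sigma^{-1}.
    3. Run LSQR on the well-conditioned operator A N and return x = N y.
    Keeping only the numerically nonzero singular values handles the
    rank-deficient case, so x is the minimum-length solution."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    s = int(np.ceil(gamma * n))                         # sketch size
    GA = rng.standard_normal((s, m)) @ A                # random normal projection (embarrassingly parallel)
    _, sigma, Vt = np.linalg.svd(GA, full_matrices=False)
    r = np.sum(sigma > sigma[0] * s * np.finfo(sigma.dtype).eps)  # numerical rank
    N = Vt[:r].T / sigma[:r]                            # n x r preconditioner
    AN = LinearOperator((m, r),
                        matvec=lambda y: A @ (N @ y),
                        rmatvec=lambda z: N.T @ (A.T @ z))
    y = lsqr(AN, b)[0]                                  # iteration count controlled by the projection, not spec(A)
    return N @ y
```

The key point is that the spectrum of the preconditioned operator $A N$ is controlled by the random projection, which is why the LSQR iteration count is predictable regardless of how ill-conditioned $A$ is.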
