## Friday, December 16, 2011

### News from NIPS - Divide and Conquer Matrix Factorization and efficient K-means

Here are some interesting papers that caught my eyes at NIPS. While being very simple I believe they can be very effective in practice.

"Divide and Conquer Matrix Factorization" is a joint work by Lester Mackey, Ameet Talwalkar and Michael Jordan from Berkeley. The idea is pretty simple. Assume you have a large matrix you can not fit into memory you would like to factorize to two low rank matrices. There are 3 steps in the construction:

1. The large matrix is first partitioned into several small matrices by slicing the original matrix to column groups.
2. Each smaller matrix is solved in parallel using a cluster of computers, using any desired matrix factorization algorithm.
3. Now it remains how to aggregate the result back for getting the original factorized solutions of the larger matrix. This can be done by column projection, namely computing the least square solution of the smaller matrix onto the larger one. (See details in the paper!) and averaging the different solutions together.
It seems this construction has a pretty descent speedup. I am considering implementing this method in GraphLab as well. Ping me if you are interested in using this algo.

The second paper, Fast and Accurate K-means for Large Datasets is a method for speeding up k-means computation by Michael Shindler Alex Wong andAdam Meyerson. It got the best student paper award. The idea is pretty simple and actually reminds me the power iteration clustering.

The observation at the basis of this construction is that one of the heavy computational burdens in K-means is computing distances from each data point to all K-cluster heads. A very simple trick is proposed: a random vector is selected and the data point as well of all cluster locations are projected into it. Now we got a list of scalars for each cluster and it remains to find out which is closest to the projected data point. This significantly saves work since instead of computing D-dimensional distances we compute only 1-dimensional distances. It seems that in practice accuracy of the method is not significantly worse.