Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. Large Scale Distributed Deep Networks. NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada, United States, December 2012.
It uses a distributed implementation of SGD and L-BFGS for training deep networks. It is one of the largest ML deployments I have seen so far: up to 10K cores and 5K machines. In a nutshell, they partition the problem into regions, run SGD in each region separately, and then use a central server to merge the models from the different regions. They also support asynchronous computation across the different nodes.
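To make the "central server merges updates from asynchronous workers" idea more concrete, here is a minimal single-process sketch in the spirit of the paper's asynchronous SGD (Downpour SGD), applied to a toy linear regression. Everything in it (`ParameterServer`, `Worker`, `fetch_every`, the toy data) is a hypothetical simplification of my own. The real system also splits each model across machines and shards the parameter server, and this sketch leaves both of those out. Asynchrony is simulated by interleaving workers in random order. Each worker refreshes its copy of the parameters only every `fetch_every` steps, so it often computes gradients on stale parameters.

```python
import random


class ParameterServer:
    """Central store for the shared model; applies each pushed gradient as it arrives."""

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr

    def fetch(self):
        return list(self.w)  # workers get a copy, which can go stale

    def push(self, grad):
        for i, g in enumerate(grad):
            self.w[i] -= self.lr * g


def grad_squared_loss(w, x, y):
    # Gradient of 0.5 * (w . x - y)^2 with respect to w.
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


class Worker:
    """Trains on its own data shard and talks to the server (hypothetical, simplified)."""

    def __init__(self, shard, fetch_every, lr):
        self.shard = shard
        self.fetch_every = fetch_every
        self.lr = lr
        self.local_w = None
        self.steps = 0

    def step(self, server, rng):
        # Refresh parameters only every `fetch_every` steps; in between they are stale.
        if self.local_w is None or self.steps % self.fetch_every == 0:
            self.local_w = server.fetch()
        x, y = rng.choice(self.shard)
        g = grad_squared_loss(self.local_w, x, y)
        server.push(g)  # send the gradient to the central server
        # Assumption: the worker also applies its own update to its local copy.
        self.local_w = [wi - self.lr * gi for wi, gi in zip(self.local_w, g)]
        self.steps += 1


def make_data(n, seed=1):
    # Hypothetical noise-free target: y = 2*x0 - 3*x1 + 1 (x2 is a constant bias feature).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        data.append((x, 2 * x[0] - 3 * x[1] + x[2]))
    return data


def train(data, n_workers=4, fetch_every=3, lr=0.05, total_steps=4000, seed=0):
    rng = random.Random(seed)
    shards = [data[i::n_workers] for i in range(n_workers)]  # one data shard per worker
    server = ParameterServer(len(data[0][0]), lr)
    workers = [Worker(s, fetch_every, lr) for s in shards]
    for _ in range(total_steps):
        # Random interleaving stands in for workers running at different speeds.
        rng.choice(workers).step(server, rng)
    return server.w


if __name__ == "__main__":
    print([round(v, 3) for v in train(make_data(200))])
```

With these settings, the server's weights should end up close to the target `[2, -3, 1]` even though many gradients are computed on stale copies. Raising `fetch_every` or `lr` makes the staleness hurt more. That is the usual trade-off between less communication and noisier updates.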
And they did not fail to mention GraphLab :-)
We considered a number of existing large-scale computational tools for application to our problem, MapReduce and GraphLab being notable examples. We concluded that MapReduce, designed for parallel data processing, was ill-suited for the iterative computations inherent in deep network training; whereas GraphLab, designed for general (unstructured) graph computations, would not exploit computing efficiencies available in the structured graphs typically found in deep networks.
I am not sure I got their meaning; if anyone knows, let me know.