I had a very interesting meeting today with Matei Zaharia, a graduate student from Berkeley with a very impressive publication record. He is one of the authors of Spark, a parallel cloud computing framework targeted at iterative and interactive algorithms, a setting where Hadoop does not perform well.
I was impressed with the demo of Spark, run on 20 Amazon EC2 machines. Spark is implemented in Scala, which provides a convenient, Java-like programming interface.
What caught my attention is that one of the algorithms implemented on top of Spark was alternating least squares, work to be reported at the upcoming SIGCOMM:
M. Chowdhury, M. Zaharia, J. Ma, M.I. Jordan and I. Stoica, Managing Data Transfers in Computer Clusters with Orchestra.
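For readers unfamiliar with alternating least squares: it factorizes a (partially observed) matrix R as U V^T by alternating between fixing V and solving a small ridge-regression problem for each row of U, then fixing U and solving for each row of V. Below is a minimal NumPy sketch of this idea, not Spark's or GraphLab's actual implementation; the function name `als` and parameters `d`, `lam`, `n_iters` are my own illustrative choices (the experiments in this post use D=60).

```python
import numpy as np

def als(R, mask, d=2, n_iters=50, lam=0.01, seed=0):
    """Factor R ~ U @ V.T over the observed entries given by boolean `mask`."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.standard_normal((n, d)) * 0.1
    V = rng.standard_normal((m, d)) * 0.1
    reg = lam * np.eye(d)
    for _ in range(n_iters):
        # With V fixed, each row of U is an independent ridge-regression solve;
        # this per-row independence is what makes ALS easy to parallelize.
        for i in range(n):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ R[i, mask[i]])
        # Symmetrically, with U fixed, solve for each row of V.
        for j in range(m):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ R[mask[:, j], j])
    return U, V

# Tiny fully observed rank-2 example.
rng = np.random.default_rng(1)
R = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
mask = np.ones_like(R, dtype=bool)
U, V = als(R, mask)
rmse = np.sqrt(np.mean((U @ V.T - R) ** 2))
```

Each inner solve touches only one row of U or V, which is why a single iteration parallelizes so naturally across cores and machines.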
Here are some performance numbers given by Matei:
4 cores (master + 0 workers): 272s
12 cores (master + 1 worker): 106s
20 cores (master + 2 workers): 72s
28 cores (master + 3 workers): 57s
36 cores (master + 4 workers): 50s
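As a quick sanity check on these numbers, here is a small Python calculation of the relative speedup and per-core efficiency versus the 4-core baseline; the steady efficiency drop as workers are added is the communication cost Matei describes in his conclusion.

```python
# Per-iteration runtimes reported above as (cores, seconds).
runs = [(4, 272), (12, 106), (20, 72), (28, 57), (36, 50)]
base_cores, base_time = runs[0]

rows = []
for cores, secs in runs:
    speedup = base_time / secs                   # vs. the 4-core baseline
    efficiency = speedup / (cores / base_cores)  # speedup per added core
    rows.append((cores, round(speedup, 2), round(efficiency, 2)))
    print(f"{cores:2d} cores: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

So going from 4 to 36 cores buys roughly a 5.4x speedup, at about 60% parallel efficiency.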
Next, I tested multicore GraphLab on the same data with the same factorized matrix width (D=60). One GraphLab iteration of alternating least squares takes 106 seconds on 8 cores of an AMD Opteron machine, which is very close to the Spark result with 12 cores.
Overall conclusion (by Matei): I think the main takeaway from this is that you should absolutely use a lot of cores on one machine if you have them, as communication is much faster. When you add nodes, the communication cost will lower the overall performance per CPU, but you will get lower response times too.