This is a linear system of equations, where the matrix A has 1.2M rows x 1.2M columns and
8.4M non-zeros. Here are some performance numbers. With the round-robin scheduler, running 70 iterations on a 32-core Opteron (2.4GHz) machine with 128GB of RAM and CentOS release 5.6,
we got the best speedup, around 12, using 24 cores.
For the chromatic scheduler, we got a somewhat lower speedup of 8 using 20 cores.
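To put the two speedups on a common scale, one option is parallel efficiency: speedup divided by the number of cores used. The snippet below is only a sketch built on the rounded figures quoted above; the function name `parallel_efficiency` is ours, for illustration.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speedup divided by core count; 1.0 would be perfect linear scaling."""
    return speedup / cores


# Rounded figures from the text: (speedup, cores) per scheduler.
round_robin = parallel_efficiency(12, 24)
chromatic = parallel_efficiency(8, 20)

print(f"round-robin: {round_robin:.0%}")  # 50%
print(f"chromatic:   {chromatic:.0%}")    # 40%
```

Since the inputs are approximate ("around 12", "8"), the resulting percentages are rough as well.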
In terms of running time, each iteration takes about 0.2 seconds on 20 cores,
which means we handle about 40M non-zero matrix entries per second.
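The throughput figure follows from dividing the number of non-zeros by the time per iteration. Here is that arithmetic as a small sketch, assuming each iteration touches every non-zero entry exactly once (the variable names are ours):

```python
# Figures quoted in the text (both are rounded).
nonzeros = 8.4e6              # non-zero entries in A
seconds_per_iteration = 0.2   # on 20 cores

# Assumption: one iteration processes every non-zero exactly once.
throughput = nonzeros / seconds_per_iteration

print(f"{throughput / 1e6:.0f}M non-zeros per second")  # 42M non-zeros per second
```

This comes to roughly 42M entries per second, which we round to about 40M.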