Today I learned from Prof. Garth Gibson from CMU about a graph algorithm benchmark called Graph 500 (http://graph500.org). It is a benchmarking suite designed for comparing the performance of systems on graph algorithms.
Current benchmarks and performance metrics do not provide useful information on the suitability of supercomputing systems for data intensive applications. A new set of benchmarks is needed in order to guide the design of hardware architectures and software systems intended to support such applications and to help procurements. Graph algorithms are a core part of many analytics workloads.
Backed by a steering committee of over 50 international HPC experts from academia, industry, and national laboratories, Graph 500 will establish a set of large-scale benchmarks for these applications. The Graph 500 steering committee is in the process of developing comprehensive benchmarks to address three application kernels: concurrent search, optimization (single source shortest path), and edge-oriented (maximal independent set).
They target very interesting and challenging data sizes, with problem classes ranging from small "toy" graphs up to "huge" graphs at petabyte scale.
The benchmark performs the following steps: generate an edge list with a scalable Kronecker generator, construct the graph data structure (the first timed kernel), run breadth-first search from a set of randomly sampled roots (the second timed kernel), validate each resulting BFS tree, and report performance in TEPS (traversed edges per second).
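A minimal single-machine sketch of the two timed kernels, graph construction and BFS, with performance reported in TEPS (traversed edges per second). Function names are mine; the real benchmark is distributed and also validates each BFS parent tree:

```python
import time
from collections import defaultdict, deque

def build_graph(edges):
    """Kernel 1: build an adjacency structure from the generated edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        if u != v:               # ignore self-loops for traversal purposes
            adj[u].append(v)
            adj[v].append(u)     # edges are treated as undirected
    return adj

def bfs(adj, root):
    """Kernel 2: breadth-first search, returning a parent map (the BFS tree)."""
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return parent

def run_benchmark(edges, roots):
    """Time the BFS kernel over all roots and report an aggregate TEPS figure."""
    adj = build_graph(edges)
    start = time.perf_counter()
    for r in roots:
        bfs(adj, r)
    elapsed = time.perf_counter() - start
    return len(roots) * len(edges) / elapsed
```

The official reference code samples 64 unique search roots; the sketch above takes the roots as a parameter instead.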
In comparison to our machine learning techniques, Graph500 uses completely synthetic data, which is not very useful for us:
The graph generator is a Kronecker generator similar to the Recursive MATrix (R-MAT) scale-free graph generation algorithm [Chakrabarti, et al., 2004].
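To make the recursive idea concrete, here is a minimal sketch of an R-MAT-style Kronecker generator: each edge is placed by recursively choosing one of four quadrants of the adjacency matrix, which sets one bit of the source and destination vertex IDs per level. The quadrant probabilities a=0.57, b=0.19, c=0.19 match the Graph500 defaults; the reference implementation differs in details such as probability noise and vertex permutation:

```python
import random

def kronecker_edges(scale, edgefactor=16, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate the edge list of a 2^scale-vertex Kronecker (R-MAT-style) graph.

    a, b, c are the quadrant probabilities; the remaining quadrant implicitly
    gets d = 1 - a - b - c. edgefactor is the edge-to-vertex ratio.
    """
    rng = random.Random(seed)
    n_edges = edgefactor * (1 << scale)
    edges = []
    for _ in range(n_edges):
        src = dst = 0
        for _ in range(scale):       # one quadrant choice per bit level
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:                # top-left quadrant: both bits stay 0
                pass
            elif r < a + b:          # top-right: destination bit set
                dst |= 1
            elif r < a + b + c:      # bottom-left: source bit set
                src |= 1
            else:                    # bottom-right: both bits set
                src |= 1
                dst |= 1
        edges.append((src, dst))
    return edges
```

Because the same skewed quadrant probabilities apply at every recursion level, the resulting degree distribution is heavy-tailed, mimicking the scale-free structure of real networks.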
Overall, it is a great initiative in the right direction, although we as ML researchers would love to see real data and real applications instead of synthetic data. On the other hand, the magnitudes of data the parallel computing guys are handling are several orders of magnitude larger than what we can fit into state-of-the-art systems like GraphLab.