Wednesday, January 15, 2014

Petuum - a new distributed machine learning framework from CMU (Eric Xing)

From their website:

Petuum is a distributed machine learning framework. It takes care of the difficult system "plumbing work", allowing you to focus on the ML. Petuum runs efficiently at scale on research clusters and cloud compute like Amazon EC2 and Google GCE.

A Bit More Detail

Petuum provides essential distributed programming tools that minimize programmer effort. It has a distributed parameter server (key-value storage), a distributed task scheduler, and out-of-core (disk) storage for extremely large problems. Unlike general-purpose distributed programming platforms, Petuum is designed specifically for ML algorithms. This means that Petuum takes advantage of data correlation, staleness, and other statistical properties to maximize the performance of ML algorithms.
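To make the "parameter server plus staleness" idea a bit more concrete, here is a toy, single-process sketch in Python. It is not Petuum's API: the class ToyParameterServer, its method names, and the bounded-staleness rule (a worker may run at most a fixed number of clocks ahead of the slowest worker) are all assumptions made purely for illustration.

```python
from collections import defaultdict


class ToyParameterServer:
    """Hypothetical in-process key-value store with a bounded-staleness check."""

    def __init__(self, num_workers, staleness):
        self.table = defaultdict(float)   # key -> parameter value
        self.clocks = [0] * num_workers   # iterations finished by each worker
        self.staleness = staleness        # maximum allowed clock gap

    def can_proceed(self, worker):
        # A worker may be ahead of the slowest worker by at most `staleness` clocks.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def get(self, key):
        return self.table[key]

    def inc(self, key, delta):
        # Additive updates commute, so the order in which they arrive does not matter.
        self.table[key] += delta

    def clock(self, worker):
        # Called by a worker when it finishes one iteration.
        self.clocks[worker] += 1


server = ToyParameterServer(num_workers=2, staleness=1)
server.inc("w0", 0.5)
server.inc("w0", -0.25)
server.clock(0)
server.clock(0)
blocked = not server.can_proceed(0)   # worker 0 is now 2 clocks ahead of worker 1
server.clock(1)
unblocked = server.can_proceed(0)     # the gap is back to 1 clock
```

The point of the sketch is only that ML updates tolerate slightly out-of-date reads, so fast workers do not have to wait for a full barrier on every iteration; a real system would add networking, sharding, and caching on top.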

Plug and Play

Petuum comes with a fast and scalable parallel LASSO regression solver, as well as implementations of a topic model (Latent Dirichlet Allocation) and L2-norm Matrix Factorization - with more to be added on a regular basis. Petuum is fully self-contained, making installation a breeze - if you know how to use a Linux package manager and type "make", you're ready to use Petuum. No mucking around trying to find that Hadoop cluster, or (worse still) trying to install Hadoop yourself. Whether you have a single machine or an entire cluster, Petuum just works.
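For readers who have not met LASSO before, here is a minimal sequential sketch of the kind of problem such a solver handles: coordinate descent with soft-thresholding on the objective 0.5 * ||y - Xw||^2 + lam * ||w||_1. This is a textbook single-machine version written for illustration, not Petuum's parallel solver; the function names and the tiny data set are made up.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clipping to exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, num_sweeps=100):
    """Minimize 0.5 * ||y - Xw||^2 + lam * ||w||_1, one coordinate at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(num_sweeps):
        for j in range(d):
            rho = 0.0  # correlation of column j with the residual that ignores w[j]
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Made-up data: feature 0 explains y strongly, feature 1 only weakly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
weights = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy data the weak feature's weight is driven to exactly zero while the strong one is only shrunk, which is the sparsity effect that makes LASSO attractive for very high-dimensional problems.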

What's Petuum anyway?

Petuum comes from "perpetuum mobile," which is a musical style characterized by a continuous steady stream of notes. Paganini's Moto Perpetuo is an excellent example. It is our goal to build a system that runs efficiently and reliably -- in perpetual motion.
