Friday, January 30, 2015

GraphLab History O'Reilly Podcast

My friend Ben Lorica just released a podcast with our CEO, Prof. Carlos Guestrin, about the history of the GraphLab project. I must admit I got some nice credits there. :-)

Wednesday, January 28, 2015

Johns Hopkins ML Postdoc Position

I got this from my colleague Joshua Vogelstein:

The Open Connectome Project at Johns Hopkins University invites outstanding candidates to apply for a postdoctoral or assistant research scientist position in the area of statistical machine learning for big brain imaging data. Our workflow is tightly vertically integrated, ranging from raw data to theory to answering neuroscience questions and back again. Along the way, we develop new scalable methods (ideally with provable properties), and we apply previously developed methods in novel contexts. All of our projects include machine learning and big statistics, and integrate computer vision, systems engineering, numerical algorithms, and parallel computing. In short, we use/develop whatever technologies are necessary to answer today's most important, open, and long-standing questions in neuroscience.

The datasets that we work with are multi-modal, including multi-teravoxel images, high-dimensional spatiotemporal data, billion-vertex attributed graphs, 3D shapes, and semi-structured text. Therefore, we often focus on non-Euclidean and non-parametric methods. Publication targets include high-impact scientific journals, including Nature, Science, Nature Methods, and PNAS, with complementary articles in more specialized journals and conferences, including PAMI, NIPS, and Neuron.

Postdocs will primarily be advised by Dr. Joshua Vogelstein (Dept of Biomedical Engineering). In addition, postdocs will likely also be co-advised with at least one of Dr. Vogelstein's close collaborators, including Dr. Carey Priebe (Dept of Applied Mathematics & Statistics), Dr. Randal Burns (Dept of Computer Science), Dr. Guillermo Sapiro (Dept of Electrical and Computer Engineering, Duke University), and Dr. Michael Miller (Dept of Biomedical Engineering).

This position requires expertise in statistical machine learning and an interest in neuroscience. Other useful skills include computer vision, numerical algorithms, optimization theory, and convex analysis. Proficiency in some scientific programming language (e.g., R, Python, MATLAB) is also required. Experience with parallel computing and neuroscience is advantageous. All the research artifacts derived from this postdoc will be open source and open access. This means that pre-prints go on arXiv, code goes on GitHub, and data goes on, typically prior to publication.

To be considered, please send an email including: (i) a curriculum vitae, and (ii) the names and email addresses of three references.

Friday, January 23, 2015

M$ Acquires Revolution Analytics

I heard today's news from multiple people. Revolution's business is based on an open source model of support for R users.

Friday, January 16, 2015

University of Cambridge donates improved graph coloring code to PowerGraph

Philip Leonard, a student from the University of Cambridge, has just donated improved graph coloring implementations to PowerGraph. Here is the tech report describing his project.
As we are soon going to release GraphLab Create as our newer open source repo, we hope to get additional contributions there as well!

Tuesday, January 13, 2015

Is MLlib being deprecated?

I got this from my colleague Krishna Sridhar. It seems that a new ML library is being written on top of Spark, with the goal of deprecating MLlib.

If all goes well, the new library will become the primary ML package at the time of the Spark 1.3 release. Initially, simple wrappers will be used to port algorithms to it, but eventually code will be moved there and spark.mllib will be deprecated.
I just got a note from Xiangrui Meng, who is heading this effort. It seems the above text was not clear. Here is a clarification of their new plan: The new package contains high-level APIs for building ML pipelines. But that does not mean that spark.mllib is being deprecated, nor that MLlib as a Spark component is being deprecated. First of all, the pipeline API is in its alpha stage and we need to see more use cases from the community to stabilize it. Secondly, its components are simple wrappers over spark.mllib implementations. Neither the APIs nor the implementations from spark.mllib are being deprecated. We expect users to use the pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to spark.mllib. For example, many features are currently in review. So users should be comfortable using spark.mllib features and can expect more features to come. I will update the user guide to make the message clear. Thanks for bringing this up!

Thursday, January 8, 2015

Tuesday, January 6, 2015

O'Reilly Strata Survey: Technology and Salaries

My colleague Alice Zheng sent me the following link to the 2014 Strata salary and technology survey.

Here are some interesting plots taken from this survey.

Monday, January 5, 2015

GraphLab Create wins 2nd place ACM RecSys Competition

A couple of months ago, I wrote about the Hungarian team headed by Robi Palovic, which won 2nd place at the ACM Recommender Systems conference. Yesterday we released an IPython notebook that documents the winning solution. Everyone is welcome to take a look!

Sunday, January 4, 2015

Pokemon or Big Data Technology?

I got this from my colleague Jay Gu: can you tell Pokemon character names apart from big data technology companies?

Bloomberg Beta is trying to map the ML domain

I got this from my colleague and friend Roy Varshavsky. An interesting effort from Shivon Zilis of Bloomberg Beta to chart ML-related startups.

Some comments:
1) Most important - our GraphLab logo is very old. :-(
2) Hexdata changed their name to
3) Sales - SalesPredict, C9 inc
4) Fraud - Paypal, Square, Forter
5) Security - a sexier name is cyber; there are so many of those that a new chart is needed just for them
6) Marketing - Datorama
7) Agriculture - Nrgene, Inteliscope
8) Medical - Treato
9) Automotive - Automatic
10) Non profit - Rootclaim
11) Oil & Gas - SparkBeyond
12) Media - Taboola
13) Medical - Orcam
14) Consumer finance - eToro, Seeking Alpha
15) Image recognition - Cortica, Superfish

Saturday, January 3, 2015

1st Big Data Analytics Israeli Conference

GraphLab is involved in organizing the 1st Big Data Analytics Israeli conference. The conference will take place on May 11 at the Wahl Center near Bar Ilan University, Israel. It is an 800-person conference aimed at data scientists and CTOs, showcasing Israeli innovation in big data analytics and applied machine learning.

We are looking for additional companies, speakers, and sponsors. Contact me if you would like to be involved. Additional details are here.