Friday, January 30, 2015
GraphLab History O'Reilly Podcast
My friend Ben Lorica just released a podcast with our CEO, Prof. Carlos Guestrin, about the history of the GraphLab project. I must admit I got some nice credits there. :-)
Wednesday, January 28, 2015
Johns Hopkins ML Postdoc Position
I got this from my colleague Joshua Vogelstein:
To be considered, please send jovo@jhu.edu an email including: (i) a curriculum vitae, and (ii) the names and email addresses of three references.
The Open Connectome Project at Johns Hopkins University invites outstanding candidates to apply for a postdoctoral or assistant research scientist position in the area of statistical machine learning for big brain imaging data. Our workflow is tightly vertically integrated, ranging from raw data to theory to answering neuroscience questions and back again. Along the way, we develop new scalable methods (ideally with provable properties), and we apply previously developed methods in novel contexts. All of our projects include machine learning and big statistics, and integrate computer vision, systems engineering, numerical algorithms, and parallel computing. In short, we use/develop whatever technologies are necessary to answer today's most important, open, and long-standing questions in neuroscience.
The datasets that we work with are multi-modal, including multi-teravoxel images, high-dimensional spatiotemporal data, billion-vertex attributed graphs, 3D shapes, and semi-structured text. Therefore, we often focus on non-Euclidean and non-parametric methods. Publication targets include high-impact scientific journals, including Nature, Science, Nature Methods, and PNAS, with complementary articles in more specialized journals and conferences, including PAMI, NIPS, and Neuron.
Postdocs will primarily be advised by Dr. Joshua Vogelstein (Dept of Biomedical Engineering). In addition, postdocs will likely also be co-advised with at least one of Dr. Vogelstein's close collaborators, including Dr. Carey Priebe (Dept of Applied Mathematics & Statistics), Dr. Randal Burns (Dept of Computer Science), Dr. Guillermo Sapiro (Dept of Electrical and Computer Engineering, Duke University), and Dr. Michael Miller (Dept of Biomedical Engineering).
This position requires expertise in statistical machine learning and an interest in neuroscience. Other useful skills include computer vision, numerical algorithms, optimization theory, and convex analysis. Proficiency in some scientific programming language (e.g., R, Python, MATLAB) is also required. Experience with parallel computing and neuroscience is advantageous. All the research artifacts derived from this postdoc will be open source and open access. This means that pre-prints go on arXiv, code goes on GitHub, and data goes on openconnecto.me, typically prior to publication.
Friday, January 23, 2015
M$ Acquires Revolution Analytics
I heard today's news from multiple people. Revolution's business is built on an open source model of support for R users.
Friday, January 16, 2015
University of Cambridge donates improved graph coloring code to PowerGraph

Tuesday, January 13, 2015
Is MLlib being deprecated?
I got this from my colleague Krishna Sridhar. It seems that a new ML library, spark.ml is being written on top of Spark with the goal of deprecating MLlib.
If all goes well, spark.ml will become the primary ML package at the time of the Spark 1.3 release. Initially, simple wrappers will be used to port algorithms to spark.ml, but eventually, code will be moved to spark.ml and spark.mllib will be deprecated.
I just got a note from Xiangrui Meng, who is heading this effort. It seems the above text was not clear. Here is a clarification of their new plan:
spark.ml contains high-level APIs for building ML pipelines. But it doesn't mean that spark.mllib is being deprecated, nor that MLlib as a Spark component is being deprecated. First of all, the spark.ml pipeline API is in its alpha stage and we need to see more use cases from the community to stabilize it. Secondly, the components in spark.ml are simple wrappers over spark.mllib implementations. Neither the APIs nor the implementations from spark.mllib are being deprecated. We expect users to use spark.ml pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to spark.mllib. For example, you can find many features in review at https://spark-prs.appspot.com/#mllib. So users should be comfortable with using spark.mllib features and expect more features coming. I will update the user guide to make the message clear. Thanks for bringing this up!
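To make the "simple wrappers" idea a bit more concrete, here is a toy sketch in plain Python. It is only an illustration of the general pattern described in the note (thin pipeline stages that delegate to existing low-level routines). It does not use Spark at all, and every name in it (legacy_scale, legacy_threshold, Scale, Threshold, Pipeline) is hypothetical, not part of spark.ml or spark.mllib.

```python
# Toy illustration only: none of these names are Spark APIs.

def legacy_scale(values, factor):
    """Stand-in for an existing low-level routine (the spark.mllib role)."""
    return [v * factor for v in values]


def legacy_threshold(values, cutoff):
    """Another stand-in low-level routine: 1 if value >= cutoff, else 0."""
    return [1 if v >= cutoff else 0 for v in values]


class Scale:
    """Thin pipeline stage that only delegates to legacy_scale."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return legacy_scale(values, self.factor)


class Threshold:
    """Thin pipeline stage that only delegates to legacy_threshold."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def transform(self, values):
        return legacy_threshold(values, self.cutoff)


class Pipeline:
    """Runs stages in order, feeding each stage's output to the next."""

    def __init__(self, stages):
        self.stages = list(stages)

    def transform(self, values):
        for stage in self.stages:
            values = stage.transform(values)
        return values


pipeline = Pipeline([Scale(2.0), Threshold(3.0)])
result = pipeline.transform([0.5, 1.5, 2.5])
print(result)
```

The point of the sketch is that the low-level functions keep working on their own, and the pipeline layer adds composition on top without replacing them, which is roughly the relationship the note describes between the two packages.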
Thursday, January 8, 2015
Velox: predictive services on top of Spark
My friend and colleague Joey Gonzalez has presented Velox at CIDR. Velox is a predictive server which can serve machine learning models on top of Spark.
A competing project is prediction.io.
Machine Learning Summer 2015 Internship Positions
My friend and colleague Jan Neumann from Comcast is looking for 5 summer interns for the Comcast DC Research Lab. There are also 2 full-time positions for ML/NLP researchers.
Tuesday, January 6, 2015
O'Reilly Strata Survey: Technology and Salaries
My colleague Alice Zheng sent me the following link to the 2014 Strata salary and technology survey.
Here are some interesting plots taken from this survey.
Monday, January 5, 2015
GraphLab Create wins 2nd place ACM RecSys Competition
A couple of months ago, I wrote about the Hungarian team headed by Robi Palovic, who won 2nd place at the ACM Recommender Systems conference. Yesterday we released an IPython notebook which documents the winning solution. Everyone is welcome to take a look!
Sunday, January 4, 2015
Pokemon or Big Data Technology?
I got this from my colleague Jay Gu: can you tell Pokemon character names from big data technology companies?
Bloomberg Beta is trying to map the ML domain
I got this from my colleague and friend Roy Varshavsky: an interesting effort by Shivon Zilis of Bloomberg Beta to chart ML-related startups.
Some comments:
1) Most important - our GraphLab logo is very old.. :-(
2) Hexdata changed their name to h2o.ai
3) Sales - SalesPredict, C9 inc
4) Fraud - Paypal, Square, Forter
5) Security - a sexier name is cyber; there are so many of those that there is a need for a new chart just for them
6) Marketing - Datorama
7) Agriculture - Nrgene, Inteliscope
8) Medical - Treato
9) Automotive - Automatic
10) Non profit - Rootclaim
11) Oil & Gas - SparkBeyond
12) Media - Taboola
13) Medical - Orcam
14) Consumer finance - eToro, Seeking Alpha
15) Image recognition - Cortica, Superfish
Saturday, January 3, 2015
1st Big Data Analytics Israeli Conference
GraphLab is involved in organizing the 1st Big Data Analytics Israeli conference. The conference will take place on May 11 at the Wahl Center near Bar Ilan University, Israel. It is an 800-person conference for data scientists and CTOs, aimed at showcasing Israeli innovation in big data analytics and applied machine learning.
Special Keynote speaker
Ben Lorica, Chief Scientist of O'Reilly Media and the content manager of O'Reilly Strata