Wednesday, January 28, 2015

Johns Hopkins ML Postdoc Position

I got this from my colleague Joshua Vogelstein:

The Open Connectome Project at Johns Hopkins University invites outstanding candidates to apply for a postdoctoral or assistant research scientist position in the area of statistical machine learning for big brain imaging data. Our workflow is tightly vertically integrated, ranging from raw data to theory to answering neuroscience questions and back again. Along the way, we develop new scalable methods (ideally with provable properties), and we apply previously developed methods in novel contexts. All of our projects include machine learning and big statistics, and integrate computer vision, systems engineering, numerical algorithms, and parallel computing. In short, we use/develop whatever technologies are necessary to answer today's most important, open, and long-standing questions in neuroscience.

The datasets that we work with are multi-modal, including multi-teravoxel images, high-dimensional spatiotemporal data, billion-vertex attributed graphs, 3D shapes, and semi-structured text. Therefore, we often focus on non-Euclidean and non-parametric methods. Publication targets include high-impact scientific journals, including Nature, Science, Nature Methods, and PNAS, with complementary articles in more specialized journals and conferences, including PAMI, NIPS, and Neuron.

Postdocs will primarily be advised by Dr. Joshua Vogelstein (Dept of Biomedical Engineering). In addition, postdocs will likely be co-advised by at least one of Dr. Vogelstein's close collaborators, including Dr. Carey Priebe (Dept of Applied Mathematics & Statistics), Dr. Randal Burns (Dept of Computer Science), Dr. Guillermo Sapiro (Dept of Electrical and Computer Engineering, Duke University), and Dr. Michael Miller (Dept of Biomedical Engineering).

This position requires expertise in statistical machine learning and an interest in neuroscience. Other useful skills include computer vision, numerical algorithms, optimization theory, and convex analysis. Proficiency in some scientific programming language (e.g., R, Python, MATLAB) is also required. Experience with parallel computing and neuroscience is advantageous. All the research artifacts derived from this postdoc will be open source and open access. This means that pre-prints go on arXiv, code goes on GitHub, and data goes on openconnecto.me, typically prior to publication.

To be considered, please send jovo@jhu.edu an email including: (i) a curriculum vitae, and (ii) the names and email addresses of three references.

Friday, January 23, 2015

M$ Acquires Revolution Analytics

I heard today's news from several people: Microsoft has acquired Revolution Analytics. Revolution's business is based on an open source model of supporting R users.

Friday, January 16, 2015

University of Cambridge donates improved graph coloring code to PowerGraph

Philip Leonard, a student from the University of Cambridge, has just donated improved graph coloring implementations to PowerGraph. Here is the tech report describing his project.
As we are soon going to release GraphLab Create as our newer open source repo, we hope to get additional contributions there as well!

Tuesday, January 13, 2015

Is MLlib being deprecated?

I got this from my colleague Krishna Sridhar. It seems that a new ML library, spark.ml, is being written on top of Spark with the goal of deprecating MLlib.

If all goes well, spark.ml will become the primary ML package at the time of the Spark 1.3 release. Initially, simple wrappers will be used to port algorithms to spark.ml, but eventually, code will be moved to spark.ml and spark.mllib will be deprecated.
I just got a note from Xiangrui Meng, who is heading this effort. It seems the above text was not clear. Here is a clarification of their new plan:

spark.ml contains high-level APIs for building ML pipelines. But this doesn't mean that spark.mllib is being deprecated, nor that MLlib as a Spark component is being deprecated. First of all, the spark.ml pipeline API is in its alpha stage and we need to see more use cases from the community to stabilize it. Secondly, the components in spark.ml are simple wrappers over spark.mllib implementations. Neither the APIs nor the implementations from spark.mllib are being deprecated. We expect users to use the spark.ml pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to spark.mllib. For example, you can find many features in review at https://spark-prs.appspot.com/#mllib. So users should be comfortable using spark.mllib features and can expect more features to come. I will update the user guide to make the message clear. Thanks for bringing this up!
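
To make the "simple wrappers" idea concrete, here is a toy sketch in plain Python of how a pipeline layer can sit on top of a lower-level training routine without replacing it. This is not Spark code: every name in it (`train_mean_threshold`, `Scaler`, `ThresholdEstimator`, `Pipeline`) is hypothetical and invented for illustration, and the real spark.ml and spark.mllib APIs look different.

```python
# Toy illustration only: none of these names are Spark APIs.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]  # (feature, label)


def train_mean_threshold(rows: Sequence[Row]) -> Callable[[float], float]:
    """Stand-in for a low-level trainer: learns a threshold at the feature mean."""
    threshold = sum(x for x, _ in rows) / len(rows)
    return lambda x: 1.0 if x > threshold else 0.0


@dataclass
class Scaler:
    """Hypothetical pipeline stage that rescales the feature column."""
    factor: float

    def transform(self, rows: Sequence[Row]) -> List[Row]:
        return [(x * self.factor, y) for x, y in rows]


class ThresholdEstimator:
    """Hypothetical pipeline stage: a thin wrapper that delegates to the trainer."""

    def fit(self, rows: Sequence[Row]) -> Callable[[float], float]:
        return train_mean_threshold(rows)


class Pipeline:
    """Chains a preprocessing stage and an estimator into one fit() call."""

    def __init__(self, scaler: Scaler, estimator: ThresholdEstimator) -> None:
        self.scaler = scaler
        self.estimator = estimator

    def fit(self, rows: Sequence[Row]) -> Callable[[float], float]:
        predict = self.estimator.fit(self.scaler.transform(rows))
        # The fitted model applies the same scaling before predicting.
        return lambda x: predict(x * self.scaler.factor)


rows = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
model = Pipeline(Scaler(factor=10.0), ThresholdEstimator()).fit(rows)
print(model(1.0), model(4.0))
```

The point of the sketch is only that the low-level function stays usable on its own while the pipeline layer adds composition on top, which is how I read the clarification above.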

Thursday, January 8, 2015