I got this from my colleague Krishna Sridhar. It seems that a new ML library, spark.ml, is being written on top of Spark with the goal of deprecating MLlib. If all goes well, spark.ml will become the primary ML package at the time of the Spark 1.3 release. Initially, simple wrappers will be used to port algorithms to spark.ml, but eventually code will be moved to spark.ml and spark.mllib will be deprecated.
I just got a note from Xiangrui Meng, who is heading this effort. It seems the above text was not clear. Here is a clarification of their new plan:
spark.ml contains high-level APIs for building ML pipelines. But it doesn't mean that spark.mllib is being deprecated, nor that MLlib as a Spark component is being deprecated. First of all, the spark.ml pipeline API is in its alpha stage and we need to see more use cases from the community to stabilize it. Secondly, the components in spark.ml are simple wrappers over spark.mllib implementations. Neither the APIs nor the implementations from spark.mllib are being deprecated. We expect users to use spark.ml pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to spark.mllib. For example, you can find many features in review at https://spark-prs.appspot.com/#mllib. So users should be comfortable with using spark.mllib features and expect more features to come. I will update the user guide to make the message clear. Thanks for bringing this up!
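To make the "simple wrappers" idea in the clarification concrete, here is a minimal sketch in plain Python of how a high-level pipeline stage could delegate to a lower-level training routine. This is only an illustration of the pattern: every name in it (train_mean_threshold, ThresholdEstimator, Scaler, Pipeline, and so on) is hypothetical, and none of it is the actual spark.ml or spark.mllib API.

```python
# Illustrative sketch only: these classes are hypothetical, NOT Spark APIs.
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def train_mean_threshold(points: Sequence[Tuple[float, int]]) -> float:
    """Stand-in for a low-level trainer working on (feature, label) pairs.

    Returns the midpoint between the mean feature value of each class.
    """
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


class ThresholdModel:
    """Fitted stage: adds a 'prediction' field to every row."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def transform(self, rows: List[Row]) -> List[Row]:
        return [dict(r, prediction=int(r["feature"] > self.threshold)) for r in rows]


class ThresholdEstimator:
    """High-level wrapper: adapts rows to the low-level trainer's input."""

    def fit(self, rows: List[Row]) -> ThresholdModel:
        points = [(r["feature"], int(r["label"])) for r in rows]
        return ThresholdModel(train_mean_threshold(points))


class Scaler:
    """Stateless stage that rescales the 'feature' field."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, rows: List[Row]) -> List[Row]:
        return [dict(r, feature=r["feature"] * self.factor) for r in rows]


class PipelineModel:
    """A fitted pipeline: applies each fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; stages that have fit() are fitted on the data so far."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)


training = [
    {"feature": 1.0, "label": 0},
    {"feature": 2.0, "label": 0},
    {"feature": 8.0, "label": 1},
    {"feature": 9.0, "label": 1},
]
model = Pipeline([Scaler(0.1), ThresholdEstimator()]).fit(training)
predictions = [r["prediction"] for r in model.transform([{"feature": 3.0}, {"feature": 7.0}])]
print(predictions)  # [0, 1]
```

The point of the sketch is that the low-level trainer stays usable on its own, while the wrapper only adds a uniform fit/transform interface so stages can be chained. That matches the spirit of the note above: the pipeline layer sits on top of the existing implementations rather than replacing them.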
Thanks for the mention, Danny! Love your work.
Here's an updated video, btw: https://youtu.be/swiPWUxBvSc
Here's the jupyter notebook that powers the entire demo: https://github.com/fluxcapacitor/pipeline/blob/master/jupyterhub.ml/notebooks/Conferences/StartupML/Jan-20-2017/SparkMLTensorflowAI-HybridCloud-ContinuousDeployment.ipynb
Thanks again!