Thursday, January 19, 2017 - production environment to serve TensorFlow models

I recently stumbled upon an open source production environment to serve TensorFlow deep learning models. Looking at the GitHub activity plots, I see that Chris Fregly is the main force behind it. The project is trying to solve the major headache of scoring and maintaining ML models in production.

Here is their general architecture diagram:

Here is a talk by Chris: 

Alternative related systems include one sold to Salesforce, one sold to Cloudera, Domino Data Labs, and probably some others I forgot :-)

BTW, Chris will be giving a talk at the AI by the Bay conference (March 6-8 in San Francisco). The conference looks pretty interesting.

And here is a note I got from Chris following my initial blog post:

Thanks for the mention, Danny! Love your work.

Here's an updated video:

Here's the jupyter notebook that powers the entire demo: 

I asked Chris which streaming applications he has in mind and this is what I got:

We've got a number of streaming-related GitHub issues (features) in the works. Here are some relevant projects:

  - Working with the Subscriber-Growth Team @ Netflix to replace their existing multi-armed bandit, Spark-Streaming-based data pipeline that selects the best model to increase signups. We're using Kafka + Kafka Streams + Spark + Cassandra (they love Cassandra!) + Jupyter/Zeppelin notebooks in both Python and Scala.
  - Working with the Platform Team @ Twilio to quickly detect application logs that potentially violate privacy policies. This is already an issue outside the US, and it's quickly becoming an issue here in the US. We're using Kafka + custom Kafka input readers for TensorFlow + TensorFlow to train the models (batch) and score every log line (real-time).
  - Working with a super-large Oil & Gas company out of Houston/Oslo (stupid NDAs) to continuously train, deploy, and compare scikit-learn and Spark ML models on live data in parallel - all from a Jupyter notebook.
  - Working with PagerDuty to predict potential outages based on their new "Event" stream, which includes code deploys, configuration changes, etc. We're using Kafka + the new Spark 2.0 Structured Streaming.
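To make the Netflix use case above a bit more concrete, here is a minimal sketch of the kind of multi-armed bandit that such a pipeline replaces or implements: an epsilon-greedy policy that routes traffic between two candidate models and gradually shifts it toward the one with the better observed signup rate. This is purely illustrative - the class name, parameters, and simulated conversion rates are my own assumptions, not Netflix's actual system.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy multi-armed bandit for picking the
    better-performing model variant (illustrative sketch only)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the arm with the best mean reward so far.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return self.values.index(max(self.values))

    def update(self, arm, reward):
        # Incremental running-mean update for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulate two model variants; variant 1 converts signups at a higher rate.
random.seed(42)
bandit = EpsilonGreedyBandit(n_arms=2)
true_rates = [0.05, 0.12]
for _ in range(5000):
    arm = bandit.select_arm()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)

print(bandit.counts)  # most traffic should end up on the better variant
```

In a real streaming deployment the reward signal (a signup) would arrive via Kafka and the `update` calls would run inside the stream processor; the selection logic itself stays this simple.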

What are the main benefits vs. other systems?

  - The overall goal, as you can probably figure out, is to give data scientists the "freedom and responsibility" (hello, Netflix Culture Deck!) to iterate quickly without depending on production engineers or an ops group.
  - This is a lifestyle I really embraced while at Netflix. With proper tooling, anyone (devs, data scientists, etc.) should be able to deploy, scale, and roll back their own code or model artifacts.
  - We're providing the platform for this ML/AI-focused freedom and responsibility!
  - You pointed out a few of our key competitors/cooperators; I have a list of about 20 more that I keep an eye on each and every day. I'm in close talks with all of them.
  - We're looking to partner with guys like Domino Data Labs who have a weak deployment story.
  - And we're constantly sharing experience and code with others.
  - We're super performance-focused, as well. We have a couple of efforts going on, including PMML optimization, native code generation, etc.
  - We're also super-focused on metrics and monitoring - including production-deployment dashboards targeted at data scientists.
  - I feel like our main competitors are actually the cloud providers. They're the ones that keep me awake. One of our underlying themes is to reverse engineer Google's and AWS's Cloud ML APIs.
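For readers unfamiliar with what "serving TensorFlow models in production" looks like at the wire level, here is a small sketch of how a client would talk to TensorFlow Serving's documented REST predict endpoint (`/v1/models/<name>:predict` on port 8501 by default). The host, model name, and input shape below are placeholder assumptions, not values from the post.

```python
import json
import urllib.request

def build_predict_request(host, model_name, instances, version=None):
    """Build an HTTP request for TensorFlow Serving's REST predict API.
    `instances` is a list of input examples, JSON-serializable as per
    the TF Serving REST API; `version` optionally pins a model version."""
    model_path = f"models/{model_name}"
    if version is not None:
        model_path += f"/versions/{version}"
    url = f"http://{host}/v1/{model_path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# Score one flattened 28x28 image against a hypothetical "mnist" model.
req = build_predict_request("localhost:8501", "mnist", [[0.0] * 784])
print(req.full_url)  # http://localhost:8501/v1/models/mnist:predict
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body containing a `predictions` list, assuming a TF Serving instance is actually running at that address.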

1 comment:

  1. Thank you, very interesting. I don't think the big players are particularly a problem; people looking for solutions are trying to avoid the big-player lock-in trap.

    Chris mentions a list with 20 names: do you know what some of the names on that list might be, besides those you mention?