Thursday, January 19, 2017

TensorFlow to support Keras API

I found this interesting blog post by Rachel Thomas. My favorite quote:

Using TensorFlow makes me feel like I’m not smart enough to use TensorFlow; whereas using Keras makes me feel like neural networks are easier than I realized. This is because TensorFlow’s API is verbose and confusing, and because Keras has the most thoughtfully designed, expressive API I’ve ever experienced. I was too embarrassed to publicly criticize TensorFlow after my first few frustrating interactions with it. It felt so clunky and unnatural, but surely this was my failing. However, Keras and Theano confirm my suspicions that tensors and neural networks don’t have to be so painful. 
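
To give a flavor of what she means, here is a minimal Keras sketch of a small fully connected classifier. The layer sizes and the (commented out) training call are my own arbitrary choices, just to show how compact the API is:

    # Minimal Keras sketch: a tiny fully connected classifier.
    # Layer sizes are arbitrary and only meant to illustrate the API.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(128, activation='relu', input_dim=784))  # e.g. a flattened 28x28 image
    model.add(Dense(10, activation='softmax'))               # 10 output classes

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Assuming preprocessed data is available (argument name is epochs= in newer Keras):
    # model.fit(X_train, y_train, nb_epoch=5, batch_size=32)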

Pipeline.io - production environment to serve TensorFlow models

I recently stumbled upon pipeline.io, an open source production environment for serving TensorFlow deep learning models. Looking at the GitHub activity plots, I see that Chris Fregly is the main force behind it.
Pipeline.io is trying to solve the major headache around scoring and maintaining ML models in production.
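
To make the serving problem concrete, here is a toy sketch of what scoring a trained model behind an HTTP endpoint could look like. This is not pipeline.io's actual API - just a hypothetical Flask service with a made-up model path and feature format:

    # Toy model-serving sketch (NOT pipeline.io's API): load a trained model once,
    # then score incoming JSON feature vectors over HTTP.
    import numpy as np
    from flask import Flask, request, jsonify
    from keras.models import load_model  # assuming a Keras model saved to disk

    app = Flask(__name__)
    model = load_model('model.h5')  # hypothetical path to a trained model

    @app.route('/predict', methods=['POST'])
    def predict():
        # expects a JSON body like {"features": [0.1, 0.7, ...]}
        features = np.array(request.json['features']).reshape(1, -1)
        score = model.predict(features)
        return jsonify({'score': score.tolist()})

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)

The hard part pipeline.io targets is everything around such an endpoint: versioning, monitoring, and keeping deployed models in sync with training.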

Here is their general architecture diagram:

Here is a talk by Chris:

Alternative related systems are seldon.io, prediction.io (sold to Salesforce), sense.io (sold to Cloudera), Domino Data Lab and probably some others I forgot :-)

BTW Chris will be giving a talk at the AI by the Bay conference (March 6-8 in San Francisco). The conference looks pretty interesting.

Monday, January 16, 2017

CryptoNets: scoring deep learning on encrypted data

Last week I attended an interesting lecture by Ran Gilad-Bachrach from MSR. Ran presented CryptoNets, which was published at ICML 2016. CryptoNets allows scoring trained deep learning models on encrypted data. It relies on homomorphic encryption, a well known mechanism that allows computing products and sums on encrypted values. The main trick is therefore to limit the neural network to operations built only from sums and products. Since common non-linearities such as sigmoid and ReLU are not polynomials, CryptoNets uses the square function as the only supported non-linear operation.
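
To see why the square activation matters, here is a toy plaintext sketch of a polynomial-only forward pass (just matrix products, sums and squaring). In the real system each of these operations would run on homomorphically encrypted values; the sketch below skips the encryption entirely and uses made-up weights and sizes:

    # Toy plaintext sketch of a CryptoNets-style forward pass: only additions,
    # multiplications and squaring, so every step maps onto homomorphic operations.
    # Weights and layer sizes are made up for illustration; no encryption here.
    import numpy as np

    def square(x):
        return x * x  # the only non-linearity: a polynomial, hence HE-friendly

    def forward(x, W1, b1, W2, b2):
        h = square(x.dot(W1) + b1)   # linear layer followed by the square activation
        return h.dot(W2) + b2        # final linear layer producing class scores

    rng = np.random.RandomState(0)
    x = rng.rand(1, 784)                     # e.g. a flattened 28x28 image
    W1, b1 = rng.randn(784, 32), np.zeros(32)
    W2, b2 = rng.randn(32, 10), np.zeros(10)
    print(forward(x, W1, b1, W2, b2).shape)  # (1, 10) class scores

Swapping ReLU for squaring like this is exactly what keeps the whole forward pass inside the set of operations homomorphic encryption can evaluate.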

On the upside, CryptoNets reports 99% accuracy on MNIST, the toy dataset everyone uses for deep learning. On the downside, you cannot train a network, only score new test data. Scoring is quite slow, around 5 minutes, although up to a few thousand scoring operations can be batched together in the same run. Due to the growing complexity of the represented numbers, the technique is also limited to a certain number of network layers.

I believe that in the coming few years additional research effort will be invested in tackling the training of neural networks on private data without revealing the data contents.

Anyone who is interested in other primitives that may be used for similar computations is welcome to take a look at my paper: D. Bickson, D. Dolev, G. Bezman and B. Pinkas, Secure Multi-party Peer-to-Peer Numerical Computation, Proceedings of the 8th IEEE International Conference on Peer-to-Peer Computing (P2P'08), Sept. 2008, Aachen, Germany, where we use both homomorphic encryption and Shamir secret sharing to carry out a similar distributed computation (in terms of sums and products).
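
To give a taste of the secret sharing flavor, here is a toy additive secret sharing sketch for summing private inputs. Our paper uses Shamir secret sharing (which also handles products), but the additive variant below is the simplest illustration of the idea; the field size and number of parties are arbitrary:

    # Toy additive secret sharing over a prime field: each party splits its private
    # value into random shares, each share holder sums the shares it received,
    # and combining the partial sums reveals only the total, not the inputs.
    import random

    P = 2**31 - 1  # prime modulus defining the field (illustrative choice)

    def share(value, n_parties):
        shares = [random.randrange(P) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % P)  # shares sum to value mod P
        return shares

    def secure_sum(private_values, n_parties=3):
        all_shares = [share(v, n_parties) for v in private_values]
        # each "party" locally adds the shares it holds
        partial_sums = [sum(col) % P for col in zip(*all_shares)]
        return sum(partial_sums) % P  # only the total is revealed

    print(secure_sum([10, 20, 12]))  # 42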

Thursday, December 15, 2016

Neural networks for graphs

I met Thomas Kipf from the University of Amsterdam at NIPS 2016, and he pointed me to an interesting blog post he wrote about neural networks for graph analytics.

Thursday, June 23, 2016

4th Large Scale Recommender Systems workshop - deadline extended

We have extended the deadline of our Large Scale Recommender Systems workshop to June 30. This is the 4th year we are organizing this workshop as part of ACM RecSys 2016. Anyone with novel work in the area of applied recommender systems is welcome to submit a talk proposal.

Sunday, June 19, 2016

GraphLab Create healthcare use case

A nice blog post from Mark Pinches, our Manchester evangelist, who is working with John Snow Labs. It shows how to use GraphLab Create for slicing, dicing and aggregating healthcare data.
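
For readers who have not tried it, here is a minimal GraphLab Create sketch of the kind of slicing and aggregation the post describes. The file name and column names below are hypothetical, not taken from the actual post:

    # Minimal GraphLab Create sketch: filter rows, then group and aggregate.
    # Dataset file and column names are hypothetical, for illustration only.
    import graphlab as gl

    sf = gl.SFrame.read_csv('healthcare_records.csv')   # hypothetical dataset

    # Slice: keep only patients over 65
    seniors = sf[sf['age'] > 65]

    # Dice + aggregate: average length of stay and patient count per diagnosis code
    summary = seniors.groupby(key_columns='diagnosis_code',
                              operations={'avg_stay': gl.aggregate.MEAN('length_of_stay'),
                                          'n_patients': gl.aggregate.COUNT()})
    print(summary)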