Yesterday I connected with Adam Gibson and Chris Nicholson from Skymind, a new startup built around the support and maintenance of Deeplearning4j, one of the popular deep learning packages. To all the VCs who are reading this blog: please note that Skymind is looking for funding.
What is Skymind?
Skymind is the commercial support arm of Deeplearning4j, the distributed open-source deep-learning framework for the JVM. Adam Gibson created Deeplearning4j, and has spoken at Hadoop Summit, OSCon, Tech Planet and elsewhere. He's the author of the forthcoming O'Reilly book "Deep Learning: A practitioner's guide."
What is the Deeplearning4j license? What is Skymind's business model?
Deeplearning4j, ND4J (our scientific computing library) and Canova (vectorization) are Apache-2.0 licensed, which gives users IP protection on the derivative works they create with our software.
Skymind builds "Google brains" for industry. Our software works cross-platform from server to desktop to mobile, and handles every major data type: image, sound, time series, text and video. What Red Hat is to Linux, we are to Deeplearning4j.
Which distributed systems does Deeplearning4j support? (Hadoop, Spark, YARN?)
YARN and Spark. We also allow users to create standalone distributed systems using Akka and AWS.
Can GPU mode run distributed? Can you support multiple GPUs?
No InfiniBand yet, but it can do internode coordination and leverage GPUs via ND4J.
From your experience, what is the typical speedup of GPU vs. CPU?
We're finishing benchmarks now. We just implemented matrix caching and raw CUDA, and will have more numbers soon (we plan to benchmark GPU matrix caching with Spark).
What are the most powerful deep learning methods implemented in Deeplearning4j? What are their typical use cases?
Sentiment analysis for text (which has applications for CRM and reputation management); image and facial recognition, which has wide consumer and security applications; sound/voice analysis, which is useful for speech-to-text and voice search; time series analysis, which is useful for predictive analytics and anomaly detection in finance, manufacturing and hardware.
Who is your target user? Do I have to be a deep learning expert?
The entry-level data scientist who needs to productionize an algorithm focusing on unstructured data, where traditional feature engineering methods have fallen over. Familiarity with machine learning ideas will help, but it's not necessary to get started. We introduce most of the crucial ideas on our website.
Which programming language interfaces do you support?
Java/Scala right now. We'll have a Bash command-line interface that loads models via JSON.
There are a few other deep learning libraries, like Theano and Caffe. Can you outline the benefits of Deeplearning4j (in terms of accuracy, speed or distribution)?
Caffe was created by a PhD candidate at Berkeley. It specializes in machine vision and is C/C++ based. Deeplearning4j is commercially supported, handles all data types (not just images), and is based on the JVM, which means it works easily cross-platform.
Theano/PyLearn2 is written in Python and likewise serves the research community. It is useful for prototyping and widely used, but most people who create a working net in Python need to rewrite it for production. Deeplearning4j is production-grade from the get go.
Theano allows you to build your own nets, but the generated gradients can be slow. Theano is also harder to get up and running cross-platform. As for Caffe, we integrate better:
Theano and Caffe are released under a BSD license that does not include a patent claim and retaliation clause, which means they do not offer the same protections as Apache 2.0.
What is the typical dataset size where you find deep learning to be effective? How many images?
You don't need very much data for deep learning as long as you tune it right (dropout, rectified linear units, and so on). It also depends on the problem you're solving. If you're training a joint distribution over images and text, for example, you may want more. For simple classification, you can get away with a better-tuned algorithm (that is, one more robust to overfitting).
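For readers new to the two tuning techniques named in that answer, here is a minimal plain-Java sketch of a rectified linear unit and of (inverted) dropout. It is an illustration only: the class DropoutSketch and its methods are hypothetical names made up for this post, they are not part of the Deeplearning4j or ND4J API, and the inverted-scaling variant of dropout is an assumption about which flavor is meant.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: these helpers are NOT Deeplearning4j/ND4J calls.
class DropoutSketch {

    // Rectified linear unit: negative activations are clamped to zero.
    static double[] relu(double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.max(0.0, x[i]);
        }
        return out;
    }

    // Inverted dropout (assumed variant): zero each activation with
    // probability dropProb and scale the survivors by 1 / (1 - dropProb),
    // so the expected activation stays the same at training time.
    static double[] dropout(double[] x, double dropProb, Random rng) {
        if (dropProb < 0.0 || dropProb >= 1.0) {
            throw new IllegalArgumentException("dropProb must be in [0, 1)");
        }
        double keepProb = 1.0 - dropProb;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < keepProb ? x[i] / keepProb : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] hidden = relu(new double[] {-1.5, 0.3, 2.0, -0.2});
        double[] trainTime = dropout(hidden, 0.5, rng);
        System.out.println("after ReLU:    " + Arrays.toString(hidden));
        System.out.println("after dropout: " + Arrays.toString(trainTime));
    }
}
```

Because a different random subset of units is silenced on every pass, the net cannot lean on any single feature, which is why these tricks help when data is scarce.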
How do you deal with classification of imbalanced classes?
We sample with replacement, and use random dropout and DropConnect between layers to learn different features.
Besides classifying images into labels, can you identify object locations in images? Can you find similar images?
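One way to read "sample with replacement" for imbalanced classes is to draw the same number of training examples from every class, reusing examples from the rare classes as often as needed. The plain-Java sketch below shows that idea under this assumption; BalancedResampler and sampleBalanced are hypothetical names for illustration, not Deeplearning4j API, and the interview does not say this is exactly how the library does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only: NOT part of the Deeplearning4j API.
class BalancedResampler {

    // Returns indices into `labels`, drawn with replacement, so that every
    // class present in `labels` contributes exactly `perClass` indices.
    static List<Integer> sampleBalanced(int[] labels, int perClass, Random rng) {
        // Group example indices by their class label.
        Map<Integer, List<Integer>> byClass = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        // Draw perClass indices from each class; rare classes get repeats.
        List<Integer> picked = new ArrayList<>();
        for (List<Integer> members : byClass.values()) {
            for (int n = 0; n < perClass; n++) {
                picked.add(members.get(rng.nextInt(members.size())));
            }
        }
        // Shuffle so mini-batches are not grouped by class.
        Collections.shuffle(picked, rng);
        return picked;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1}; // class 1 is rare
        List<Integer> indices = sampleBalanced(labels, 5, new Random(7));
        System.out.println("resampled indices: " + indices);
    }
}
```

The resampled index list can then be used to build training batches in which the rare class appears as often as the common one.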
With enough data, yes.