Monday, December 14, 2015

A quick introduction to speech recognition and natural language processing with deep learning

I asked my colleague and friend Yishay Carmiel, head of Spoken Innovation Labs, to give me a quick tutorial on the state of the art in deep learning for speech recognition and NLP.

Speech Recognition

Speech recognition was the first application where DL made a serious impact. The current state-of-the-art approach in speech recognition is called CTC, “Connectionist Temporal Classification”. This approach trains a neural network end to end, so it handles both the state classification and the temporal alignment, which previously required an HMM. In addition, bidirectional LSTMs are a very hot topic.
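To make the CTC idea concrete, here is a minimal sketch in plain Python. It assumes the network has already produced a per-frame probability distribution over the output symbols, with index 0 reserved for the CTC “blank”. A frame-level path is mapped to a label sequence by merging repeated symbols and then dropping blanks; the probability of a label sequence is the sum over all paths that collapse to it, which the forward recursion computes without enumerating them. The function names and the toy alphabet are illustrative assumptions, not taken from any particular toolkit.

```python
import itertools
import math

BLANK = 0  # assumption: symbol index 0 is the CTC blank


def collapse(path, blank=BLANK):
    """Map a frame-level path to labels: merge repeated symbols, then drop blanks."""
    labels, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            labels.append(sym)
        prev = sym
    return labels


def greedy_decode(probs, blank=BLANK):
    """Pick the most likely symbol per frame, then collapse the path."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return collapse(best, blank)


def ctc_probability(probs, labels, blank=BLANK):
    """Sum of path probabilities over all alignments of `labels` (CTC forward pass).

    probs[t][k] is the network's probability of symbol k at frame t.
    """
    ext = [blank]  # labels with blanks interleaved: _ l1 _ l2 _ ... _
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]  # skip a blank between distinct labels
            new[s] = total * frame[ext[s]]
        alpha = new
    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def brute_force_probability(probs, labels, blank=BLANK):
    """Same quantity by enumerating every path (exponential cost; tiny inputs only)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(labels):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total
```

For example, with three frames `[[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]`, `greedy_decode` picks the path `[0, 1, 2]` and returns `[1, 2]`. The brute-force function is only there to check the recursion on tiny inputs. Greedy decoding is the simplest possible decoder; practical systems typically use beam search, often combined with a language model.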

In terms of software products, all the big companies are building personal assistants whose core technology is speech recognition: Google Now, Microsoft’s Cortana, Apple’s Siri and Amazon’s Echo.

Speech recognition has been a research field for more than 40 years. Building a speech recognition system is a huge algorithmic and engineering task, so it is very hard to point to three specific research papers that cover the whole topic. A nice paper I can refer to is “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups”, a 2012 paper by four different research groups on the huge impact DL has made on speech recognition. By now this is “old stuff”; the technology is moving forward at a blazing pace. I also think that the book “Deep Learning: Methods and Applications” by top Microsoft researchers is a good place to understand what’s going on.

I do not know of any good video lectures on deep learning for speech; there might be some, but to be honest I have not looked for them in a long time.

There are two well-known open-source toolkits for speech recognition:

(i) Sphinx – an open-source toolkit from CMU, quite easy to work with for getting started with speech recognition. However, as far as I know, the downside is that it does not have deep learning support, only GMM-based models.

(ii) Kaldi – by far the most advanced open-source toolkit in this field. It has all the latest technologies, including state-of-the-art deep learning models, WFST-based search, advanced language model building techniques and the latest speaker adaptation techniques. Kaldi is not a plug-and-play program; it takes a lot of time to understand how to use it and adapt it to your needs.
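For contrast with the deep learning models, this is roughly what a GMM-based acoustic model computes: the log-likelihood of one feature frame (for example a vector of MFCCs) under the Gaussian mixture attached to an HMM state. The sketch assumes diagonal covariances, a common choice, and is not Sphinx’s actual code; the function names and any numbers you feed it are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of frame x under a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k * N(x; mean_k, diag(var_k)), computed stably with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # subtract the max so exp() cannot underflow to all zeros
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In a hybrid DNN-HMM system, these per-state GMM scores are replaced by the network’s state posteriors divided by the state priors (so-called scaled likelihoods), while the HMM and the search stay in place.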

Natural Language Processing:

Natural Language Processing is a broad field with a lot of applications, so it is hard to point to a specific DL approach. Right now, word representations and document/sentence representations built with RNNs are the secret sauce for building better models. In addition, many NLP tasks are based on some kind of sequence-to-sequence mapping, so LSTM techniques give them a nice boost. I also think that memory networks will have an interesting impact in the future.
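As an illustration of using an RNN to build a sentence representation, the following minimal NumPy sketch runs a single-layer LSTM over a sequence of word vectors and keeps the final hidden state as a fixed-size vector for the whole sentence. The weights are random and untrained, and the class and parameter names are hypothetical; a real model would learn the weights on a task such as classification or sequence-to-sequence translation.

```python
import numpy as np


class LSTMEncoder:
    """Single-layer LSTM whose final hidden state represents the whole sequence.

    Weights are random (hypothetical, untrained); a real model learns them.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One stacked matrix for the four gates: input, forget, output, candidate.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def step(self, x, h, c):
        """Consume one word vector x, given the previous hidden and cell states."""
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i = self._sigmoid(z[:H])           # input gate: how much new content to write
        f = self._sigmoid(z[H:2 * H])      # forget gate: how much old memory to keep
        o = self._sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
        g = np.tanh(z[3 * H:])             # candidate content for the cell
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def encode(self, vectors):
        """Run over the word vectors in order and return the last hidden state."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in vectors:
            h, c = self.step(x, h, c)
        return h
```

Because each gate reads the previous hidden state, the resulting vector depends on word order, which a plain average of the same word vectors would ignore.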

DL is a tool for building better NLP technologies, so I assume the big companies are applying these techniques to improve the quality of their products. Good examples would be Google’s semantic search and IBM Watson.

To understand the impact, I would refer to “Distributed Representations of Words and Phrases and their Compositionality” for word2vec and to “Sequence to Sequence Learning with Neural Networks” for LSTM-based models.
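The word2vec paper cited above trains skip-gram vectors with negative sampling: each word’s vector is pushed toward the words that appear near it and away from a few randomly drawn “noise” words, sampled from unigram counts raised to the 3/4 power. A minimal NumPy sketch of that objective, assuming integer word ids and a tiny random embedding table (all names and hyperparameters hypothetical), looks like this. It is not Google’s word2vec code, which adds further details such as subsampling of frequent words.

```python
import numpy as np


def skipgram_pairs(tokens, window):
    """All (center, context) pairs with the context within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power and normalized."""
    p = np.asarray(counts, dtype=float) ** power
    return p / p.sum()


def sample_negatives(rng, noise, k):
    """Draw k negative word ids from the noise distribution (repeats allowed)."""
    return rng.choice(len(noise), size=k, p=noise)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgns_step(W_in, W_out, center, context, negatives, lr=0.05):
    """One SGD step on -[log s(u_o . v_c) + sum_k log s(-u_k . v_c)].

    W_in holds center-word vectors and W_out context-word vectors; both are
    updated in place. Returns the loss measured before the update.
    """
    v = W_in[center].copy()
    ids = np.array([context, *negatives])
    targets = np.zeros(len(ids))
    targets[0] = 1.0  # 1 for the observed context word, 0 for the noise words
    u = W_out[ids]  # fancy indexing returns a copy, shape (1 + k, dim)
    scores = sigmoid(u @ v)
    loss = -np.log(scores[0]) - np.sum(np.log(1.0 - scores[1:]))
    err = scores - targets  # derivative of the loss w.r.t. each dot product
    W_in[center] -= lr * (err @ u)
    np.add.at(W_out, ids, -lr * np.outer(err, v))  # correct even if ids repeat
    return loss
```

In a full training loop, one would map the corpus to ids, iterate over `skipgram_pairs`, draw fresh negatives for each pair with `sample_negatives`, and call `sgns_step`; after training, the rows of `W_in` are the word vectors.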

Stanford has very good video lectures, “CS224d: Deep Learning for Natural Language Processing”, which give good coverage of the field.

Since NLP is a broad field with a variety of applications, it is very hard to point to a single source. I think that Google’s TensorFlow offers a variety of interesting tools, although it is not easy to work with. For word/document representations there are Google’s word2vec code, gensim and Stanford’s GloVe.

Yishay Carmiel Short Bio

Yishay is the head of Spoken Labs, a big data analytics unit that implements bleeding-edge deep learning and machine learning technologies for speech recognition, computer vision and data analysis. He has 15 years of experience as an algorithm scientist and technology leader, has worked on building large-scale machine learning algorithms and has served as a deep learning expert. Yishay and his team are working on bleeding-edge technologies in artificial intelligence, deep learning and large-scale data analysis.
