Exploiting LSTM Structure in Deep Neural Networks for Speech Recognition
- Tianxing He,
- Jasha Droppo
ICASSP 2016 |
Published by IEEE
The CD-DNN-HMM system has become the state-of-the-art system for large vocabulary continuous speech recognition (LVCSR) tasks, in which deep neural networks (DNNs) play a key role. However, DNN training suffers from the vanishing gradient problem, which limits the training of deep models. In this work, we address this problem by incorporating into the DNN the successful long short-term memory (LSTM) structure, originally proposed to help recurrent neural networks (RNNs) remember long-term dependencies. We also propose a generalized formulation of the LSTM block, which we name the general LSTM (GLSTM). Our experiments show that the proposed (G)LSTM-DNN scales well with additional layers and achieves an 8.2% relative word error rate reduction on the 2000-hour Switchboard data set.
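The core idea of applying LSTM-style gating across the *depth* of a feedforward stack might be sketched as follows. This is a minimal illustration, not the paper's exact (G)LSTM equations: all layer names, parameter shapes, and the single-matrix-per-gate parameterization are assumptions made for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DepthLSTMLayer:
    """One depth-wise LSTM-style block: gates the layer input against a
    cell state carried through the stack (illustrative sketch only)."""

    def __init__(self, dim, rng):
        # Hypothetical parameterization: each gate sees only the layer
        # input h; biases and peephole terms are omitted for brevity.
        s = 1.0 / np.sqrt(dim)
        self.Wi, self.Wf, self.Wo, self.Wc = (
            rng.uniform(-s, s, (dim, dim)) for _ in range(4)
        )

    def forward(self, h, c):
        i = sigmoid(h @ self.Wi)   # input gate
        f = sigmoid(h @ self.Wf)   # forget gate
        o = sigmoid(h @ self.Wo)   # output gate
        g = np.tanh(h @ self.Wc)   # candidate cell update
        c_new = f * c + i * g      # cell state carried across depth
        h_new = o * np.tanh(c_new) # gated layer output
        return h_new, c_new

# Stack the gated blocks in depth, the way layers are stacked in a DNN.
rng = np.random.default_rng(0)
dim = 4
layers = [DepthLSTMLayer(dim, rng) for _ in range(8)]
h = rng.standard_normal(dim)
c = np.zeros(dim)
for layer in layers:
    h, c = layer.forward(h, c)
print(h.shape)
```

The additive cell update `c_new = f * c + i * g` is what eases the vanishing gradient problem: gradients can flow from the output back through the cell path without being repeatedly squashed by layer nonlinearities, which is why such a stack can scale to more layers.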