Advanced recurrent network-based hybrid acoustic models for low resource speech recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2018-07-17, DOI: 10.1186/s13636-018-0128-6
Jian Kang, Wei-Qiang Zhang, Wei-Wei Liu, Jia Liu, Michael T. Johnson

Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to solve this problem and have achieved excellent results. Bidirectional LSTM (BLSTM), which uses both preceding and following context, has shown particularly good performance. However, the computational requirements of BLSTM approaches are quite heavy, even when implemented efficiently on GPU-based high-performance computers. In addition, because the output of LSTM units is bounded, a vanishing gradient issue often persists across multiple layers. The large size of LSTM networks also makes them susceptible to overfitting. In this work, we combine a local bidirectional architecture, a new recurrent unit (the gated recurrent unit, GRU), and residual architectures to address these problems. Experiments are conducted on the benchmark datasets released under the IARPA Babel Program. The proposed models achieve 3 to 10% relative improvements over their corresponding DNN or LSTM baselines across seven language collections. In addition, the new models accelerate learning by a factor of more than 1.6 compared to conventional BLSTM models. Using these approaches, we achieve good results in the IARPA Babel Program.
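To make the combination of ingredients named in the abstract concrete, below is a minimal PyTorch sketch of a hybrid acoustic model built from stacked bidirectional GRU layers with residual connections. This is an illustrative assumption of the general architecture, not the authors' implementation: all class names, layer sizes, and the senone count are hypothetical, and the paper's "local" bidirectional variant (which limits the context window rather than running over the whole utterance) is not reproduced here.

```python
# Sketch only: stacked BiGRU layers with residual connections feeding a
# per-frame senone classifier, as used in hybrid HMM/NN acoustic models.
# Names and dimensions are illustrative, not from the paper.
import torch
import torch.nn as nn

class ResidualBiGRUAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=320, layers=5, num_senones=3000):
        super().__init__()
        # Project input features to the BiGRU output width (2 * hidden),
        # so residual additions are dimensionally consistent at every layer.
        self.input_proj = nn.Linear(feat_dim, 2 * hidden)
        self.grus = nn.ModuleList(
            nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
            for _ in range(layers)
        )
        self.output = nn.Linear(2 * hidden, num_senones)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        h = self.input_proj(x)
        for gru in self.grus:
            out, _ = gru(h)
            h = h + out                   # residual shortcut eases gradient flow
        return self.output(h)             # per-frame senone logits

model = ResidualBiGRUAcousticModel()
logits = model(torch.randn(8, 200, 40))  # 8 utterances, 200 frames, 40-dim features
print(logits.shape)                      # torch.Size([8, 200, 3000])
```

The residual addition is the point of the sketch: because each layer outputs its input plus a correction, gradients reach the lower layers through the identity path, which is how residual architectures mitigate the multi-layer vanishing-gradient issue the abstract describes.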

Updated: 2018-07-17