Theoretical learning guarantees applied to acoustic modeling
Journal of the Brazilian Computer Society, Pub Date: 2019-01-04, DOI: 10.1186/s13173-018-0081-3
Christopher D. Shulby, Martha D. Ferreira, Rodrigo F. de Mello, Sandra M. Aluisio

In low-resource scenarios, such as small datasets or limited computational resources, state-of-the-art deep learning methods for speech recognition are known to fail. More robust models can be achieved if care is taken to ensure the learning guarantees provided by statistical learning theory. This work presents a shallow, hybrid approach in which a convolutional neural network feature extractor feeds a hierarchical tree of support vector machines for classification. We show that gross errors present even in state-of-the-art systems can be avoided and that an accurate acoustic model can be built in a hierarchical fashion. Furthermore, we present proof that our algorithm adheres to the learning guarantees provided by statistical learning theory. The acoustic model produced in this work outperforms traditional hidden Markov models, and the hierarchical support vector machine tree outperforms a multi-class multilayer perceptron classifier using the same features. More importantly, we isolate the performance of the acoustic model and report results at both the frame and phoneme level, so that the true robustness of the model is taken into account. We show that accurate and robust recognition rates can be obtained even with a small amount of data.
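The idea of routing features through a hierarchical tree of binary support vector machines can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: hand-made 2-D vectors stand in for the CNN features, the vowel / stop / fricative grouping and all function names are hypothetical, and each node uses a linear SVM trained by plain subgradient descent on the hinge loss rather than whatever solver the authors used.

```python
# Toy sketch of a hierarchical tree of binary linear SVMs.
# Data, class grouping, and names are illustrative assumptions, not the paper's setup.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss; y in {-1, +1}."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0   # gradient of the regularizer
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                      # only margin violators contribute
                for j in range(dim):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def decision(model, x):
    """Signed score of x under a trained (w, b) pair."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

class Node:
    """Internal tree node; each child is another Node or a class label (str)."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.model = None

def leaves(tree):
    """All class labels reachable from this subtree."""
    return [tree] if isinstance(tree, str) else leaves(tree.left) + leaves(tree.right)

def fit(tree, X, labels):
    """Train one binary SVM per internal node on the samples that reach it."""
    if isinstance(tree, str):
        return
    left_set, right_set = set(leaves(tree.left)), set(leaves(tree.right))
    idx = [i for i, lab in enumerate(labels) if lab in left_set or lab in right_set]
    Xs = [X[i] for i in idx]
    ys = [-1 if labels[i] in left_set else 1 for i in idx]
    tree.model = train_linear_svm(Xs, ys)
    fit(tree.left, X, labels)
    fit(tree.right, X, labels)

def predict(tree, x):
    """Walk from the root to a leaf, taking the branch each node's SVM selects."""
    while not isinstance(tree, str):
        tree = tree.left if decision(tree.model, x) < 0 else tree.right
    return tree

# Hypothetical stand-ins for extracted features, four samples per class.
X = [(-2.0, 0.0), (-2.5, 0.5), (-1.5, -0.5), (-2.0, 1.0),
     (2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (2.0, 1.5),
     (2.0, -2.0), (2.5, -1.5), (1.5, -2.5), (2.0, -1.5)]
labels = ["vowel"] * 4 + ["stop"] * 4 + ["fricative"] * 4

# Root separates vowels from consonants; the child node separates the two consonant classes.
tree = Node("vowel", Node("stop", "fricative"))
fit(tree, X, labels)
```

Calling `predict(tree, (2.0, 1.8))` walks two binary decisions (consonant, then stop) instead of one multi-class decision, which is the structural point of the hierarchy: each node solves a smaller, simpler problem on only the samples that reach it.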


Updated: 2019-01-04
