An attention Long Short-Term Memory based system for automatic classification of speech intelligibility, Engineering Applications of Artificial Intelligence


Engineering Applications of Artificial Intelligence (IF 8), Pub Date: 2020-09-23, DOI: 10.1016/j.engappai.2020.103976
Miguel Fernández-Díaz, Ascensión Gallardo-Antolín

Speech intelligibility can be degraded by multiple factors, such as noisy environments, technical difficulties or biological conditions. This work focuses on the development of an automatic non-intrusive system for predicting the speech intelligibility level in the latter case. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the frames most relevant to this task. The proposed models are evaluated on the UA-Speech database, which contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both a reference Support Vector Machine (SVM)-based system with hand-crafted features and an LSTM-based system with mean pooling.
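
The difference between mean pooling and attention pooling over per-frame LSTM outputs can be illustrated with a minimal sketch. The abstract does not give the exact attention formulation used in the paper, so the code below assumes one common simple variant: each frame vector h_t is scored by a dot product with a weight vector w, the scores are normalized with a softmax, and the pooled vector is the weighted sum of the frames. The names mean_pool, attention_pool, hidden_states and w are hypothetical, and the toy vectors merely stand in for LSTM outputs computed from log-mel frames.

```python
import math

def mean_pool(frames):
    """Average the frame vectors, giving every frame equal weight."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def attention_pool(frames, w):
    """Weight each frame by softmax(w . h_t), then sum the weighted frames.

    Assumed formulation for illustration only, not necessarily the paper's.
    """
    # One scalar relevance score per frame.
    scores = [sum(wi * hi for wi, hi in zip(w, f)) for f in frames]
    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy stand-in for LSTM outputs: 3 frames, 2 hidden units each (made-up values).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
w = [2.0, 0.0]  # hypothetical attention weights; learned during training in practice

pooled_mean = mean_pool(hidden_states)
pooled_att, att_weights = attention_pool(hidden_states, w)
```

In this sketch the second frame receives the largest attention weight because it has the highest score, whereas mean pooling treats all three frames alike. With w set to all zeros the softmax is uniform and attention pooling reduces to mean pooling.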


Updated: 2020-09-23
