An attention Long Short-Term Memory based system for automatic classification of speech intelligibility
Engineering Applications of Artificial Intelligence (IF 8), Pub Date: 2020-09-23, DOI: 10.1016/j.engappai.2020.103976
Miguel Fernández-Díaz, Ascensión Gallardo-Antolín

Speech intelligibility can be degraded by multiple factors, such as noisy environments, technical difficulties or biological conditions. This work focuses on the development of an automatic, non-intrusive system for predicting the speech intelligibility level in the last of these cases. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by incorporating a simple attention mechanism that determines which frames are most relevant to the task. The proposed models are evaluated on the UA-Speech database, which contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both a reference Support Vector Machine (SVM)-based system with hand-crafted features and an LSTM-based system with mean pooling.
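
The difference between mean pooling and attention pooling over per-frame LSTM outputs can be illustrated with a minimal sketch. The abstract does not specify the exact form of the attention mechanism, so the dot-product scoring below, the scoring vector `w`, and the toy `hidden` values are all assumptions made for illustration and are not the authors' implementation.

```python
import math

def mean_pooling(frames):
    """Average per-frame vectors, giving every frame the same weight."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def attention_pooling(frames, w):
    """Weighted average of frames; weights are a softmax over dot(frame, w).

    `w` is a hypothetical scoring vector (learned in a real system).
    """
    scores = [sum(x * wi for x, wi in zip(f, w)) for f in frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas

# Hypothetical LSTM outputs: 3 frames, 2 dimensions each.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# An all-zero scoring vector yields uniform weights, i.e. mean pooling.
uniform_pooled, uniform_alphas = attention_pooling(hidden, [0.0, 0.0])

# A non-zero scoring vector shifts weight toward frames that score higher.
focused_pooled, focused_alphas = attention_pooling(hidden, [5.0, 0.0])
```

With a zero scoring vector the attention weights are uniform and the result coincides with mean pooling, which is why mean pooling is a natural baseline for the attention variant.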




Updated: 2020-09-23