Neural network-based non-intrusive speech quality assessment using attention pooling function
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2021-05-17, DOI: 10.1186/s13636-021-00209-4
Miao Liu, Jing Wang, Weiming Yi, Fang Liu

Non-intrusive speech quality assessment has recently attracted considerable attention because it does not require the original reference signal. At the same time, neural networks have been applied to speech quality assessment and have achieved good performance. To improve the performance of non-intrusive speech quality assessment, this paper proposes a neural network-based assessment method that uses an attention pooling function. The proposed systems are based on convolutional neural networks (CNNs), bidirectional long short-term memory (BLSTM), and a CNN-LSTM structure. Comparing four types of pooling functions both theoretically and experimentally, we find that the attention pooling function performs best. Experiments are conducted on a dataset containing various degraded speech signals with corresponding subjective quality scores. The results show that the proposed CNN-LSTM model with attention pooling achieves a state-of-the-art correlation coefficient (R) of 0.967 and a root-mean-square error (RMSE) of 0.269, outperforming the ITU-T P.563 standard and the autoencoder-support vector regression method.
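
To make the two central ideas of the abstract concrete, the following is a minimal sketch of attention pooling over frame-level quality scores, together with the two reported metrics (R and RMSE). The abstract does not give the paper's exact formulation, so this assumes a common variant in which the pooling weights are a softmax over per-frame attention logits. The function names `attention_pool`, `pearson_r`, and `rmse`, and the idea that a network outputs one score and one logit per frame, are illustrative assumptions rather than the authors' implementation.

```python
import math


def attention_pool(frame_scores, attention_logits):
    """Pool frame-level scores into a single utterance-level score.

    Assumption: weights are softmax(attention_logits), so they are
    positive and sum to 1. The paper's exact form may differ.
    """
    if not frame_scores or len(frame_scores) != len(attention_logits):
        raise ValueError("need two non-empty sequences of equal length")
    peak = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    return sum(q * e / total for q, e in zip(frame_scores, exps))


def pearson_r(predicted, target):
    """Pearson correlation coefficient (R) between two score lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def rmse(predicted, target):
    """Root-mean-square error between predicted and subjective scores."""
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))
```

With equal attention logits the pooled score reduces to the plain average of the frame scores, and with one dominant logit it approaches the score of that frame, which is why attention pooling can be viewed as a learned interpolation between average and max pooling.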

Updated: 2021-05-17