Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2019.2953810
Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

Training speech enhancement models with objective functions related to human perception has recently become a popular topic, mainly because the conventional mean squared error (MSE) loss does not represent auditory perception well. One typical perception-related metric, the perceptual evaluation of speech quality (PESQ), has been shown to correlate highly with quality scores rated by humans. Because PESQ is complex and non-differentiable, however, it cannot be used directly to optimize speech enhancement models. In this study, we propose optimizing the enhancement model with an approximated PESQ function that is differentiable and learned from the training data. The experimental results show that the learned surrogate function can guide the enhancement model to further boost the PESQ score (an increase of 0.18 points over a model trained with MSE loss) while maintaining speech intelligibility.
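
The idea of optimizing against a learned, differentiable approximation of a non-differentiable metric can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' Quality-Net or the real PESQ algorithm: `black_box_score` is a hypothetical quantized score standing in for the metric, the "enhancement model" is reduced to a single gain parameter, and the surrogate is a quadratic fitted by least squares rather than a neural network.

```python
import math


def black_box_score(gain):
    """Hypothetical stand-in for a non-differentiable metric such as PESQ.

    The score peaks at gain = 0.7 and is quantized to steps of 0.05, so it is
    piecewise constant and its gradient is zero almost everywhere.
    """
    return math.floor(20.0 * (1.0 - (gain - 0.7) ** 2)) / 20.0


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(n):
            if r != i:
                factor = M[r][i] / M[i][i]
                M[r] = [a - factor * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_surrogate(gains, scores):
    """Least-squares fit of q(g) = w0 + w1*g + w2*g^2 to (gain, score) pairs."""
    feats = [[1.0, g, g * g] for g in gains]
    # Normal equations: (F^T F) w = F^T y
    A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * y for f, y in zip(feats, scores)) for i in range(3)]
    return solve_linear(A, b)


def surrogate_grad(w, gain):
    """Analytic derivative dq/dg of the fitted quadratic surrogate."""
    return w[1] + 2.0 * w[2] * gain


# 1) Collect training data by querying the black-box metric on a grid.
train_gains = [0.05 * k for k in range(29)]  # 0.0 .. 1.4
train_scores = [black_box_score(g) for g in train_gains]

# 2) Learn the differentiable surrogate from that data.
w = fit_surrogate(train_gains, train_scores)

# 3) Optimize the "model" (one gain parameter) by gradient ascent on the
#    surrogate, since the black-box score itself gives no usable gradient.
g_init = 0.1
g_final = g_init
for _ in range(200):
    g_final += 0.1 * surrogate_grad(w, g_final)

print("gain:", g_init, "->", g_final)
print("black-box score:", black_box_score(g_init), "->", black_box_score(g_final))
```

In this toy setting the surrogate only needs to be accurate enough to point the parameter toward higher black-box scores. A learned approximation of a real metric would be expected to need far more capacity and training data than a three-coefficient quadratic.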

Updated: 2020-01-01