A Human Auditory Perception Loss Function Using Modified Bark Spectral Distortion for Speech Enhancement
Neural Processing Letters (IF 3.1), Pub Date: 2020-03-03, DOI: 10.1007/s11063-020-10212-z
Xiaofeng Shu, Yi Zhou, Hongqing Liu, Trieu-Kien Truong

Human listeners often have difficulty understanding speech in the presence of background noise in everyday communication environments. Recently, deep neural network (DNN)-based techniques have been successfully applied to speech enhancement and have achieved significant improvements over conventional approaches. However, existing DNN-based methods usually minimize a mean squared error (MSE), computed either on log-power spectra or on masks, between the enhanced output and the training target (e.g., the ideal ratio mask (IRM) of the clean speech); this criterion is not closely related to human auditory perception. In this letter, a modified Bark spectral distortion loss function, which can be regarded as an auditory-perception-based MSE, is proposed to replace the conventional MSE in DNN-based speech enhancement in order to further improve objective perceptual quality. Experimental results show that the proposed method improves speech enhancement performance, especially in terms of objective perceptual quality, in all experimental settings when compared with DNN-based methods that use the conventional MSE criterion.
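To make the contrast between the two training criteria concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and every helper name in it is hypothetical. The conventional criterion is an MSE between an estimated mask and the IRM, here taken as (S²/(S²+N²))^β with β = 0.5. The perceptual criterion instead compares enhanced and clean speech after mapping power spectra to Bark bands and compressing them into a loudness-like scale. Several choices are assumptions rather than details from the letter: the Zwicker and Terhardt Bark approximation, grouping FFT bins into integer Bark bands, the 0.23 power-law loudness exponent, and a fixed scalar masking threshold that stands in for the noise masking threshold used by modified Bark spectral distortion. Squared differences are used to match the abstract's description of the loss as an MSE. In real DNN training, the same computation would be written in an autodiff framework.

```python
import numpy as np

EPS = 1e-12

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark scale (assumed mapping)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_band_matrix(n_fft_bins, sample_rate):
    """Hypothetical helper: 0/1 matrix (bands x bins) that sums FFT power bins
    into integer Bark bands; each bin belongs to exactly one band."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft_bins)
    band_index = np.floor(hz_to_bark(freqs)).astype(int)
    mat = np.zeros((band_index.max() + 1, n_fft_bins))
    mat[band_index, np.arange(n_fft_bins)] = 1.0
    return mat

def ideal_ratio_mask(clean_power, noise_power, beta=0.5):
    """IRM = (S^2 / (S^2 + N^2))^beta, computed from power spectra."""
    return np.power(clean_power / (clean_power + noise_power + EPS), beta)

def mse_loss(estimate, target):
    """Conventional criterion, e.g. MSE between an estimated mask and the IRM."""
    return np.mean((estimate - target) ** 2)

def bark_loudness(power_spec, bands, exponent=0.23):
    """Map (frames x bins) power spectra to compressed Bark-band loudness."""
    bark_power = power_spec @ bands.T  # shape: frames x bands
    return np.power(bark_power + EPS, exponent)

def bark_distortion_loss(enhanced_power, clean_power, bands, mask_threshold=None):
    """Squared loudness difference in the Bark domain. If mask_threshold is
    given (an assumed fixed scalar), differences at or below it are treated
    as inaudible and contribute nothing, loosely mimicking MBSD's masking."""
    diff = bark_loudness(enhanced_power, bands) - bark_loudness(clean_power, bands)
    if mask_threshold is not None:
        diff = np.where(np.abs(diff) > mask_threshold, diff, 0.0)
    return np.mean(diff ** 2)
```

In use, a mask-estimating network would output `est_mask`. The conventional loss would be `mse_loss(est_mask, ideal_ratio_mask(S2, N2))`, while a perceptual variant would apply the mask to the noisy magnitude and score `bark_distortion_loss(est_mask**2 * noisy_power, clean_power, bands)`. Because that second loss is computed on Bark-band loudness rather than raw bins, errors in high-frequency bins are pooled into wider bands and compressed, roughly as the ear would weight them.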

Updated: 2020-03-03