Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms, EURASIP Journal on Audio, Speech, and Music Processing

Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2021-04-12, DOI: 10.1186/s13636-021-00204-9
Yuxuan Ke, Andong Li, Chengshi Zheng, Renhua Peng, Xiaodong Li

Deep learning-based speech enhancement algorithms have proven effective at removing both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target contains no phase information, as with the ideal ratio mask or the clean speech magnitude and its variants. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, the perceptual speech quality may degrade. One intuitive remedy is to suppress the residual noise components further with a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes noise power spectral density (PSD) estimation a challenging problem. To solve this problem, the paper proposes three strategies for estimating the noise PSD frame by frame; the residual noise can then be removed effectively by applying a gain function based on the decision-directed approach. Objective measurements show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, an AB subjective listening test shows that the preference percentages for the proposed strategies exceed 60%.
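
The abstract does not spell out the three noise PSD estimation strategies or the exact gain function, so the following is only a minimal sketch of the general decision-directed idea it refers to. It assumes a frame-by-frame noise PSD estimate is already available, and it uses a Wiener gain with a gain floor as an illustrative choice. The function name, parameter names, and default values (`alpha`, `gain_floor`) are hypothetical and not taken from the paper.

```python
def decision_directed_gains(power_frames, noise_psd_frames,
                            alpha=0.98, gain_floor=0.1):
    """Per-frame, per-bin spectral gains from a decision-directed a priori SNR.

    power_frames:     list of frames, each a list of |Y(l, k)|^2 values
    noise_psd_frames: list of frames, each a list of noise PSD estimates
    alpha:            smoothing factor of the decision-directed recursion (assumed)
    gain_floor:       lower bound on the gain (assumed)
    """
    eps = 1e-12  # guards against division by zero
    num_bins = len(power_frames[0])
    prev_clean_power = [0.0] * num_bins  # |S_hat(l-1, k)|^2, zero before frame 0
    gains = []
    for power, noise in zip(power_frames, noise_psd_frames):
        frame_gains = []
        for k in range(num_bins):
            noise_k = max(noise[k], eps)
            gamma = power[k] / noise_k  # a posteriori SNR
            # Decision-directed a priori SNR: blend the previous frame's
            # clean-speech estimate with the current instantaneous estimate.
            xi = (alpha * prev_clean_power[k] / noise_k
                  + (1.0 - alpha) * max(gamma - 1.0, 0.0))
            # Wiener gain (an assumption here), limited from below.
            gain = max(xi / (1.0 + xi), gain_floor)
            frame_gains.append(gain)
            prev_clean_power[k] = gain * gain * power[k]
        gains.append(frame_gains)
    return gains
```

Each gain would be multiplied with the corresponding spectral bin of the enhanced speech before resynthesis. In this sketch, a bin whose power equals the noise estimate stays at the gain floor, while a bin well above the noise estimate converges toward a gain of 1 over a few frames.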


Updated: 2021-04-16
