Real-time speech enhancement algorithm for transient noise suppression
Multimedia Tools and Applications (IF 3.0), Pub Date: 2020-09-23, DOI: 10.1007/s11042-020-09849-8
Ruiyu Liang, Yue Xie, Jiaming Cheng, Guichen Tang, Shinuo Sun

To effectively suppress both stationary and transient noise, a real-time single-channel speech enhancement algorithm is proposed. First, the quantile noise estimation method is used to obtain the spectrum of the stationary noise. Then, a transient noise detection method based on the normalized variance and gravity center of the signal is proposed to modify the stationary noise spectrum. Next, the speech presence probability is estimated from speech features and harmonic analysis. Finally, the optimally modified log-spectral amplitude (OM-LSA) estimator is adopted for speech enhancement. The experimental noise set contains 115 environmental sounds at SNRs from −10 to 10 dB. The experimental results show that the performance of the proposed algorithm is comparable to that of the OM-LSA algorithm, which has good denoising performance, while its real-time performance is much better. Compared with the WebRTC real-time algorithm, taking stationary and transient noise together, the overall speech quality indicators of the proposed algorithm increased by 7.5%, 7.8% and 5.0%, respectively, and the short-time objective intelligibility increased by 2.4%, 2.4% and 2.0%, respectively. Even compared with a recurrent neural network (RNN) algorithm, the proposed algorithm suppresses transient noise better. In addition, a real-time experiment on a hardware platform shows that processing a 10 ms frame takes 4.3 ms.
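
The first two steps named in the abstract (quantile noise estimation, and the two features used for transient detection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so the function names, the nearest-rank quantile rule, the choice q = 0.5, and the exact definitions of "gravity center" (taken here as the spectral centroid in bin units) and "normalized variance" (taken here as variance divided by squared mean) are all assumptions made for illustration.

```python
def quantile_noise_estimate(power_frames, q=0.5):
    """Per-bin q-quantile of the power spectrum over a buffer of frames.

    power_frames: list of frames, each a list of per-bin power values.
    q=0.5 (the median) is an illustrative choice, not the paper's setting.
    """
    num_bins = len(power_frames[0])
    estimate = []
    for k in range(num_bins):
        # Sort the history of bin k; low quantiles are dominated by noise
        # because speech and transients occupy a bin only part of the time.
        history = sorted(frame[k] for frame in power_frames)
        # Nearest-rank index of the q-quantile within the sorted history.
        idx = min(len(history) - 1, int(q * len(history)))
        estimate.append(history[idx])
    return estimate


def spectral_centroid(power):
    """Assumed 'gravity center': power-weighted mean bin index of one frame."""
    total = sum(power)
    if total == 0.0:
        return 0.0
    return sum(k * p for k, p in enumerate(power)) / total


def normalized_variance(power):
    """Assumed definition: variance of the spectrum over its squared mean."""
    mean = sum(power) / len(power)
    if mean == 0.0:
        return 0.0
    var = sum((p - mean) ** 2 for p in power) / len(power)
    return var / (mean * mean)


# Nine flat "stationary" frames followed by one frame with a burst in bin 1.
frames = [[1.0, 1.0, 1.0, 1.0]] * 9 + [[1.0, 50.0, 1.0, 1.0]]
noise = quantile_noise_estimate(frames)
burst_features = (spectral_centroid(frames[-1]), normalized_variance(frames[-1]))
```

In this toy buffer the median-based estimate ignores the single burst frame, which is why a quantile estimator on its own tracks stationary noise and a separate detector is needed for transients. How the paper thresholds or combines the two features to modify the noise spectrum is not stated in the abstract and is left out here.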




Updated: 2020-09-24