A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2021-08-14, DOI: 10.1186/s13636-021-00220-9
Sujan Kumar Roy, Kuldip K. Paliwal

The minimum mean-square error (MMSE)-based noise PSD estimators have been widely used for speech enhancement. However, they assume that the noise signal changes more slowly than the speech signal, which limits their ability to track highly non-stationary noise sources. Moreover, in practice the performance of MMSE-based noise PSD estimators depends largely on the accuracy of the a priori SNR estimate. In this paper, we introduce a noise PSD estimation algorithm for non-stationary noise conditions that uses a derivative-based high-pass filter. The proposed method processes the silent and speech frames of the noisy speech differently to estimate the noise PSD, because non-stationary noise can be mixed non-uniformly with silent and speech-dominated frames. We first introduce a spectral-flatness-based adaptive thresholding technique to detect speech activity in the noisy speech frames. Since a silent frame of the noisy speech contains only noise, the noise periodogram is computed directly from it without any filtering. Conversely, during speech activity a 4th-order derivative-based high-pass filter is applied to the noisy speech frame to filter out the clean speech components while leaving behind mostly the noise. The noise periodogram is then computed from the filtered signal, which counteracts the leakage of clean speech power. The noise PSD estimate is obtained by recursively averaging the previously estimated noise PSD and the current noise periodogram. In non-stationary noise conditions, the proposed method is found to track both rapidly changing and slowly varying noise PSD more effectively than the competing methods over a wide range of signal-to-noise ratio (SNR) levels.
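The abstract describes the pipeline but not its constants, so the following is only a minimal, dependency-free Python sketch under stated assumptions. The "4th order derivative-based high-pass filter" is modelled as the 4th-order backward difference (1 - z^-1)^4; the adaptive threshold is a hypothetical recursive average of the spectral flatness of frames judged silent; and the first frame is assumed to be noise-only. The function names and the parameters `alpha`, `kappa` and `beta` are hypothetical and are not taken from the paper.

```python
import cmath
import math


def periodogram(frame):
    """One-sided periodogram |DFT_k|^2 / N for k = 0..N/2 (rectangular window).

    A plain O(N^2) DFT keeps the sketch dependency-free; use an FFT in practice.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def spectral_flatness(power, eps=1e-12):
    """Geometric mean / arithmetic mean of a power spectrum.

    Equals 1 for a perfectly flat spectrum and approaches 0 for a strongly peaked one.
    """
    mean_log = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(mean_log) / (sum(power) / len(power) + eps)


# ASSUMPTION: the "4th order derivative-based high-pass filter" is modelled as the
# 4th-order backward difference (1 - z^-1)^4; the paper's exact filter may differ.
DIFF4 = (1.0, -4.0, 6.0, -4.0, 1.0)


def derivative_highpass(frame, history=(0.0, 0.0, 0.0, 0.0)):
    """y[n] = x[n] - 4x[n-1] + 6x[n-2] - 4x[n-3] + x[n-4].

    `history` holds the 4 samples preceding the frame (oldest first), so the
    filter state can be carried across consecutive frames.
    """
    padded = list(history) + list(frame)
    return [sum(c * padded[n + 4 - i] for i, c in enumerate(DIFF4)) for n in range(len(frame))]


def estimate_noise_psd(frames, alpha=0.9, kappa=0.6, beta=0.95):
    """Track the noise PSD over consecutive, non-overlapping frames (each >= 4 samples).

    Hypothetical parameters: alpha smooths the noise PSD, kappa scales the flatness
    threshold below which a frame counts as speech, beta smooths that threshold.
    ASSUMPTION: frames[0] is noise-only and seeds both the PSD and the threshold.
    Returns (per-frame noise PSD estimates, per-frame speech flags).
    """
    noise_psd = periodogram(frames[0])
    threshold = spectral_flatness(noise_psd)
    history = (0.0, 0.0, 0.0, 0.0)
    estimates, flags = [], []
    for frame in frames:
        power = periodogram(frame)
        sfm = spectral_flatness(power)
        is_speech = sfm < kappa * threshold
        if is_speech:
            # Speech frame: high-pass filter first, so that mostly noise power remains.
            current = periodogram(derivative_highpass(frame, history))
        else:
            # Silent frame: all of its power is treated as noise; adapt the threshold.
            current = power
            threshold = beta * threshold + (1.0 - beta) * sfm
        # Recursive averaging of the previous estimate and the current periodogram.
        noise_psd = [alpha * old + (1.0 - alpha) * new for old, new in zip(noise_psd, current)]
        estimates.append(noise_psd)
        flags.append(is_speech)
        history = tuple(frame[-4:])
    return estimates, flags
```

Called on a list of consecutive, non-overlapping frames (for example `estimate_noise_psd(frames)` with 64-sample frames), it returns one noise PSD estimate and one speech flag per frame. The last four samples of each frame are passed to the filter as history for the next frame, so a speech frame does not start with a zero-state transient. One consequence of the assumed filter is worth noting: its power response (2 sin(ω/2))^8 removes low-frequency speech but also attenuates low-frequency noise and amplifies high-frequency noise, by a factor of up to 256 at the Nyquist frequency. The abstract does not say whether the paper compensates for this, so the sketch leaves the filtered periodogram uncompensated.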
Extensive objective and subjective scores on the NOIZEUS corpus demonstrate that applying the proposed noise PSD estimate with MMSE-based speech enhancement methods produces enhanced speech of higher quality and intelligibility than the competing methods.
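The abstract notes that MMSE-based enhancement depends on the a priori SNR estimate and evaluates the proposed noise PSD inside MMSE-based enhancement methods, but it does not say which ones. As an illustration only, the sketch below shows one common way a per-bin noise PSD is consumed: a Wiener gain driven by the decision-directed a priori SNR estimator of Ephraim and Malah. This is a swapped-in, generic example rather than the paper's evaluation setup; the function `wiener_gains`, the smoothing factor `a = 0.98` and the -25 dB floor `xi_min` are assumptions.

```python
def wiener_gains(noisy_power, noise_psd, prev_clean_power=None, a=0.98, xi_min=10 ** (-25 / 10)):
    """Per-bin Wiener gains G_k = xi_k / (1 + xi_k) from a noise PSD estimate.

    xi_k is the decision-directed a priori SNR:
        xi_k = a * |S_prev_k|^2 / lambda_k + (1 - a) * max(gamma_k - 1, 0),
    where gamma_k = |Y_k|^2 / lambda_k is the a posteriori SNR.
    Returns (gains, clean_power); pass clean_power back in on the next frame.
    """
    gains, clean_power = [], []
    for k, (y_power, lam) in enumerate(zip(noisy_power, noise_psd)):
        lam = max(lam, 1e-12)               # guard against division by zero
        gamma = y_power / lam               # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)     # instantaneous (ML) SNR estimate
        if prev_clean_power is None:
            xi = ml_term                    # no history on the first frame
        else:
            xi = a * prev_clean_power[k] / lam + (1.0 - a) * ml_term
        xi = max(xi, xi_min)                # SNR floor limits musical noise
        gain = xi / (1.0 + xi)
        gains.append(gain)
        clean_power.append(gain * gain * y_power)
    return gains, clean_power
```

In a frame loop, `noisy_power` would be the periodogram of the current noisy frame and `noise_psd` the current output of any noise PSD tracker; the returned `clean_power` is fed back as `prev_clean_power` on the next frame. This is where an inaccurate noise PSD hurts: it biases both gamma_k and xi_k, and therefore the gain.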

Updated: 2021-08-15