An adaptive a priori SNR estimator for perceptual speech enhancement
EURASIP Journal on Audio, Speech, and Music Processing ( IF 1.7 ) Pub Date : 2019-06-07 , DOI: 10.1186/s13636-019-0150-3
Lara Nahma , Pei Chee Yong , Hai Huyen Dam , Sven Nordholm

In this paper, an adaptive averaging a priori SNR estimator employing critical band processing is proposed. The proposed method modifies the conventional decision-directed a priori SNR estimation to achieve faster tracking when the SNR changes. The decision-directed (DD) estimator employs a fixed weighting factor with a value close to one, which makes it slow in following the onsets of speech utterances. The proposed SNR estimator addresses this issue by employing an adaptive weighting factor, which improves tracking of onset changes in the speech signal and, as a consequence, better preserves speech components. The adaptive technique makes the weighting between the modified decision-directed a priori estimate and the maximum likelihood a priori estimate a function of the speech absence probability, which is itself modeled by a sigmoid function. Furthermore, a critical band mapping for the short-time Fourier transform analysis-synthesis system is utilized in the speech enhancement to reduce musical noise. In addition, to evaluate the ability of the a priori SNR estimation method to preserve speech components, we propose a modified objective measure, the modified Hamming distance. Evaluations are performed using both objective and subjective measurements. The experimental results show that the proposed method improves speech quality under different noise conditions. Moreover, it maintains the advantage of the DD approach in eliminating musical noise under different SNR conditions. The objective results are supported by subjective listening tests with 10 subjects (5 males and 5 females).
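
The weighting idea in the abstract can be illustrated with a minimal sketch for a single frequency bin. This is not the paper's exact estimator: the abstract does not give the formulas, so the sketch uses the textbook decision-directed recursion, and the sigmoid parameters (`slope`, `midpoint`), the weight bounds (`alpha_min`, `alpha_max`), and all function names are assumptions chosen only for illustration.

```python
import math


def speech_absence_probability(gamma, slope=1.0, midpoint=2.0):
    """Hypothetical sigmoid mapping from a posteriori SNR (linear scale)
    to a speech absence probability in (0, 1). High SNR gives a value near 0."""
    # Clamp the exponent so math.exp cannot overflow for very large gamma.
    z = min(slope * (gamma - midpoint), 700.0)
    return 1.0 / (1.0 + math.exp(z))


def dd_step(gamma, prev_clean_power, noise_power, alpha=0.98):
    """Textbook decision-directed update with a fixed weight alpha."""
    ml_estimate = max(gamma - 1.0, 0.0)          # maximum likelihood term
    dd_term = prev_clean_power / noise_power     # previous-frame term
    return alpha * dd_term + (1.0 - alpha) * ml_estimate


def adaptive_dd_step(gamma, prev_clean_power, noise_power,
                     alpha_min=0.5, alpha_max=0.98):
    """Same update, but the weight follows the speech absence probability
    (assumed linear interpolation between alpha_min and alpha_max)."""
    q = speech_absence_probability(gamma)
    # Speech likely absent -> alpha near alpha_max (heavy smoothing).
    # Speech likely present -> alpha near alpha_min (fast tracking).
    alpha = alpha_min + (alpha_max - alpha_min) * q
    xi = dd_step(gamma, prev_clean_power, noise_power, alpha=alpha)
    return xi, alpha


if __name__ == "__main__":
    # Speech onset: strong current frame, no speech energy in the previous one.
    gamma_onset, prev_power, noise_power = 20.0, 0.0, 1.0
    print("fixed DD:   ", dd_step(gamma_onset, prev_power, noise_power))
    print("adaptive DD:", adaptive_dd_step(gamma_onset, prev_power, noise_power)[0])
```

At an onset the fixed weight near one lets only a small fraction of the maximum likelihood term through, while the adaptive weight drops toward `alpha_min` and the estimate follows the new frame much more closely. In noise-only frames the weight stays high, which is the smoothing behaviour that keeps musical noise low in the DD approach.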

Updated: 2019-06-07