Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation.
IEEE Transactions on Audio, Speech, and Language Processing. Pub Date: 2009-01-01, DOI: 10.1109/tasl.2008.2005342
Jiucang Hao, Hagai Attias, Srikantan Nagarajan, Te-Won Lee, Terrence J. Sejnowski

This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain, in contrast to most current models, which operate in the frequency domain. Exact signal estimation is a computationally intractable problem, so we derive three approximations to make signal estimation efficient. The Gaussian approximation transforms the log-spectral-domain GMM into the frequency domain using a minimal Kullback-Leibler (KL) divergence criterion. The frequency-domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude. Correspondingly, the log-spectral-domain Laplace method computes the MAP estimator for the log-spectral amplitude. Further, gain and noise spectrum adaptation are implemented using the expectation-maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer improved signal-to-noise ratio, lower word recognition error rate, and less spectral distortion.
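To make the Gaussian approximation concrete, here is a minimal Python sketch for a single frequency bin. It rests on assumptions that are ours, not necessarily the paper's exact formulation: each mixture component k models the log-power log|X|^2 as N(mu_k, sigma2_k) with uniform phase, its frequency-domain counterpart is a zero-mean complex Gaussian CN(0, lambda_k), and the noise is CN(0, noise_var). Because CN(0, lambda) is an exponential family with sufficient statistic |X|^2, minimizing KL(p || q) reduces to matching E|X|^2, which gives lambda_k = exp(mu_k + sigma2_k / 2). The clean coefficient is then estimated as a posterior-weighted Wiener gain. The function names `moment_matched_variance` and `enhance_bin` are hypothetical, and the Laplace (MAP) estimators and EM adaptation are not shown.

```python
import math


def moment_matched_variance(mu: float, sigma2: float) -> float:
    """Variance lambda of CN(0, lambda) minimizing KL(p || q), assuming that
    under p the log-power log|X|^2 ~ N(mu, sigma2) and the phase is uniform.
    The optimum matches E_p[|X|^2], the log-normal mean exp(mu + sigma2 / 2)."""
    return math.exp(mu + sigma2 / 2.0)


def complex_gaussian_logpdf(y: complex, var: float) -> float:
    """log CN(y; 0, var) = -log(pi * var) - |y|^2 / var."""
    return -math.log(math.pi * var) - abs(y) ** 2 / var


def enhance_bin(y, weights, mus, sigma2s, noise_var):
    """MMSE estimate of the clean coefficient X from noisy Y = X + N in one bin,
    under the (assumed) Gaussian-approximated mixture prior.

    Returns the estimate and the posterior component probabilities p(k | y)."""
    # Step 1: map each log-spectral component to a frequency-domain variance.
    lambdas = [moment_matched_variance(m, s) for m, s in zip(mus, sigma2s)]

    # Step 2: log of pi_k * CN(y; 0, lambda_k + noise_var) for each component.
    logs = [math.log(w) + complex_gaussian_logpdf(y, lam + noise_var)
            for w, lam in zip(weights, lambdas)]

    # Step 3: normalize with the log-sum-exp trick for numerical stability.
    peak = max(logs)
    unnorm = [math.exp(l - peak) for l in logs]
    total = sum(unnorm)
    post = [u / total for u in unnorm]

    # Step 4: E[X | y] = sum_k p(k | y) * lambda_k / (lambda_k + noise_var) * y.
    gain = sum(p * lam / (lam + noise_var) for p, lam in zip(post, lambdas))
    return gain * y, post
```

For example, `enhance_bin(3 + 0j, [0.5, 0.5], [math.log(0.1), math.log(10.0)], [0.0, 0.0], 1.0)` places most of the posterior mass on the high-variance component and shrinks the observation by a gain slightly below 10/11. Working in the frequency domain after moment matching is what keeps the per-component posterior in closed form.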

Updated: 2019-11-01