Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

arXiv - CS - Sound, Pub Date: 2020-07-01, DOI: arxiv-2007.00542
Thomas Dietzen, Marc Moonen, Toon van Waterschoot

Power spectral density (PSD) estimates of various microphone signal components are essential to many speech enhancement procedures. As speech is highly non-stationary, performance improvements may be gained by maintaining time variations in PSD estimates. In this paper, we propose an instantaneous PSD estimation approach based on generalized principal components. Similar to other eigenspace-based PSD estimation approaches, we rely on recursive averaging to obtain a microphone signal correlation matrix estimate to be decomposed. However, instead of estimating the PSDs directly from the temporally smooth generalized eigenvalues of this matrix, which yields temporally smooth PSD estimates, we propose to estimate the PSDs from newly defined instantaneous generalized eigenvalues, which yields instantaneous PSD estimates. The instantaneous generalized eigenvalues are defined from the generalized principal components, i.e., a generalized eigenvector-based transform of the microphone signals. We further show that the smooth generalized eigenvalues can be understood as a recursive average of the instantaneous generalized eigenvalues. Simulation results comparing the multi-channel Wiener filter (MWF) with smooth and instantaneous PSD estimates indicate better speech enhancement performance for the latter. A MATLAB implementation is available online.
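
The relation between smooth and instantaneous generalized eigenvalues described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' MATLAB implementation. It assumes a single frequency bin, a known Hermitian positive-definite reference matrix `G` (for example, a noise coherence matrix), a forgetting factor `alpha`, and it takes the instantaneous generalized eigenvalues to be the squared magnitudes of the generalized principal components. All function and variable names are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np


def gevd(R, G):
    """Generalized eigendecomposition R P = G P diag(lam), scaled so P^H G P = I.

    R and G are Hermitian; G must be positive definite (assumption).
    """
    L = np.linalg.cholesky(G)            # G = L L^H
    L_inv = np.linalg.inv(L)
    W = L_inv @ R @ L_inv.conj().T       # R whitened with respect to G
    W = 0.5 * (W + W.conj().T)           # enforce Hermitian symmetry numerically
    lam, U = np.linalg.eigh(W)           # real eigenvalues, ascending order
    order = np.argsort(lam)[::-1]        # reorder to descending
    lam, U = lam[order], U[:, order]
    P = L_inv.conj().T @ U               # generalized eigenvectors
    return lam, P


def instantaneous_update(R_prev, y, G, alpha=0.9):
    """One recursive-averaging step followed by the eigenvector-based transform."""
    # Recursive average of the microphone signal correlation matrix.
    R = alpha * R_prev + (1.0 - alpha) * np.outer(y, y.conj())
    lam_smooth, P = gevd(R, G)
    # Generalized principal components: eigenvector-based transform of y.
    z = P.conj().T @ y
    # Assumed definition: instantaneous eigenvalues as squared magnitudes.
    lam_inst = np.abs(z) ** 2
    return R, P, lam_smooth, lam_inst


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = 4  # number of microphones (illustrative)
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    G = A @ A.conj().T + M * np.eye(M)   # synthetic positive-definite reference
    y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    R, P, lam_smooth, lam_inst = instantaneous_update(G.copy(), y, G)
    print("smooth:       ", lam_smooth)
    print("instantaneous:", lam_inst)
```

Under these assumptions, with the current eigenvectors `P` held fixed, the diagonal of `P^H R P` equals `alpha` times the diagonal of `P^H R_prev P` plus `(1 - alpha)` times `lam_inst`, which mirrors the abstract's statement that the smooth eigenvalues act as a recursive average of the instantaneous ones.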


Updated: 2020-07-02
