Single-channel speech enhancement with correlated spectral components: Limits-potential
Speech Communication (IF 2.4), Pub Date: 2020-05-16, DOI: 10.1016/j.specom.2020.05.002
Pejman Mowlaee, Johannes K.W. Stahl

In this paper, we investigate single-channel speech enhancement algorithms that operate in the short-time Fourier transform (STFT) domain and take into account dependencies across frequency. Once inter-frequency dependencies are allowed, the minimum mean square error (MMSE) optimal estimates of the STFT expansion coefficients are, in general, functions of complex-valued covariance matrices. These covariance matrices are not known a priori and have to be estimated from the observed data. This work analyzes how this estimation affects the respective single-channel speech enhancement algorithms. We propose a statistical model that circumvents the need to estimate complex-valued second-order statistics, and we derive a linear multidimensional short-time spectral amplitude estimator motivated by the model's assumptions. Further, we provide empirical evidence for the assumptions that form the basis of this model. We evaluate the potential of taking inter-frequency dependencies into account for single-channel speech enhancement and then compare the estimator resulting from the proposed statistical model to relevant benchmark methods. The results indicate that estimators that consider inter-frequency dependencies can push the limits of standard approaches in terms of joint speech quality and intelligibility improvement when the second-order statistics are estimated from isolated speech data. The proposed linear multidimensional short-time spectral amplitude estimator preserves this trend in fully blind scenarios.
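
To illustrate why inter-frequency dependencies lead to complex-valued covariance matrices, here is a minimal sketch of a linear MMSE estimate computed jointly over all frequency bins of one frame. It assumes zero-mean speech and noise that are uncorrelated with each other, in which case the estimate is the textbook multidimensional Wiener filter, C_s (C_s + C_n)^{-1} y. This is not the spectral amplitude estimator proposed in the paper; the function name and the toy covariance values are hypothetical and chosen only for illustration.

```python
import numpy as np


def lmmse_across_frequency(y, cov_speech, cov_noise):
    """Linear MMSE estimate of the clean STFT coefficients of one frame.

    y          : (K,) complex noisy STFT coefficients
    cov_speech : (K, K) Hermitian speech covariance across frequency bins
    cov_noise  : (K, K) Hermitian noise covariance across frequency bins

    Assumes zero-mean speech and noise that are mutually uncorrelated.
    """
    y = np.asarray(y, dtype=complex)
    cov_speech = np.asarray(cov_speech, dtype=complex)
    cov_noisy = cov_speech + np.asarray(cov_noise, dtype=complex)
    # W = C_s (C_s + C_n)^{-1}, obtained from a linear solve on the
    # conjugate-transposed system instead of an explicit inverse.
    w = np.linalg.solve(cov_noisy.conj().T, cov_speech.conj().T).conj().T
    return w @ y


if __name__ == "__main__":
    y = np.array([1 + 1j, 2 + 0j])
    cov_noise = np.eye(2)

    # Independent bins: only the diagonal (per-bin variances) is used.
    cov_diag = np.diag([2.0, 2.0])
    print(lmmse_across_frequency(y, cov_diag, cov_noise))

    # Correlated bins: the complex off-diagonal terms change the estimate.
    cov_full = np.array([[2.0, 1 + 1j], [1 - 1j, 2.0]])
    print(lmmse_across_frequency(y, cov_full, cov_noise))
```

With diagonal covariance matrices the filter reduces to the familiar per-bin Wiener gain. The off-diagonal, complex-valued entries are the quantities that have to be estimated from data once inter-frequency dependencies are allowed.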




Updated: 2020-05-16