DNN-Based Calibrated-Filter Models for Speech Enhancement, Circuits, Systems, and Signal Processing

Circuits, Systems, and Signal Processing (IF 2.3) Pub Date: 2021-01-27, DOI: 10.1007/s00034-020-01604-6
Yazid Attabi, Benoit Champagne, Wei-Ping Zhu

In this paper, we present a new two-stage speech enhancement approach, specifically designed to reduce musical and other random noises without requiring their localization in the time–frequency domain. The proposed method is motivated by two observations: (1) the randomly scattered nature of the energy peaks corresponding to the musical noise in the spectrogram of the processed speech; and (2) the existence of correlation between Wiener filter gains calculated at different frequencies. In the first stage of the proposed method, a preliminary gain function is generated using the nonnegative matrix factorization (NMF) algorithm. In the second stage, a modified gain function that is more robust to noise artefacts, referred to as the calibrated filter, is estimated by applying a DNN-based nonlinear mapping function to the preliminary gain function. To further decrease the variability of the estimated calibrated filter, we propose to extend the DNN-based extraction of frequency dependencies to a set of preliminary gain functions derived from spectral estimates based on a family of data tapers; the resulting calibrated filter is referred to as the multi-filter. The evaluation of the proposed DNN-based calibrated filter models for speech enhancement, under different noise types and input SNR levels, shows substantial improvements in terms of standard speech quality and intelligibility measures when compared to the uncalibrated filter.
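
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the sketch assumes a generic supervised NMF with fixed, pre-trained speech and noise bases (`W_speech`, `W_noise`), the standard multiplicative update for the Euclidean cost, and a hypothetical one-hidden-layer network (`W1`, `b1`, `W2`, `b2`) standing in for the DNN mapping. All function and parameter names are illustrative assumptions.

```python
import numpy as np

def nmf_activations(V, W, n_iter=50, eps=1e-12, seed=0):
    """Estimate nonnegative activations H such that V ~ W @ H, with W held fixed.

    Uses the standard multiplicative update for the Euclidean cost
    (an assumed choice; the abstract does not specify the NMF variant).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def preliminary_gain(V_noisy, W_speech, W_noise, eps=1e-12):
    """Stage 1 (sketch): Wiener-like gain from NMF speech and noise estimates.

    V_noisy is a nonnegative spectrogram of shape (freq_bins, frames).
    """
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V_noisy, W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # estimated speech spectrogram
    N_hat = W_noise @ H[k:]    # estimated noise spectrogram
    return S_hat / (S_hat + N_hat + eps)

def calibrate(gain, W1, b1, W2, b2):
    """Stage 2 (sketch): nonlinear mapping applied to each frame's full gain vector.

    Every output bin depends on all input bins, which is how the mapping
    can exploit correlation between gains at different frequencies.
    The weights are hypothetical; in practice they would be trained.
    """
    hidden = np.maximum(0.0, W1 @ gain + b1[:, None])              # ReLU layer
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2[:, None])))      # sigmoid keeps gains in (0, 1)
```

In use, the calibrated gain would be multiplied element-wise with the noisy spectrogram before resynthesis. The multi-filter variant mentioned in the abstract would, under the same assumptions, stack several preliminary gains (one per data taper) as the network input.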

Updated: 2021-01-28