Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising. IEEE/ACM Transactions on Audio, Speech, and Language Processing



IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2017-04-20, DOI: 10.1109/taslp.2017.2696307
Donald S Williamson, DeLiang Wang


In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
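The abstract defines the complex ideal ratio mask as the mask that yields direct speech when applied to reverberant and noisy speech. One common reading of that definition, in the short-time Fourier transform (STFT) domain, is a complex division of the direct-speech spectrogram by the mixture spectrogram. The sketch below illustrates that reading only. The function names, the `eps` guard, and the random arrays standing in for STFTs are assumptions for illustration and are not taken from the paper, and the deep neural network that estimates the mask is omitted.

```python
import numpy as np


def complex_ideal_ratio_mask(direct_stft, mixture_stft, eps=1e-12):
    """Return a complex mask M such that M * mixture_stft ~= direct_stft.

    Both inputs are complex arrays of the same shape (frequency x time).
    `eps` is an illustrative guard against division by zero (an assumption).
    """
    y_r, y_i = mixture_stft.real, mixture_stft.imag
    s_r, s_i = direct_stft.real, direct_stft.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    # Real and imaginary parts of S / Y, written out explicitly.
    mask_real = (y_r * s_r + y_i * s_i) / denom
    mask_imag = (y_r * s_i - y_i * s_r) / denom
    return mask_real + 1j * mask_imag


def apply_mask(mask, mixture_stft):
    """Complex multiplication rescales the magnitude and rotates the phase."""
    return mask * mixture_stft


# Hypothetical stand-ins for STFTs: random complex spectrograms.
rng = np.random.default_rng(0)
shape = (129, 20)  # frequency bins x time frames, chosen arbitrarily
direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
mixture = direct + interference  # stands in for reverberant, noisy speech

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)
```

Because the mask is complex, applying it changes both the magnitude and the phase of each time-frequency unit, which is why the ideal mask can recover the direct speech here while a real-valued magnitude mask would leave the mixture phase untouched.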


Updated: 2019-11-01
