Complex Ratio Masking for Monaural Speech Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing



IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2016-04-14, DOI: 10.1109/taslp.2015.2512042
Donald S. Williamson, Yuxuan Wang, DeLiang Wang


Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech and enhance only the magnitude spectrum, leaving the phase spectrum unchanged. This is done because of a belief that the phase spectrum is unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to consider enhancing both the magnitude and the phase spectra. We present a supervised monaural speech separation approach that simultaneously enhances the magnitude and phase spectra by operating in the complex domain. Our approach uses a deep neural network to estimate the real and imaginary components of the ideal ratio mask defined in the complex domain. We report separation results for the proposed method and compare them to those of related systems. The proposed approach outperforms the other methods on several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test in which subjects preferred the proposed approach at least 69% of the time.
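The core idea of a complex-domain ratio mask can be illustrated with a minimal NumPy sketch, assuming that the ideal mask is the bin-wise complex ratio M = S / Y between the clean STFT S and the noisy STFT Y, so that multiplying Y by M restores both magnitude and phase. The function names are hypothetical, random complex arrays stand in for real STFTs, and no neural network is trained here. The tanh-style compression with K = 10 and C = 0.1 is an assumption about how an unbounded mask might be bounded into a training target; the abstract does not specify it.

```python
import numpy as np

# Hypothetical helpers; only the mask definition M = S / Y follows the text.

def ideal_complex_ratio_mask(clean, noisy, eps=1e-12):
    """Return (M_r, M_i) such that (M_r + j*M_i) * noisy == clean, bin by bin."""
    yr, yi = noisy.real, noisy.imag
    sr, si = clean.real, clean.imag
    power = np.maximum(yr**2 + yi**2, eps)  # guard against near-empty bins
    mask_real = (yr * sr + yi * si) / power
    mask_imag = (yr * si - yi * sr) / power
    return mask_real, mask_imag

def apply_complex_mask(mask_real, mask_imag, noisy):
    """Complex multiplication changes magnitude and phase together."""
    return (mask_real + 1j * mask_imag) * noisy

def compress(mask, K=10.0, C=0.1):
    """Assumed bounding to [-K, K]: K*(1-e^{-Cm})/(1+e^{-Cm}) == K*tanh(C*m/2)."""
    return K * np.tanh(C * mask / 2.0)

def decompress(x, K=10.0, C=0.1):
    """Invert compress(); clip so the log stays finite at the bounds."""
    x = np.clip(x, -K + 1e-6, K - 1e-6)
    return -np.log((K - x) / (K + x)) / C

# Random complex arrays stand in for STFTs (frequency bins x frames).
rng = np.random.default_rng(0)
shape = (257, 20)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

m_r, m_i = ideal_complex_ratio_mask(clean, noisy)
crm_estimate = apply_complex_mask(m_r, m_i, noisy)

# Magnitude-only baseline: correct magnitude, but the noisy phase is reused.
mag_only_estimate = (np.abs(clean) / np.abs(noisy)) * noisy

crm_error = np.mean(np.abs(crm_estimate - clean) ** 2)
mag_error = np.mean(np.abs(mag_only_estimate - clean) ** 2)
print(f"complex mask MSE: {crm_error:.2e}, magnitude-only MSE: {mag_error:.3f}")
```

Even an ideal magnitude-only mask leaves a residual error because it reuses the noisy phase, while the ideal complex mask recovers the clean STFT up to floating-point error. In a supervised system along these lines, a network would predict the (compressed) real and imaginary mask components from noisy features, and `decompress` followed by `apply_complex_mask` would produce the enhanced STFT.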


Updated: 2019-11-01
