Speech enhancement methods based on binaural cue coding

EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2019-12-01, DOI: 10.1186/s13636-019-0164-x
Xianyun Wang, Changchun Bao

This paper adapts the encoding and decoding mechanism of binaural cue coding (BCC) to speech enhancement by treating speech and noise as the left-channel and right-channel signals of the BCC framework, respectively. Given the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise, the speech signal can then be estimated from the noisy speech. Two kinds of inter-channel cues are used for speech restoration: exact cues, extracted from clean speech and noise, and pre-enhanced cues, extracted from pre-enhanced speech and estimated noise. The two kinds of cues are paired one by one to form a codebook. At enhancement time, the pre-enhanced cues are extracted from the noisy speech, and the exact cues are estimated by mapping the pre-enhanced cues through this prior codebook. The estimated exact cues are then used to obtain a time-frequency (T-F) mask that enhances the noisy speech, following the decoding step of BCC. To further improve the accuracy of the cue-based T-F mask, a deep neural network (DNN)-based method is proposed that learns the mapping between input features of the noisy speech and the T-F masks. Experimental results show that the codebook-driven method outperforms conventional methods and that the DNN-based method outperforms the codebook-driven method.
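
The cue-and-mask pipeline described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the ICLD and ICC use the standard BCC-style definitions (power ratio in dB and normalized cross-correlation), the mask is a Wiener-style gain derived from the ICLD alone (the paper's mask also involves the ICC), and the codebook mapping is simplified to a nearest-neighbor lookup. All function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def icld_db(speech_band, noise_band):
    """Level difference in dB between two channels for one subband.

    With speech as the 'left' channel and noise as the 'right' channel,
    this is the speech-to-noise power ratio of the band in dB.
    """
    p_s = sum(abs(x) ** 2 for x in speech_band)
    p_n = sum(abs(x) ** 2 for x in noise_band)
    return 10.0 * math.log10((p_s + EPS) / (p_n + EPS))


def icc(speech_band, noise_band):
    """Magnitude of the normalized cross-correlation, in [0, 1]."""
    cross = sum(s * n.conjugate() for s, n in zip(speech_band, noise_band))
    p_s = sum(abs(x) ** 2 for x in speech_band)
    p_n = sum(abs(x) ** 2 for x in noise_band)
    return abs(cross) / math.sqrt(p_s * p_n + EPS)


def ratio_mask_from_icld(icld):
    """Wiener-style gain from the ICLD alone (assumption: ICC is ignored)."""
    ratio = 10.0 ** (icld / 10.0)
    return ratio / (1.0 + ratio)


def lookup_exact_cues(pre_cues, codebook):
    """Nearest-neighbor lookup (a simplification of the codebook mapping).

    codebook is a list of (pre_enhanced_cues, exact_cues) pairs, where each
    cue vector is a tuple such as (icld_db, icc).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, exact = min(codebook, key=lambda entry: sq_dist(entry[0], pre_cues))
    return exact
```

In this sketch, an enhanced T-F bin would be obtained by multiplying the noisy spectral coefficient by the gain returned from `ratio_mask_from_icld`, with the ICLD taken from the cues returned by `lookup_exact_cues`. The exact cues can only be computed directly when clean speech and noise are both available, which is why a mapping from the pre-enhanced cues is needed at enhancement time.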

Updated: 2019-12-01
