A Multi-Scale Feature Recalibration Network for End-to-End Single Channel Speech Enhancement
IEEE Journal of Selected Topics in Signal Processing (IF 8.7) Pub Date: 2020-12-18, DOI: 10.1109/jstsp.2020.3045846
Yang Xian, Yang Sun, Wenwu Wang, Syed Mohsen Naqvi

Deep neural network based methods dominate recent developments in single channel speech enhancement. In this paper, we propose a multi-scale feature recalibration convolutional encoder-decoder with a bidirectional gated recurrent unit (BGRU) architecture for end-to-end speech enhancement. More specifically, multi-scale recalibration 2-D convolutional layers are used to extract local and contextual features from the signal. In addition, a gating mechanism is used in the recalibration network to control the information flow among the layers, which enables the scaled features to be weighted in order to retain speech and suppress noise. A fully connected (FC) layer with a small number of neurons is then employed to compress the output of the multi-scale 2-D convolutional layer, thus capturing the global information and improving parameter efficiency. The BGRU layers employ forward and backward GRUs, which contain the reset, update, and output gates, to exploit the interdependency among the past, current and future frames to improve predictions. The experimental results confirm that the proposed MCGN method outperforms several state-of-the-art methods.
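
The gated recalibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes or gate equations, so everything below is an assumption made for illustration. In particular, the sketch uses 1-D averaging filters in place of learned 2-D convolutions, computes one sigmoid gate per scale from the mean of that scale's features, and the names `conv1d_same`, `recalibrate`, and `multi_scale_recalibration` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here as the gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1-D filtering with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def recalibrate(features, gate_weight, gate_bias):
    """Scale one feature sequence by a sigmoid gate derived from its mean.

    Assumption: a single scalar gate per scale; the paper's gate may differ.
    """
    summary = sum(features) / len(features)
    gate = sigmoid(gate_weight * summary + gate_bias)
    return [gate * f for f in features], gate


def multi_scale_recalibration(signal, kernel_sizes=(3, 5, 7), gate_params=None):
    """Extract features at several scales and weight each scale by its gate."""
    if gate_params is None:
        gate_params = [(1.0, 0.0)] * len(kernel_sizes)
    outputs, gates = [], []
    for size, (w, b) in zip(kernel_sizes, gate_params):
        # Averaging kernel stands in for learned convolution weights.
        kernel = [1.0 / size] * size
        feats = conv1d_same(signal, kernel)
        scaled, gate = recalibrate(feats, w, b)
        outputs.append(scaled)
        gates.append(gate)
    return outputs, gates


# Toy input: a sinusoid with additive Gaussian noise.
random.seed(0)
noisy = [math.sin(0.2 * n) + 0.1 * random.gauss(0, 1) for n in range(64)]
outputs, gates = multi_scale_recalibration(noisy)
```

Each entry of `gates` lies strictly between 0 and 1, so a scale whose gate is near 0 is suppressed while one near 1 passes through largely unchanged. In a trained network the gate parameters would be learned so that speech-dominated features are retained and noise-dominated ones are attenuated.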

Updated: 2021-02-09