A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement
Speech Communication (IF 2.4), Pub Date: 2021-09-13, DOI: 10.1016/j.specom.2021.09.001
Guochen Yu 1, 2, 3 , Yutian Wang 1, 2 , Hui Wang 1, 2 , Qin Zhang 1, 2 , Chengshi Zheng 3, 4

Cycle-consistent generative adversarial networks (CycleGAN) have shown promising performance for speech enhancement (SE); however, one intractable shortcoming of CycleGAN-based SE systems is that noise components propagate throughout the cycle and cannot be completely eliminated. Additionally, conventional CycleGAN-based SE systems estimate only the spectral magnitude and leave the phase unaltered. Motivated by the concept of multi-stage learning, this paper proposes a novel two-stage denoising system that combines a CycleGAN-based magnitude-enhancing network with a subsequent complex spectral refining network. Specifically, in the first stage, a CycleGAN-based model estimates only the magnitude, which is then coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. In the second stage, a complex spectral mapping network further suppresses the residual noise components and estimates the clean phase; this network is purely complex-valued and is composed of complex 2D convolution/deconvolution and complex temporal-frequency attention blocks. Experimental results on two public datasets show that the proposed approach consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems across various evaluation metrics, especially in background noise suppression.
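The first-stage output described above (an estimated magnitude coupled with the unaltered noisy phase) can be illustrated with a few lines of numpy. This is a minimal sketch, assuming the magnitude estimate is already available as a real array with the same shape as the noisy STFT; the helper name `coarse_complex_spectrum` is hypothetical, and the CycleGAN generator that would produce the magnitude is not shown.

```python
import numpy as np


def coarse_complex_spectrum(noisy_stft: np.ndarray, enhanced_mag: np.ndarray) -> np.ndarray:
    """Couple an estimated magnitude with the noisy phase (hypothetical helper).

    noisy_stft:   complex array of shape (freq, time), the noisy STFT.
    enhanced_mag: real, non-negative array of the same shape, standing in for
                  the output of a magnitude-estimation network (assumed here).
    """
    if noisy_stft.shape != enhanced_mag.shape:
        raise ValueError("magnitude and spectrum shapes must match")
    noisy_phase = np.angle(noisy_stft)  # the phase itself is not modified
    return enhanced_mag * np.exp(1j * noisy_phase)


# Toy example: 3 frequency bins x 2 frames of a random "noisy" spectrum.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
mag_hat = 0.5 * np.abs(noisy)  # stand-in for a network's magnitude estimate
coarse = coarse_complex_spectrum(noisy, mag_hat)
print(np.allclose(np.abs(coarse), mag_hat))            # True
print(np.allclose(np.angle(coarse), np.angle(noisy)))  # True
```

Because the phase is copied unchanged, any phase error in the noisy input survives this step, which is the gap the second, complex-valued stage is meant to address.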

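A common way to realize the complex 2D convolution mentioned in the abstract is to assemble it from real-valued convolutions using $(W_r + jW_i) * (X_r + jX_i) = (W_r * X_r - W_i * X_i) + j(W_r * X_i + W_i * X_r)$. The numpy sketch below shows this for a single channel with "valid" padding; it assumes cross-correlation semantics (as in most deep-learning frameworks), the helper names are hypothetical, and the paper's actual layer settings (channels, strides, padding) are not given here.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Single-channel 2D cross-correlation with 'valid' padding (real or complex)."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(x, w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def complex_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Complex convolution built from four real convolutions.

    (W_r + jW_i) * (X_r + jX_i) = (W_r*X_r - W_i*X_i) + j(W_r*X_i + W_i*X_r)
    """
    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    real = conv2d_valid(xr, wr) - conv2d_valid(xi, wi)
    imag = conv2d_valid(xi, wr) + conv2d_valid(xr, wi)
    return real + 1j * imag


# Toy input: a 4x5 complex "spectrogram" patch and a 2x3 complex kernel.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
w = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
y = complex_conv2d(x, w)
print(y.shape)                               # (3, 3)
print(np.allclose(y, conv2d_valid(x, w)))    # True: matches direct complex arithmetic
```

In a trainable layer, `w.real` and `w.imag` would each be a learned real kernel, so the real and imaginary output paths share the same two sets of weights rather than being learned independently.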



Updated: 2021-09-24