A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement, IEEE Signal Processing Letters


IEEE Signal Processing Letters (IF 3.2), Pub Date: 2021-06-30, DOI: 10.1109/lsp.2021.3093859
Xiaoxiao Xiang, Xiaojuan Zhang, Haozhe Chen

Deep neural network-based approaches are among the leading speech enhancement technologies and dominate recent developments in single-channel speech enhancement. In this paper, we propose a convolutional network with multi-scale and attention mechanisms for end-to-end single-channel speech enhancement (MASENet). More specifically, MASENet consists of five modules: a multi-scale speech encoder, a frequency-dilated module, a temporal convolutional attention module, a post-processing module, and a single-scale speech decoder. The frequency-dilated module and the temporal convolutional attention module are used to extract local and global information. Dense connections are used to avoid the vanishing gradient problem. Furthermore, we design an attention block to improve the discriminative learning ability of the network. Experimental results show that the proposed network achieves significantly better enhancement performance than the baselines in terms of objective speech intelligibility and quality metrics.
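
The abstract does not state kernel sizes or dilation rates, so the following is only a rough illustration of why stacked dilated convolutions can cover both local and global context. It is a minimal pure-Python sketch, not MASENet's implementation: the function names, the kernel size of 3, and the doubling dilation schedule are all assumptions made for the example.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D dilated convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input samples seen by one output of a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings (not from the paper): kernel size 3, dilation doubling.
dilations = [1, 2, 4, 8]
print(receptive_field(3, dilations))  # 31

# Cross-check by running the stack: each output consumes 31 input samples.
x = [float(i) for i in range(40)]
y = x
for d in dilations:
    y = dilated_conv1d(y, [1.0, 1.0, 1.0], d)
print(len(x) - len(y) + 1)  # 31
```

With these assumed settings, four layers of kernel size 3 see 31 input samples, whereas four undilated layers would see only 9. The first layer (dilation 1) still captures fine local detail, while the later layers widen the context.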


Updated: 2021-06-30
