Dense CNN with Self-Attention for Time-Domain Speech Enhancement
arXiv - CS - Sound. Pub Date: 2020-09-03, DOI: arxiv-2009.01941
Ashutosh Pandey and DeLiang Wang

Speech enhancement in the time domain has become increasingly popular in recent years, owing to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and the predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
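The proposed loss can be illustrated with a short sketch. The snippet below is a minimal PyTorch interpretation based only on the abstract: the predicted noise is assumed to be the noisy mixture minus the enhanced speech, and the loss matches the spectral magnitudes of both the enhanced speech and that predicted noise against their references. The STFT parameters, the L1 distance, and the equal weighting of the two terms are illustrative assumptions, not the paper's exact configuration.

import torch

def stft_magnitude(signal, n_fft=512, hop_length=256):
    # Spectral magnitude of a batch of time-domain signals with shape [batch, samples].
    window = torch.hann_window(n_fft, device=signal.device)
    spec = torch.stft(signal, n_fft=n_fft, hop_length=hop_length,
                      window=window, return_complex=True)
    return spec.abs()

def magnitude_loss(enhanced, clean, noisy):
    # Assumed reading of the abstract: the predicted noise is the noisy mixture
    # minus the enhanced speech, so matching its magnitude as well constrains the
    # time-domain estimate beyond the speech magnitude alone.
    predicted_noise = noisy - enhanced
    true_noise = noisy - clean
    speech_term = (stft_magnitude(enhanced) - stft_magnitude(clean)).abs().mean()
    noise_term = (stft_magnitude(predicted_noise) - stft_magnitude(true_noise)).abs().mean()
    return speech_term + noise_term

A magnitude-only loss on the enhanced speech by itself leaves the phase unconstrained; the abstract's point is that the additional noise-prediction term removes that degree of freedom, so minimizing the loss also improves the phase.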

Updated: 2020-09-07