FLGCNN: A novel fully convolutional neural network for end-to-end monaural speech enhancement with utterance-based objective functions
Applied Acoustics (IF 3.4), Pub Date: 2020-12-01, DOI: 10.1016/j.apacoust.2020.107511
Yuanyuan Zhu, Xu Xu, Zhongfu Ye

Abstract This paper proposes a novel fully convolutional neural network (FCN), called FLGCNN, for end-to-end monaural speech enhancement in the time domain. The proposed FLGCNN is built mainly on an encoder and a decoder, with an extra convolution-based short-time Fourier transform (CSTFT) layer and inverse STFT (CISTFT) layer added to emulate the forward and inverse STFT operations. These layers integrate frequency-domain knowledge into the model, since the underlying phonetic information of speech is presented more clearly by time–frequency (T-F) representations. In addition, the encoder and decoder are constructed from gated convolutional layers, so the model can better control the information passed through the hierarchy. Moreover, motivated by the popular temporal convolutional neural network (TCNN), a temporal convolutional module (TCM), which is efficient at modeling the long-term dependencies of speech signals, is inserted between the encoder and decoder. Because the entire framework realizes end-to-end speech enhancement, we also optimize the model with different utterance-based objective functions to investigate the impact of the loss function on performance. Experimental results demonstrate that the proposed model consistently outperforms competitive speech enhancement methods.
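The CSTFT layer described above rests on a standard observation: an STFT is a strided 1-D convolution whose kernels are fixed (non-trainable) windowed Fourier basis functions. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function names, the Hann window, and the frame parameters are illustrative assumptions.

```python
import numpy as np

def cstft_kernels(n_fft=64, window=None):
    # Fixed convolution kernels realizing the DFT: the first n_fft//2 + 1
    # rows give the real (cosine) part, the remaining rows the imaginary
    # (negative sine) part, each multiplied by the analysis window.
    if window is None:
        window = np.hanning(n_fft)  # illustrative choice of window
    n = np.arange(n_fft)
    freqs = np.arange(n_fft // 2 + 1)
    cos_k = np.cos(2 * np.pi * np.outer(freqs, n) / n_fft) * window
    sin_k = -np.sin(2 * np.pi * np.outer(freqs, n) / n_fft) * window
    return np.concatenate([cos_k, sin_k], axis=0)  # (2*(n_fft//2+1), n_fft)

def cstft(signal, n_fft=64, hop=16):
    # Emulate the forward STFT as a strided 1-D convolution: frame the
    # signal with stride `hop`, then apply the fixed kernels to each frame.
    kernels = cstft_kernels(n_fft)
    n_frames = (len(signal) - n_fft) // hop + 1
    frames = np.stack([signal[i * hop : i * hop + n_fft]
                       for i in range(n_frames)])
    return frames @ kernels.T  # (n_frames, 2*(n_fft//2+1))
```

In a trainable framework the same kernels would initialize a strided `Conv1d`, and the CISTFT layer would use the corresponding transposed convolution for reconstruction.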
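An utterance-based objective function, as opposed to a frame-level one, is computed over the entire enhanced signal at once. The abstract does not name the specific losses compared, so as a purely hypothetical example, a common utterance-level choice in time-domain enhancement is the negative scale-invariant SNR:

```python
import numpy as np

def si_snr_loss(est, ref, eps=1e-8):
    # Negative scale-invariant SNR over a whole utterance (a hypothetical
    # example of an utterance-based objective, not the paper's stated loss).
    est = est - est.mean()
    ref = ref - ref.mean()
    proj = (est @ ref) / (ref @ ref + eps) * ref   # target component of est
    noise = est - proj                             # residual component
    si_snr = 10 * np.log10((proj @ proj + eps) / (noise @ noise + eps))
    return -si_snr  # negated so that minimizing the loss maximizes SI-SNR
```

Because the scaling of `est` cancels in the projection, the loss rewards waveform shape rather than absolute amplitude, which is one reason such losses pair well with end-to-end time-domain models.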

Updated: 2020-12-01