Speech enhancement by LSTM-based noise suppression followed by CNN-based speech restoration, EURASIP Journal on Advances in Signal Processing



Speech enhancement by LSTM-based noise suppression followed by CNN-based speech restoration
EURASIP Journal on Advances in Signal Processing (IF 1.7), Pub Date: 2020-12-10, DOI: 10.1186/s13634-020-00707-1
Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt

Single-channel speech enhancement in highly non-stationary noise conditions is a very challenging task, especially when interfering speech is included in the noise. Deep learning-based approaches have notably improved the performance of speech enhancement algorithms under such conditions, but they still introduce speech distortions when strong noise suppression is required. We propose to address this problem with a two-stage approach that first performs noise suppression and then restores natural-sounding speech, using a specifically chosen neural network topology and loss function for each task. A mask-based long short-term memory (LSTM) network is employed for noise suppression, and speech restoration is performed via spectral mapping with a convolutional encoder-decoder (CED) network. The proposed method improves speech quality, as measured by PESQ, by about 0.1 points over state-of-the-art single-stage methods for unseen, highly non-stationary noise types, including interfering speech. Furthermore, it increases intelligibility in low-SNR conditions and consistently outperforms all reference methods.
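The division of labor between the two stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the trained LSTM mask estimator and the trained CED are replaced by hypothetical stand-ins (`toy_mask_estimator`, a classical Wiener-style gain computed from a crude noise estimate, and `toy_spectral_mapper`, a simple smoothing filter across frequency), so only the data flow is shown. The STFT settings (16 kHz, 512-sample Hann frames, 256-sample hop), the choice to feed stage 2 only the stage-1 output, and the reuse of the noisy phase are all assumptions. What the sketch does mirror from the abstract is that stage 1 is mask-based (it multiplies the noisy magnitude by a mask), while stage 2 performs spectral mapping (it outputs a magnitude spectrogram directly).

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def toy_mask_estimator(noisy_mag, noise_frames=10, floor=0.05):
    """Hypothetical stand-in for the trained LSTM (NOT the paper's model): a Wiener-style
    gain from a noise PSD averaged over the first frames, assumed to be speech-free."""
    noise_psd = np.mean(noisy_mag[:noise_frames] ** 2, axis=0) + 1e-12
    gamma = noisy_mag ** 2 / noise_psd  # a posteriori SNR per time-frequency bin
    gain = np.maximum(gamma - 1.0, 0.0) / np.maximum(gamma, 1e-12)
    return np.clip(gain, floor, 1.0)  # mask limited to [floor, 1]


def toy_spectral_mapper(mag, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical stand-in for the trained CED (NOT the paper's model): it outputs a new
    magnitude spectrogram directly (here by smoothing across frequency), not a mask."""
    k = np.asarray(kernel)
    return np.stack([np.convolve(frame, k, mode="same") for frame in mag])


def two_stage_enhance(noisy_spec, mask_net, mapping_net):
    noisy_mag = np.abs(noisy_spec)
    # Stage 1, noise suppression: mask-based, i.e. estimate = mask * noisy magnitude.
    mask = mask_net(noisy_mag)
    suppressed_mag = mask * noisy_mag
    # Stage 2, speech restoration: spectral mapping, i.e. the model predicts the magnitude itself.
    restored_mag = np.maximum(mapping_net(suppressed_mag), 0.0)
    # Assumption: the noisy phase is reused to form the complex output spectrum.
    return restored_mag * np.exp(1j * np.angle(noisy_spec)), mask


# Usage with a synthetic signal: a 440 Hz tone starting at 0.2 s in white noise (16 kHz).
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t) * (t >= 0.2)
noisy = clean + 0.1 * rng.standard_normal(fs)
Y = stft(noisy)
S_hat, mask = two_stage_enhance(Y, toy_mask_estimator, toy_spectral_mapper)
print(Y.shape, S_hat.shape)  # → (61, 257) (61, 257)
```

Swapping trained models in for `mask_net` and `mapping_net` would leave `two_stage_enhance` unchanged. In this sketch the mask can only attenuate each bin of the noisy magnitude (it never exceeds 1), whereas the mapping stage is free to output values in bins the mask had suppressed, which is the kind of freedom a restoration stage needs.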


Updated: 2020-12-10
