Improving generative adversarial networks for speech enhancement through regularization of latent representations

Speech Communication (IF 2.4), Pub Date: 2020-02-06, DOI: 10.1016/j.specom.2020.02.001
Fan Yang, Ziteng Wang, Junfeng Li, Risheng Xia, Yonghong Yan

Speech enhancement aims to improve the quality and intelligibility of speech signals, a challenging task in adverse environments. The speech enhancement generative adversarial network (SEGAN), which applies a generative adversarial network (GAN) to speech enhancement, has achieved promising results. This paper proposes a new network architecture and loss function based on SEGAN. Unlike most network structures in this field, the new network, called high-level GAN (HLGAN), takes parallel noisy and clean speech signals as input during training rather than noisy speech alone, which makes full use of the information carried by the clean speech. In addition, a new supervised speech representation loss, also called the high-level loss, is introduced at the middle hidden layer of the generative network. The high-level loss benefits HLGAN in low signal-to-noise ratio (SNR) and low-resource environments. HLGAN is evaluated over a wide range of experiments, in which it produces significant improvements, and further experiments demonstrate its generality across a variety of speech enhancement cases. In particular, HLGAN mitigates SEGAN's tendency to lose speech components while removing noise in low-SNR environments, and it can effectively enhance speech in two low-resource languages simultaneously. The reasons for HLGAN's superior performance are discussed.
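
As a rough illustration of a loss placed on the middle hidden layer, the sketch below adds a latent-matching term to a generator objective. Nearly everything specific in it is an assumption and not the paper's formulation: the L1 distance between latent codes, the least-squares adversarial term, the L1 waveform term, the weights `lambda_wave` and `lambda_latent`, and the function names `high_level_loss` and `generator_loss` are all hypothetical. The abstract states only that a supervised high-level loss is applied in the middle hidden layer of the generator and that parallel noisy and clean signals are used in training.

```python
import numpy as np


def high_level_loss(latent_noisy, latent_clean):
    """Mean absolute difference between two latent codes (assumed form)."""
    latent_noisy = np.asarray(latent_noisy, dtype=np.float64)
    latent_clean = np.asarray(latent_clean, dtype=np.float64)
    if latent_noisy.shape != latent_clean.shape:
        raise ValueError("latent codes must have the same shape")
    return float(np.mean(np.abs(latent_noisy - latent_clean)))


def generator_loss(d_fake, enhanced, clean, latent_noisy, latent_clean,
                   lambda_wave=100.0, lambda_latent=1.0):
    """Sum of adversarial, waveform, and latent terms (hypothetical weights)."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    clean = np.asarray(clean, dtype=np.float64)
    # Assumed least-squares adversarial term: push the discriminator's
    # scores on enhanced speech toward 1.
    adversarial = 0.5 * float(np.mean((d_fake - 1.0) ** 2))
    # Assumed waveform-level L1 term between enhanced and clean signals.
    waveform = float(np.mean(np.abs(enhanced - clean)))
    # Latent-level term: the regularizer on the middle hidden layer.
    latent = high_level_loss(latent_noisy, latent_clean)
    return adversarial + lambda_wave * waveform + lambda_latent * latent
```

Setting `lambda_latent=0` in this sketch reduces the objective to the adversarial and waveform terms alone, which gives a simple way to compare training with and without the latent regularizer.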


Updated: 2020-02-06
