iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
arXiv - CS - Sound, Pub Date: 2020-04-02, DOI: arxiv-2004.00932
Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi
The intelligibility of natural speech is seriously degraded when exposed to
adverse noisy environments. In this work, we propose a deep learning-based
speech modification method to compensate for the intelligibility loss, with the
constraint that the root mean square (RMS) level and duration of the speech
signal are maintained before and after modifications. Specifically, we utilize
an iMetricGAN approach to optimize the speech intelligibility metrics with
generative adversarial networks (GANs). Experimental results show that the
proposed iMetricGAN outperforms conventional state-of-the-art algorithms in
terms of objective measures, i.e., speech intelligibility in bits (SIIB) and
extended short-time objective intelligibility (ESTOI), under a Cafeteria noise
condition. In addition, formal listening tests reveal significant
intelligibility gains when both noise and reverberation exist.
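The constraint stated above (equal RMS level and equal duration before and after modification) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `rms` and `match_rms` are hypothetical, the sample values are made up, and the "modified" signal simply stands in for the output of any speech modification method.

```python
import math


def rms(signal):
    """Root mean square level of a sequence of samples."""
    if not signal:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(modified, reference):
    """Rescale `modified` so its RMS level equals that of `reference`.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Duration constraint: both signals must have the same number of samples.
    if len(modified) != len(reference):
        raise ValueError("duration constraint violated: lengths differ")
    level = rms(modified)
    if level == 0.0:
        raise ValueError("cannot rescale a silent signal")
    # A single global gain restores the original RMS level.
    gain = rms(reference) / level
    return [gain * x for x in modified]


# Made-up sample values: `reference` plays the role of the unmodified speech,
# `modified` the role of a signal produced by some enhancement method.
reference = [0.1, -0.2, 0.3, -0.4]
modified = [0.5, -0.1, 0.9, -0.3]
constrained = match_rms(modified, reference)
```

Under such a constraint, any intelligibility gain has to come from redistributing energy within the signal rather than from making it louder or longer.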
Updated: 2020-04-08