iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning,arXiv - CS - Sound


arXiv - CS - Sound, Pub Date: 2020-04-02, DOI: arxiv-2004.00932
Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

The intelligibility of natural speech is seriously degraded when the speech is presented in adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for this intelligibility loss, under the constraint that the root mean square (RMS) level and the duration of the speech signal are the same before and after modification. Specifically, we use an iMetricGAN approach to optimize speech intelligibility metrics with generative adversarial networks (GANs). Experimental results show that, under a cafeteria noise condition, the proposed iMetricGAN outperforms conventional state-of-the-art algorithms on two objective measures: speech intelligibility in bits (SIIB) and extended short-time objective intelligibility (ESTOI). In addition, formal listening tests reveal significant intelligibility gains when both noise and reverberation are present.
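The RMS-level and duration constraint from the abstract can be illustrated with a minimal Python sketch. It rests on two assumptions, neither taken from the paper. First, the modification works sample-for-sample, so a duration-preserving output has exactly as many samples as the input. Second, the RMS constraint is enforced by applying one global gain after modification. The function names `rms` and `match_rms_and_duration` are hypothetical illustrations and do not come from the authors' code.

```python
import math
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root mean square level of a waveform given as a sequence of samples."""
    if len(signal) == 0:
        raise ValueError("cannot compute the RMS of an empty signal")
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def match_rms_and_duration(original: Sequence[float],
                           modified: Sequence[float]) -> List[float]:
    """Rescale `modified` so its RMS level equals that of `original`.

    Hypothetical helper: assumes the modification is duration-preserving,
    i.e. it returns the same number of samples as it received.
    """
    # Duration constraint: a duration-preserving modification must not
    # change the number of samples, so reject mismatched lengths.
    if len(modified) != len(original):
        raise ValueError("modified signal must have the same duration (sample count)")
    current = rms(modified)
    if current == 0.0:
        raise ValueError("cannot rescale a silent signal to a nonzero RMS level")
    # RMS constraint: one global gain restores the original loudness level
    # while leaving the relative shape of the modified waveform unchanged.
    gain = rms(original) / current
    return [gain * s for s in modified]


if __name__ == "__main__":
    clean = [0.5, -0.5, 0.5, -0.5]     # toy input, RMS = 0.5
    boosted = [2.0, 0.0, -2.0, 0.0]    # toy "modified" output, RMS = sqrt(2)
    out = match_rms_and_duration(clean, boosted)
    print(len(out) == len(clean), round(rms(out), 6))
```

Because the output is renormalized to the input's RMS level, any intelligibility gain under this constraint has to come from redistributing energy across time and frequency. Simply making the speech louder does not count.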


Updated: 2020-04-08
