Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems,arXiv - CS - Sound

arXiv - CS - Sound, Pub Date: 2020-03-27, DOI: arxiv-2003.12425
Akhil Mathur, Anton Isopoussu, Fahim Kawsar, Nadia Berthouze, Nicholas D. Lane

Mobile and embedded devices are increasingly using microphones and audio-based computational models to infer user context. A major challenge in building systems that combine audio models with commodity microphones is to guarantee their accuracy and robustness in the real world. Besides the many environmental dynamics, a primary factor that affects the robustness of audio models is microphone variability. In this work, we propose Mic2Mic, a machine-learned system component that resides in the inference pipeline of audio models and reduces, in real time, the variability in audio data caused by microphone-specific factors. Two key considerations in the design of Mic2Mic were: a) to decouple the problem of microphone variability from the audio task, and b) to put a minimal burden on end-users to provide training data. With these in mind, we apply the principles of cycle-consistent generative adversarial networks (CycleGANs) to learn Mic2Mic using unlabeled and unpaired data collected from different microphones. Our experiments show that Mic2Mic can recover between 66% and 89% of the accuracy lost due to microphone variability for two common audio tasks.
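
The cycle-consistency idea that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`l1_distance`, `cycle_consistency_loss`, `a_to_b`, `b_to_a_good`, `b_to_a_bad`) are hypothetical, the two "microphones" are toy gain changes rather than learned networks, and the adversarial part of a CycleGAN objective is omitted. The sketch only shows the general property that mapping a signal from microphone A to microphone B and back should reproduce the original, which is what allows training on unpaired recordings.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Mapping = Callable[[Sequence[float]], Signal]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    g_ab: Mapping,
    g_ba: Mapping,
    batch_a: Sequence[Sequence[float]],
    batch_b: Sequence[Sequence[float]],
) -> float:
    """L1 cycle loss: A -> B -> A and B -> A -> B should reproduce the input.

    batch_a and batch_b are unpaired; no sample in one corresponds to a
    sample in the other.
    """
    forward = sum(l1_distance(g_ba(g_ab(x)), x) for x in batch_a) / len(batch_a)
    backward = sum(l1_distance(g_ab(g_ba(y)), y) for y in batch_b) / len(batch_b)
    return forward + backward


# Toy stand-ins (assumption): microphone B records at twice the gain of A.
def a_to_b(x: Sequence[float]) -> Signal:
    return [2.0 * v for v in x]


def b_to_a_good(y: Sequence[float]) -> Signal:
    return [0.5 * v for v in y]  # exact inverse of a_to_b


def b_to_a_bad(y: Sequence[float]) -> Signal:
    return [0.25 * v for v in y]  # wrong gain, so the cycle does not close


# Unpaired toy batches from the two microphones.
batch_a = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.4]]
batch_b = [[0.2, 0.6, -0.8]]

good = cycle_consistency_loss(a_to_b, b_to_a_good, batch_a, batch_b)
bad = cycle_consistency_loss(a_to_b, b_to_a_bad, batch_a, batch_b)
print(f"consistent pair: {good:.4f}, inconsistent pair: {bad:.4f}")
```

In a real CycleGAN the two mappings would be trained generators, and this term would be added to adversarial losses that push translated audio to resemble recordings from the target microphone.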

Updated: 2020-03-30
