SEANet: A Multi-modal Speech Enhancement Network
arXiv - CS - Sound. Pub Date: 2020-09-04, DOI: arxiv-2009.02095
Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek

We explore the possibility of leveraging accelerometer data to perform speech enhancement in very noisy conditions. Although the user's speech can be only partially reconstructed from the accelerometer, the latter provides a strong conditioning signal that is not influenced by noise sources in the environment. Based on this observation, we feed a multi-modal input to SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which adopts a combination of feature losses and adversarial losses to reconstruct an enhanced version of the user's speech. We trained our model on data collected by sensors mounted on an earbud and synthetically corrupted by adding different kinds of noise sources to the audio signal. Our experimental results demonstrate that it is possible to achieve very high quality results, even when interfering speech is present at the same loudness level. A sample of the output produced by our model is available at https://google-research.github.io/seanet/multimodal/speech.
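To make the described setup concrete, below is a minimal PyTorch sketch, not the authors' implementation: a wave-to-wave fully convolutional model that stacks the noisy audio waveform and the accelerometer signal as input channels, trained with a combination of a feature-style loss and an adversarial loss. All layer shapes, the single accelerometer channel, the discriminator design, and the loss weighting are illustrative assumptions; the paper's feature loss is computed on internal discriminator features, which this sketch approximates with a plain waveform L1 distance.

    import torch
    import torch.nn as nn

    class WaveToWaveEnhancer(nn.Module):
        """Fully convolutional waveform-to-waveform model.

        Input channels: 1 audio + 1 accelerometer axis (an assumption;
        the earbud sensor may provide more axes)."""
        def __init__(self, in_ch=2, hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(in_ch, hidden, kernel_size=15, padding=7),
                nn.ELU(),
                nn.Conv1d(hidden, hidden, kernel_size=15, padding=7),
                nn.ELU(),
                nn.Conv1d(hidden, 1, kernel_size=15, padding=7),  # enhanced waveform
            )

        def forward(self, noisy_audio, accel):
            # Multi-modal conditioning: stack modalities as channels -> (batch, 2, time)
            x = torch.cat([noisy_audio, accel], dim=1)
            return self.net(x)

    class WaveDiscriminator(nn.Module):
        """Toy waveform discriminator producing one realism score per example."""
        def __init__(self, hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, hidden, kernel_size=15, stride=4, padding=7),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, 1, kernel_size=15, stride=4, padding=7),
            )

        def forward(self, x):
            return self.net(x).mean(dim=(1, 2))

    def generator_loss(model, discriminator, noisy_audio, accel, clean_audio):
        enhanced = model(noisy_audio, accel)
        # Feature loss, approximated here by waveform L1 (an assumption).
        feature_loss = torch.mean(torch.abs(enhanced - clean_audio))
        # Adversarial loss: push discriminator scores on enhanced audio up.
        adv_loss = -discriminator(enhanced).mean()
        return feature_loss + 0.01 * adv_loss  # weighting is an assumption

    # Usage with random tensors standing in for real recordings:
    enhancer, disc = WaveToWaveEnhancer(), WaveDiscriminator()
    noisy = torch.randn(4, 1, 16000)  # 1 s of audio at an assumed 16 kHz rate
    accel = torch.randn(4, 1, 16000)  # accelerometer resampled to the audio rate
    clean = torch.randn(4, 1, 16000)
    loss = generator_loss(enhancer, disc, noisy, accel, clean)
    loss.backward()

In a full adversarial training loop, the discriminator would be updated in alternation with the enhancer; only the generator side is sketched here.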

Updated: 2020-10-02