SEANet: A Multi-modal Speech Enhancement Network
arXiv - CS - Sound Pub Date : 2020-09-04 , DOI: arxiv-2009.02095 Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek
We explore the possibility of leveraging accelerometer data to perform speech
enhancement in very noisy conditions. Although the user's speech can only be
partially reconstructed from the accelerometer, the latter provides a strong
conditioning signal that is not influenced by noise sources in the
environment. Based on this observation, we feed a multi-modal input to SEANet
(Sound EnhAncement Network), a wave-to-wave fully convolutional model, which
adopts a combination of feature losses and adversarial losses to reconstruct an
enhanced version of the user's speech. We trained our model with data collected
by sensors mounted on an earbud and synthetically corrupted by adding different
kinds of noise sources to the audio signal. Our experimental results
demonstrate that very high quality results can be achieved, even in the case of
interfering speech at the same loudness level. A sample of the output produced
by our model is available at
https://google-research.github.io/seanet/multimodal/speech.
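The paper itself does not publish reference code, but the core multi-modal idea described above (stacking the noisy audio waveform and the accelerometer axes as input channels to a wave-to-wave convolutional model) can be illustrated with a minimal sketch. The shapes below (one audio channel, three hypothetical accelerometer axes, an 8-filter first layer) are illustrative assumptions, not the actual SEANet architecture:

```python
import numpy as np

def conv1d(x, w, b):
    """Naive 'valid' 1-D convolution.
    x: (in_channels, time), w: (out_channels, in_channels, kernel),
    b: (out_channels,) -> returns (out_channels, time - kernel + 1)."""
    out_ch, in_ch, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((out_ch, t_out))
    for o in range(out_ch):
        for i in range(in_ch):
            for t in range(t_out):
                y[o, t] += np.dot(w[o, i], x[i, t:t + k])
        y[o] += b[o]
    return y

rng = np.random.default_rng(0)
# Multi-modal input: 1 noisy-audio channel + 3 accelerometer axes,
# concatenated along the channel dimension (illustrative layout).
audio = rng.standard_normal((1, 160))
accel = rng.standard_normal((3, 160))
x = np.concatenate([audio, accel], axis=0)   # (4, 160)

# First convolutional layer of a hypothetical wave-to-wave encoder.
w = rng.standard_normal((8, 4, 5)) * 0.1
b = np.zeros(8)
h = conv1d(x, w, b)
print(h.shape)  # (8, 156)
```

In the full model, a stack of such layers (plus upsampling back to waveform resolution) would map the 4-channel input to a single enhanced waveform, trained with a weighted sum of feature losses and adversarial losses; the accelerometer channels act as the noise-robust conditioning signal the abstract describes.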
Updated: 2020-10-02