Neural Network-based Virtual Microphone Estimator

arXiv - CS - Sound. Pub Date: 2021-01-12, DOI: arxiv-2101.04315
Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki

Developing microphone array technologies for a small number of microphones is important because many devices can only accommodate a few. One way to address this constraint is to virtually augment the number of microphone signals, e.g., based on several physical model assumptions. However, such assumptions are not necessarily met in realistic conditions. In this paper, as an alternative approach, we propose a neural network-based virtual microphone estimator (NN-VME). The NN-VME estimates virtual microphone signals directly in the time domain, exploiting the precise estimation capability of recent time-domain neural networks. We adopt a fully supervised learning framework that uses actual observations at the locations of the virtual microphones at training time. Consequently, the NN-VME can be trained using only multi-channel observations, and thus directly on real recordings, avoiding the need for unrealistic physical model-based assumptions. Experiments on the CHiME-4 corpus show that the proposed NN-VME achieves high virtual microphone estimation performance even for real recordings, and that a beamformer augmented with the NN-VME improves both speech enhancement and recognition performance.
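The supervised setup described in the abstract can be illustrated with a minimal sketch: one channel of a multi-channel recording is held out as the training target (the "virtual" microphone), and the remaining channels are the input. Everything named here is an assumption made for illustration and is not taken from the paper: the helper names `make_vme_pair`, `snr_db`, and `placeholder_estimator`, the synthetic three-channel signal, and the use of a time-domain SNR as the training criterion. The placeholder estimator simply averages the input channels; it stands in for the time-domain neural network, which is not reproduced here.

```python
import numpy as np


def make_vme_pair(recording, input_channels, target_channel):
    """Split a (channels, samples) recording into network input and target."""
    if target_channel in input_channels:
        raise ValueError("target channel must be held out from the inputs")
    inputs = recording[list(input_channels)]  # observed (real) microphones
    target = recording[target_channel]        # actual observation at the virtual mic position
    return inputs, target


def snr_db(estimate, reference, eps=1e-8):
    """Time-domain SNR in dB (assumed criterion, not confirmed by the abstract)."""
    noise = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))


def placeholder_estimator(inputs):
    """Hypothetical stand-in for the neural network: mean of the input channels."""
    return inputs.mean(axis=0)


# Synthetic 3-channel "recording": a shared source plus independent sensor noise.
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
recording = np.stack([source + 0.1 * rng.standard_normal(16000) for _ in range(3)])

# Channels 0 and 2 are the inputs; channel 1 plays the role of the virtual microphone.
inputs, target = make_vme_pair(recording, input_channels=(0, 2), target_channel=1)
estimate = placeholder_estimator(inputs)
score = snr_db(estimate, target)
```

Because the target is an ordinary recorded channel, a training pair of this kind needs no clean reference signal, which is why such a scheme can be applied directly to real multi-channel recordings.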

Updated: 2021-01-13
