Multi-channel Multi-frame ADL-MVDR for Target Speech Separation
arXiv - CS - Sound, Pub Date: 2020-12-24, DOI: arxiv-2012.13442
Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu

Many purely neural-network-based speech separation approaches have been proposed that greatly improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to automatic speech recognition (ASR). Minimum variance distortionless response (MVDR) filters strive to remove nonlinear distortions; however, these approaches are either not optimal for removing residual (linear) noise or unstable when used jointly with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The MCMF ADL-MVDR handles different numbers of microphone channels in one framework and addresses both linear and nonlinear distortions. Spatio-temporal cross correlations are also fully utilized in the proposed approach. The proposed system is evaluated on a Mandarin audio-visual corpus and compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of the proposed framework under different scenarios and across several objective evaluation metrics, including ASR performance.
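
For readers unfamiliar with MVDR, the following is a minimal NumPy sketch of the conventional (non-neural) MVDR filter that the abstract contrasts with, plus a simple multi-frame extension that stacks the current and previous STFT frames so the filter can use spatio-temporal correlations. It is not the authors' ADL-MVDR, whose internals the abstract does not describe. Assumptions: a single frequency bin, oracle speech and noise STFT images (in practice these statistics would be estimated, e.g. from masks), microphone 0 as the reference, and diagonal loading for numerical stability. All function names are hypothetical.

```python
import numpy as np


def stack_frames(X: np.ndarray, num_frames: int) -> np.ndarray:
    """Stack the current and (num_frames - 1) previous frames of one frequency bin.

    X has shape (M, T): M microphones, T STFT frames. The result has shape
    (M * num_frames, T); rows [0:M] hold the current frame, rows [M:2M] the
    previous frame, and so on. Frames before t = 0 are zero-padded.
    """
    M, T = X.shape
    pad = np.zeros((M, num_frames - 1), dtype=X.dtype)
    padded = np.concatenate([pad, X], axis=1)
    blocks = [padded[:, num_frames - 1 - l : num_frames - 1 - l + T] for l in range(num_frames)]
    return np.concatenate(blocks, axis=0)


def covariance(Y: np.ndarray) -> np.ndarray:
    """Sample covariance E[y y^H] averaged over the T frames of Y (shape (D, T))."""
    return Y @ Y.conj().T / Y.shape[1]


def mvdr_weights(phi_s: np.ndarray, phi_n: np.ndarray, ref: int = 0,
                 loading: float = 1e-6) -> np.ndarray:
    """Conventional (multi-frame) MVDR: w = Phi_N^{-1} g / (g^H Phi_N^{-1} g).

    g = Phi_S e_ref / (e_ref^T Phi_S e_ref) is the speech correlation vector
    normalized to the reference channel's current frame, so w^H g = 1
    (distortionless with respect to the target at the reference mic).
    """
    D = phi_n.shape[0]
    # Diagonal loading keeps the linear solve well conditioned.
    phi_n = phi_n + loading * (np.trace(phi_n).real / D) * np.eye(D)
    g = phi_s[:, ref] / phi_s[ref, ref]
    num = np.linalg.solve(phi_n, g)  # Phi_N^{-1} g without forming the inverse
    return num / (g.conj() @ num)


def apply_beamformer(w: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Filter output w^H y(t) for every frame t."""
    return w.conj() @ Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, T = 4, 2000
    # Synthetic one-bin scene: a point source seen through steering vector d,
    # plus spatially white noise.
    d = np.exp(1j * rng.uniform(0, 2 * np.pi, M))
    s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
    speech = np.outer(d, s)
    noise = 0.5 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))

    for L in (1, 3):
        S, N = stack_frames(speech, L), stack_frames(noise, L)
        w = mvdr_weights(covariance(S), covariance(N))
        out_noise = apply_beamformer(w, N)
        print(f"L={L}: noise power ref={np.mean(np.abs(N[0]) ** 2):.3f}, "
              f"after MVDR={np.mean(np.abs(out_noise) ** 2):.3f}")
```

With num_frames=1 this reduces to the usual multi-channel MVDR in relative-transfer-function form; larger values enlarge the covariance matrix in exchange for temporal context. The matrix solve is sensitive to poorly conditioned covariance estimates, which is why this sketch adds diagonal loading.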

Updated: 2020-12-29