Multi-channel Multi-frame ADL-MVDR for Target Speech Separation
arXiv - CS - Sound. Pub Date: 2020-12-24. DOI: arxiv-2012.13442
Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu
Many purely neural network based speech separation approaches have been
proposed that greatly improve objective assessment scores, but they often
introduce nonlinear distortions that are harmful to automatic speech
recognition (ASR). Minimum variance distortionless response (MVDR) filters
strive to remove nonlinear distortions; however, these approaches are either
not optimal for removing residual (linear) noise or unstable when
used jointly with neural networks. In this study, we propose a multi-channel
multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech
separation, which extends our preliminary multi-channel ADL-MVDR approach. The
MCMF ADL-MVDR handles different numbers of microphone channels in one
framework and addresses both linear and nonlinear distortions. Spatio-temporal
cross-correlations are also fully utilized in the proposed approach. The
proposed system is evaluated using a Mandarin audio-visual corpus and is
compared with several state-of-the-art approaches. Experimental results
demonstrate the superiority of our proposed framework under different scenarios
and across several objective evaluation metrics, including ASR performance.
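
For readers unfamiliar with MVDR, the following is a minimal sketch of the classical closed-form MVDR solution for a single frequency bin, w = Phi_NN^{-1} v / (v^H Phi_NN^{-1} v). It is not the ADL-MVDR method proposed in the paper. The function names (`solve`, `mvdr_weights`, `apply_beamformer`) and the two-microphone values are hypothetical, chosen only for illustration. The sketch uses the standard library only, so the matrix inversion is written out as a small Gaussian elimination.

```python
# Classical MVDR weights for one frequency bin.
# Illustrative only: this is the textbook formula, not the paper's ADL-MVDR.

def solve(a, b):
    """Solve the complex linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unmodified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mvdr_weights(noise_cov, steering):
    """Return w = Phi_NN^{-1} v / (v^H Phi_NN^{-1} v)."""
    num = solve(noise_cov, steering)  # Phi_NN^{-1} v
    denom = sum(v.conjugate() * n for v, n in zip(steering, num))  # v^H Phi_NN^{-1} v
    return [n / denom for n in num]


def apply_beamformer(weights, mixture):
    """Beamformer output w^H y for one time-frequency bin."""
    return sum(w.conjugate() * y for w, y in zip(weights, mixture))


# Hypothetical two-microphone values (Hermitian, positive-definite noise covariance).
noise_cov = [[2.0 + 0j, 0.5 + 0.5j],
             [0.5 - 0.5j, 1.0 + 0j]]
steering = [1.0 + 0j, 0.0 + 1.0j]
w = mvdr_weights(noise_cov, steering)
```

Applying `w` to the steering vector itself returns unity gain, which is the "distortionless" constraint in the filter's name: a signal arriving from the target direction passes through unchanged while the noise variance is minimized.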
Updated: 2020-12-29