

Generalized RNN beamformer for target speech separation
arXiv - CS - Sound. Pub Date: 2021-01-04, DOI: arxiv-2101.01280
Yong Xu, Zhuohuang Zhang, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Dong Yu

Recently, we proposed an all-deep-learning minimum variance distortionless response (ADL-MVDR) method in which the unstable matrix inverse and principal component analysis (PCA) operations of the MVDR were replaced by recurrent neural networks (RNNs). However, it is not clear whether the success of the ADL-MVDR is due to the calculated covariance matrices or to following the MVDR formula. In this work, we demonstrate the importance of the calculated covariance matrices and propose three types of generalized RNN beamformers (GRNN-BFs) whose beamforming solution goes beyond the MVDR and is optimal. The GRNN-BFs can predict frame-wise beamforming weights by leveraging the temporal modeling capability of RNNs. The proposed GRNN-BF method outperforms the state-of-the-art ADL-MVDR and the traditional mask-based MVDR methods in terms of speech quality (PESQ), speech-to-noise ratio (SNR), and word error rate (WER).
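
For orientation, the conventional MVDR baseline that the abstract contrasts against can be sketched in a few lines of NumPy. This is only an illustrative sketch of the classical formula, w = Φ_nn⁻¹ v / (vᴴ Φ_nn⁻¹ v), with the steering vector v taken as the principal eigenvector of the speech covariance (the PCA step) and an explicit linear solve standing in for the matrix inverse. It is not the authors' GRNN-BF or ADL-MVDR, which replace these two operations with RNNs. The function names, the diagonal-loading term, and the array shapes are assumptions made for this example.

```python
import numpy as np

def steering_vector(phi_ss):
    """PCA step: principal eigenvector of the (M, M) Hermitian speech covariance."""
    _, eigvecs = np.linalg.eigh(phi_ss)  # eigenvalues are returned in ascending order
    return eigvecs[:, -1]

def mvdr_weights(phi_nn, v, eps=1e-6):
    """Classical MVDR weights for one frequency bin.

    phi_nn: (M, M) Hermitian noise covariance, v: (M,) steering vector.
    The diagonal loading controlled by eps is an assumption of this sketch,
    added to keep the linear solve well conditioned.
    """
    m = phi_nn.shape[0]
    load = eps * np.real(np.trace(phi_nn)) / m
    num = np.linalg.solve(phi_nn + load * np.eye(m), v)  # Phi_nn^{-1} v
    return num / (v.conj() @ num)                        # divide by v^H Phi_nn^{-1} v

def apply_beamformer(w, y):
    """Compute s_hat[t] = w[:, t]^H y[:, t] for one frequency bin.

    y: (M, T) multi-channel mixture STFT.
    w: (M,) time-invariant weights or (M, T) frame-wise weights.
    """
    if w.ndim == 1:
        w = w[:, None]
    return np.sum(w.conj() * y, axis=0)
```

A time-invariant beamformer passes a single weight vector of shape (M,) to `apply_beamformer`, whereas frame-wise weights of the kind the abstract describes would have shape (M, T), one vector per frame.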


Updated: 2021-01-06
