SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation, IEEE Signal Processing Letters


SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-12-11, DOI: 10.1109/lsp.2020.3043977
Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

Most existing deep-learning-based binaural speaker separation systems focus on producing a monaural estimate for each target speaker, and thus do not preserve the interaural cues, which are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated binaural signals. Specifically, we extend a newly developed gated recurrent neural network for monaural separation by additionally incorporating self-attention mechanisms and dense connectivity. We develop an end-to-end multiple-input multiple-output system that maps directly from the binaural waveform of the mixture to the binaural waveforms of the individual speech signals. The experimental results show that our proposed approach achieves significantly better separation performance than a recent binaural separation approach. In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
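
The interaural cues mentioned in the abstract are usually described in terms of the interaural level difference (ILD) and the interaural time difference (ITD). The following is a minimal sketch, assuming NumPy and two equal-length channels, of how these two cues could be estimated from a binaural signal, so that the cues of a separated output can be compared with those of the reference. It is an illustration only and not the evaluation procedure used in the paper; the function names, the 1 ms lag window, and the sign conventions are assumptions made for this example.

```python
import numpy as np


def interaural_level_difference(left, right, eps=1e-12):
    """Broadband ILD in dB; positive when the left channel carries more energy."""
    energy_left = np.sum(left ** 2)
    energy_right = np.sum(right ** 2)
    return 10.0 * np.log10((energy_left + eps) / (energy_right + eps))


def interaural_time_difference(left, right, sample_rate, max_itd_s=1e-3):
    """ITD in seconds from the peak of the cross-correlation.

    The search is limited to lags within +/- max_itd_s (an assumed bound).
    The result is positive when the right channel lags the left one.
    Both channels must have the same length.
    """
    n = len(left)
    max_lag = int(round(max_itd_s * sample_rate))
    lags = np.arange(-max_lag, max_lag + 1)
    # xcorr[k] = sum over n of left[n] * right[n + k]
    xcorr = [
        np.dot(left[max(0, -k): n - max(0, k)], right[max(0, k): n - max(0, -k)])
        for k in lags
    ]
    return lags[int(np.argmax(xcorr))] / sample_rate


def cue_errors(ref_left, ref_right, est_left, est_right, sample_rate):
    """Absolute ILD (dB) and ITD (s) differences between reference and estimate."""
    d_ild = abs(
        interaural_level_difference(ref_left, ref_right)
        - interaural_level_difference(est_left, est_right)
    )
    d_itd = abs(
        interaural_time_difference(ref_left, ref_right, sample_rate)
        - interaural_time_difference(est_left, est_right, sample_rate)
    )
    return d_ild, d_itd
```

For example, calling `cue_errors` with the clean binaural signal of one talker as the reference and the corresponding separated output as the estimate gives a rough indication of how well the spatial cues survived separation. A system that outputs only a monaural estimate per talker has no second channel to compare, which is the limitation the abstract points out.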


Updated: 2020-12-11
