Speaker Attractor Network: Generalizing Speech Separation to Unseen Numbers of Sources, IEEE Signal Processing Letters



Speaker Attractor Network: Generalizing Speech Separation to Unseen Numbers of Sources
IEEE Signal Processing Letters (IF 3.9), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3029704
Fei Jiang, Zhiyao Duan

Most existing speech separation research focuses on improving separation performance when the number of sources is the same in training and testing. In real-world applications, however, the number of sources may differ from that in the training set. In this letter, we address this problem by thoroughly improving the deep attractor network in terms of both network architecture and learning objectives, so that it generalizes well to separating an unseen number of sources. Experimental results show that, compared with existing models, the proposed method significantly improves separation performance when generalizing to an unseen number of speakers, and can separate up to five speakers even when the model is trained only on two-speaker mixtures.
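The abstract builds on the deep attractor network. As a rough illustration of why an attractor-based model is not tied to a fixed source count, the sketch below computes attractors as the mean embedding of the time-frequency bins each source dominates (training style), estimates attractors with k-means for a chosen k (inference style), and derives soft masks from embedding-attractor similarity. It follows the generic deep attractor idea under our own simplifying assumptions: pure Python, toy 2-D embeddings, softmax masks, deterministic farthest-point k-means initialization, and hypothetical helper names such as `kmeans_attractors`. It is not the Speaker Attractor Network architecture or the learning objectives proposed in this letter.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_by_label(embeddings, labels, k, fallback=None):
    """Attractor for each source = mean embedding of the TF bins assigned to it.

    During training, `labels` would come from the ideal (oracle) assignment of
    each TF bin to its dominant source. If a source owns no bins, its vector is
    taken from `fallback` (or set to zeros).
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for v, c in zip(embeddings, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += v[d]
    out = []
    for c in range(k):
        if counts[c] > 0:
            out.append([s / counts[c] for s in sums[c]])
        elif fallback is not None:
            out.append(list(fallback[c]))
        else:
            out.append([0.0] * dim)
    return out


def kmeans_attractors(embeddings, k, iters=20):
    """Hypothetical inference-time attractors via k-means; k need not match training."""
    # Farthest-point initialisation (an assumption of this sketch) keeps the
    # result deterministic and avoids two seeds landing in one cluster.
    centroids = [list(embeddings[0])]
    while len(centroids) < k:
        far = max(embeddings, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assign every TF-bin embedding to its nearest attractor.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in embeddings]
        # Move each attractor to the mean of its assigned embeddings.
        centroids = mean_by_label(embeddings, labels, k, fallback=centroids)
    return centroids, labels


def softmax_masks(embeddings, attractors):
    """Mask per TF bin and source: softmax over sources of <v, a_c> (assumed choice)."""
    masks = []
    for v in embeddings:
        scores = [dot(v, a) for a in attractors]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        masks.append([e / total for e in exps])
    return masks
```

Calling `kmeans_attractors(embeddings, k)` with k = 2, 3, or 5 runs the same code on the same embeddings; nothing in the mask computation fixes the number of outputs, which is what makes attractor-based models a natural starting point for separating an unseen number of speakers.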


Updated: 2020-01-01
