Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss
arXiv - CS - Sound, Pub Date: 2020-03-27, DOI: arxiv-2003.12326
Yi Luo, Nima Mesgarani

Many recent source separation systems are designed to separate a fixed number of sources out of a mixture. In cases where the source activation patterns are unknown, such systems have to either adjust the number of outputs or identify the invalid outputs among the valid ones. Iterative separation methods have gained much attention in the community because they can flexibly decide the number of outputs; however, (1) they typically rely on long-term information to determine the stopping time for the iterations, which makes them hard to operate in a causal setting, and (2) they lack a "fault tolerance" mechanism when the estimated number of sources differs from the actual number. In this paper, we propose a simple training method, auxiliary autoencoding permutation invariant training (A2PIT), to alleviate these two issues. A2PIT assumes a fixed number of outputs, uses an auxiliary autoencoding loss to force the invalid outputs to be copies of the input mixture, and detects invalid outputs in a fully unsupervised way during the inference phase. Experimental results show that A2PIT improves separation performance across various numbers of speakers and effectively detects the number of speakers in a mixture.
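
The idea in the abstract can be illustrated with a minimal NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the mean-squared-error objective, the cosine-similarity test, and the threshold value of 0.95 are all hypothetical choices made for this example. The sketch pads the training targets with copies of the mixture up to a fixed number of outputs, computes a permutation invariant loss over the padded targets, and at inference flags any output that closely resembles the mixture as invalid.

```python
import itertools

import numpy as np


def pad_targets_with_mixture(sources, mixture, num_outputs):
    """Pad true sources (K, T) with copies of the mixture (T,) up to num_outputs rows.

    Hypothetical helper: the extra rows act as autoencoding targets for the
    outputs that do not correspond to any real source.
    """
    num_missing = num_outputs - sources.shape[0]
    if num_missing < 0:
        raise ValueError("more sources than model outputs")
    padding = np.tile(mixture, (num_missing, 1))  # shape (num_missing, T)
    return np.concatenate([sources, padding], axis=0)


def pit_mse(estimates, targets):
    """Permutation invariant MSE (assumed objective, chosen for simplicity).

    Tries every assignment of outputs to targets and keeps the lowest loss.
    """
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(targets.shape[0])):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def detect_valid_outputs(estimates, mixture, threshold=0.95):
    """Flag outputs as valid when they are NOT near-copies of the mixture.

    Uses cosine similarity with an assumed threshold; no labels are needed,
    so the check is unsupervised.
    """
    norms = np.linalg.norm(estimates, axis=1) * np.linalg.norm(mixture)
    similarity = (estimates @ mixture) / (norms + 1e-8)
    return similarity < threshold


# Usage: two synthetic sources, a model with three outputs.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 1000))
mixture = sources.sum(axis=0)
targets = pad_targets_with_mixture(sources, mixture, num_outputs=3)

# Pretend the model produced perfect outputs in a shuffled order.
estimates = targets[[2, 0, 1]]
loss, perm = pit_mse(estimates, targets)
valid = detect_valid_outputs(estimates, mixture)
estimated_num_speakers = int(valid.sum())
```

Because the number of outputs is fixed, the enumeration over permutations stays small for a handful of speakers; the count of outputs flagged as valid serves as the estimated number of speakers.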


