DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2022-06-18, DOI: 10.1186/s13636-022-00246-7
Alexander Bohlender, Lucas Van Severen, Jonathan Sterckx, Nilesh Madhu

By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into its underlying signal components. The parameters of a model such as a complex angular central Gaussian mixture model (cACGMM) can be estimated from the given signal mixture itself. In that case, no mismatch between training and test conditions arises, unlike approaches that must be trained on labeled datasets. Although the separation can be performed in a completely unsupervised way, it may be beneficial to exploit a priori knowledge: the parameter estimation is sensitive to the initialization, and the frequency permutation problem must be addressed. In this paper, we therefore consider three techniques that use direction of arrival (DOA) estimates to overcome these limitations. First, we propose an initialization with simple DOA-based masks. Second, we derive speaker-specific time annotations from the same masks in order to constrain the cACGMM. Third, we employ an approach in which the mixture components are specific to each DOA instead of each speaker. We conduct experiments with sudden DOA changes as well as with a gradually moving speaker. The results demonstrate that the DOA-based initialization in particular is effective in overcoming both of the described limitations. With this initialization, even methods based on oracle information, which is normally unavailable, are not observed to be more beneficial for the permutation resolution or the initialization. Lastly, we show that the proposed DOA-guided source separation works quite robustly under adverse conditions and realistic DOA estimation errors.
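To make the three ingredients concrete, here is a minimal single-frequency-bin sketch in NumPy. It is not the authors' implementation. It assumes a uniform linear array with far-field steering vectors. It also uses a hypothetical DOA-based initial mask, a softmax over the squared match between each unit-norm observation and each steering vector, controlled by a hypothetical `sharpness` parameter. Time annotations are modeled as a hypothetical boolean `activity` matrix that hard-gates the affiliations. The EM updates are the standard cACGMM ones. Because every frequency bin would be initialized from the same DOA list, component `k` stays tied to DOA `k` across bins, which is how a DOA-based initialization can sidestep the permutation problem.

```python
import numpy as np


def steering_vector(doa_deg, freq_hz, mic_pos_m, c=343.0):
    """Far-field steering vector for a linear array (assumed geometry)."""
    delays = mic_pos_m * np.cos(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)


def doa_init_masks(Z, doas_deg, freq_hz, mic_pos_m, sharpness=10.0):
    """Hypothetical DOA-based initial masks: softmax over |a_k^H z_t|^2."""
    A = np.stack([steering_vector(d, freq_hz, mic_pos_m) for d in doas_deg])
    A /= np.linalg.norm(A, axis=1, keepdims=True)       # (K, D), unit norm
    sim = np.abs(A.conj() @ Z.T) ** 2                   # (K, T), in [0, 1]
    w = np.exp(sharpness * sim)
    return w / w.sum(axis=0, keepdims=True)


def acg_log_pdf(Z, B):
    """Complex angular central Gaussian log density, up to a shared constant."""
    D = Z.shape[1]
    quad = np.einsum("td,de,te->t", Z.conj(), np.linalg.inv(B), Z).real
    _, logdet = np.linalg.slogdet(B)
    return -logdet - D * np.log(np.maximum(quad, 1e-10))


def m_step(Z, gamma, B_old):
    """Update mixture weights and ACG shape matrices from affiliations (K, T)."""
    K, D = gamma.shape[0], Z.shape[1]
    pi = gamma.mean(axis=1)
    B_new = np.empty((K, D, D), dtype=complex)
    for k in range(K):
        quad = np.einsum("td,de,te->t", Z.conj(), np.linalg.inv(B_old[k]), Z).real
        w = gamma[k] / np.maximum(quad, 1e-10)
        # B_k = D * sum_t w_t z_t z_t^H / sum_t gamma_kt
        B_new[k] = D * (Z.T * w) @ Z.conj() / max(gamma[k].sum(), 1e-10)
    return pi, B_new


def e_step(Z, pi, B, activity=None):
    """Posterior affiliations; inactive (annotated) components get zero mass."""
    log_p = np.stack([np.log(pi[k] + 1e-10) + acg_log_pdf(Z, B[k])
                      for k in range(len(pi))])        # (K, T)
    if activity is not None:
        log_p = np.where(activity, log_p, -1e30)
    log_p -= log_p.max(axis=0, keepdims=True)
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=0, keepdims=True)


def cacgmm_doa(Z, doas_deg, freq_hz, mic_pos_m, activity=None, iterations=20):
    """EM for one frequency bin, initialized from DOA-based masks.

    Z: (T, D) unit-norm observations; activity: optional (K, T) bool matrix.
    """
    gamma = doa_init_masks(Z, doas_deg, freq_hz, mic_pos_m)
    if activity is not None:
        gamma = gamma * activity
        gamma /= np.maximum(gamma.sum(axis=0, keepdims=True), 1e-10)
    K, D = gamma.shape[0], Z.shape[1]
    B = np.tile(np.eye(D, dtype=complex), (K, 1, 1))  # identity for first M-step
    for _ in range(iterations):
        pi, B = m_step(Z, gamma, B)
        gamma = e_step(Z, pi, B, activity)
    return gamma
```

In use, `Z` would hold the normalized STFT vectors of one frequency bin, and the returned affiliations serve as time-frequency masks for the sources at the given DOAs. A real system would also typically add a noise component. The hard gating in `e_step` is one simple way to impose time annotations. A softer prior on the mixture weights would be an equally plausible design.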

Updated: 2022-06-19