End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation
arXiv - CS - Sound. Pub Date: 2019-10-30, DOI: arxiv-1910.14104
Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones. The former requires the system to be invariant to different indexing of the microphones with the same locations, while the latter requires the system to be able to process inputs with varying dimensions. Conventional optimization-based beamforming techniques satisfy these requirements by definition, while for deep learning-based end-to-end systems those constraints are not fully addressed. In this paper, we propose transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation. Based on the filter-and-sum network (FaSNet), a recently proposed end-to-end time-domain beamforming system, we show how TAC significantly improves the separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays. Moreover, we show that TAC also significantly improves the separation performance with fixed geometry array configuration, further proving the effectiveness of the proposed paradigm in the general problem of multi-microphone speech separation.
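
The abstract names the three steps of TAC but does not spell them out, so the following is only a minimal NumPy sketch of one reading of "transform-average-concatenate", not the authors' implementation. The function name `tac_layer`, the weight matrices, the ReLU nonlinearity, and all layer sizes are assumptions made for illustration. It shows why such a design can be indifferent to microphone indexing and count: every channel passes through the same shared weights, and the only cross-channel operation is a mean.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tac_layer(x, w_transform, w_average, w_concat):
    """Hypothetical TAC-style layer. x has shape (channels, features)."""
    # Transform: the same weights are applied to every channel.
    h = relu(x @ w_transform)
    # Average: the mean over channels does not depend on channel order or count.
    m = relu(h.mean(axis=0) @ w_average)
    # Concatenate: attach the shared summary to each channel's own features.
    m_tiled = np.broadcast_to(m, (h.shape[0], m.shape[0]))
    z = np.concatenate([h, m_tiled], axis=1)
    return relu(z @ w_concat)

rng = np.random.default_rng(0)
feat, hidden = 8, 16  # illustrative sizes, not taken from the paper
w_t = rng.standard_normal((feat, hidden))
w_a = rng.standard_normal((hidden, hidden))
w_c = rng.standard_normal((2 * hidden, feat))

# Re-indexing the microphones only re-indexes the per-channel outputs.
x4 = rng.standard_normal((4, feat))
perm = np.array([2, 0, 3, 1])
y = tac_layer(x4, w_t, w_a, w_c)
y_perm = tac_layer(x4[perm], w_t, w_a, w_c)
assert np.allclose(y[perm], y_perm)

# The same weights accept a different number of microphones.
x6 = rng.standard_normal((6, feat))
y6 = tac_layer(x6, w_t, w_a, w_c)
assert y6.shape == (6, feat)
```

Because no weight shape depends on the number of channels, the same parameters can be applied to arrays of any size, which is the property the abstract asks of an ad-hoc array system.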

Updated: 2020-03-30