Dynamically localizing multiple speakers based on the time-frequency domain
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2021-04-08, DOI: 10.1186/s13636-021-00203-w
Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot

In this study, we present an online multi-speaker localization algorithm, based on a deep neural network, that operates on the signals of a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is assumed to be dominated by a single speaker and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA of each TF bin. This high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and moving. An extensive experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms. Finally, as a byproduct, we show that the proposed method can also separate moving speakers by applying the obtained TF masks.
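To make the per-bin idea concrete, here is a minimal post-processing sketch. It is not the authors' implementation and does not model the fully convolutional network itself. It assumes a hypothetical `posteriors` array of shape (T, F, D) holding, for every frame and frequency bin, a probability over D candidate DOA classes, and it assumes the number of speakers K is known. Under W-disjoint orthogonality each bin votes for a single DOA. The votes are averaged over frequency into a per-frame histogram, the K most-voted classes are kept, and the same hard decisions yield binary TF masks. The function names, the DOA grid, the top-K peak picking and the binary masks are all illustrative assumptions.

```python
import numpy as np


def localize_from_tf_posteriors(posteriors: np.ndarray, num_speakers: int):
    """Sketch: turn per-TF-bin DOA posteriors into per-frame DOA estimates.

    posteriors   -- (T, F, D) probabilities over D candidate DOA classes
                    (hypothetical network output; the DOA grid is assumed).
    num_speakers -- K, assumed known in advance.

    Returns (doas, hist, bin_doa):
      doas    -- (T, K) selected DOA class indices per frame, sorted
      hist    -- (T, D) fraction of bins voting for each DOA class
      bin_doa -- (T, F) hard DOA decision for every TF bin
    """
    T, F, D = posteriors.shape
    # W-disjoint orthogonality: attribute each TF bin to exactly one DOA.
    bin_doa = posteriors.argmax(axis=-1)
    # One-hot the decisions and average over frequency -> per-frame histogram.
    hist = np.eye(D)[bin_doa].mean(axis=1)
    # Keep the K most-voted DOA classes in every frame (stable tie-breaking).
    top = np.argsort(-hist, axis=1, kind="stable")[:, :num_speakers]
    doas = np.sort(top, axis=1)
    return doas, hist, bin_doa


def tf_masks(bin_doa: np.ndarray, doas: np.ndarray) -> np.ndarray:
    """Binary mask per speaker: bin (t, f) belongs to speaker k when its hard
    DOA decision equals that speaker's DOA class in frame t.

    Returns an array of shape (K, T, F).
    """
    # (1, T, F) compared with (K, T, 1) broadcasts to (K, T, F).
    return (bin_doa[None, :, :] == doas.T[:, :, None]).astype(float)


if __name__ == "__main__":
    # Toy example: 2 frames, 4 frequency bins, 5 DOA classes.
    labels = np.array([[1, 1, 3, 3], [1, 4, 4, 4]])
    posteriors = np.eye(5)[labels]  # one-hot "posteriors" for illustration only
    doas, hist, bin_doa = localize_from_tf_posteriors(posteriors, num_speakers=2)
    print(doas.tolist())  # [[1, 3], [1, 4]]
    masks = tf_masks(bin_doa, doas)
    print(masks.shape)  # (2, 2, 4)
```

Multiplying `masks` by a reference-microphone STFT of shape (T, F) (for example `masks * mix_stft[None]`, where `mix_stft` is a hypothetical name) would give one masked spectrogram per speaker. Because the DOA selection is recomputed in every frame, the masks can follow speakers whose DOA changes over time. A practical system would likely also smooth the histogram over neighboring frames and discard low-energy bins. Both steps are omitted here.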

Updated: 2021-04-08
