Concurrent speakers localization using blind source separation and microphone array geometry
Multidimensional Systems and Signal Processing (IF 1.7), Pub Date: 2021-05-09, DOI: 10.1007/s11045-021-00776-x
Muhammad Umair Khan, Tania Habib

Speaker localization has been an active research topic because of its wide range of applications in multimedia and communication technologies. While traditional blind source separation (BSS) algorithms are robust in reverberant environments, they are generally unable to localize more than two concurrent speakers. This paper presents a novel method for localizing concurrent speakers using blind source separation that exploits the microphone array geometry. The TRINICON BSS algorithm (Buchner et al., in: 2004 IEEE international conference on acoustics, speech, and signal processing, IEEE, 2004) serves as the baseline for determining the raw direction-of-arrival estimates. The results show that the proposed algorithm can successfully localize up to three concurrent speakers by exploiting the redundancy in the microphone array. The algorithm is evaluated in real-world environments with background noise and reverberation, such as computer labs and meeting rooms. The localization results are compared with those of the well-known Steered-Response Power Phase Transform (SRP-PHAT) algorithm, using the root mean square error as the evaluation metric. The results for the two- and three-concurrent-speaker scenarios show that the proposed algorithm is more stable and robust than SRP-PHAT. Moreover, the proposed algorithm shows the potential to track multiple simultaneously moving speakers, so it can be used as a front end for a speaker tracking algorithm.
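To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a far-field source and a single microphone pair, for which a time difference of arrival (TDOA) maps to a direction of arrival (DOA) through sin(theta) = c * tau / d, and it computes a root mean square error over angular estimates with wrap-around at 360 degrees. The function names, the speed of sound value, and the wrap-around handling are illustrative assumptions; the abstract does not state how the paper computes either quantity.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not taken from the paper


def doa_from_tdoa(tdoa_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field DOA in degrees (0 = broadside) for one microphone pair.

    Hypothetical helper: uses sin(theta) = c * tau / d.
    """
    ratio = c * tdoa_s / mic_spacing_m
    # Clamp to [-1, 1] so small TDOA estimation errors do not break asin.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


def angular_error_deg(estimate_deg, truth_deg):
    """Smallest signed difference between two angles, in [-180, 180)."""
    return (estimate_deg - truth_deg + 180.0) % 360.0 - 180.0


def rmse_deg(estimates_deg, truths_deg):
    """Root mean square error over paired angle estimates, in degrees."""
    errors = [angular_error_deg(e, t) for e, t in zip(estimates_deg, truths_deg)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))


if __name__ == "__main__":
    # A zero delay corresponds to a source at broadside.
    print(doa_from_tdoa(0.0, 0.1))
    # 350 degrees versus 2 degrees is a 12 degree error, not 348.
    print(rmse_deg([10.0, 350.0], [12.0, 2.0]))
```

Wrapping the error before squaring matters for circular arrays: without it, an estimate just on the other side of the 0/360 boundary would dominate the RMSE even though it is close to the true direction.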



Updated: 2021-05-09