Acoustic source localization with deep generalized cross correlations, Signal Processing

Signal Processing (IF 3.4), Pub Date: 2021-05-27, DOI: 10.1016/j.sigpro.2021.108169
Juan Manuel Vera-Diaz, Daniel Pizarro, Javier Macias-Guarasa

One of the most popular techniques for Acoustic Source Localization is the Generalized Cross Correlation (GCC) and its use in Steered Response Power (SRP) techniques. Nowadays, Deep Learning strategies may outperform these classical methods, but they generally depend on the room and sensor geometry configurations used during training. Hence, they require adaptation and re-training when facing a new environment, which is a problem in practice, as re-training requires labelling new data and running a complex training algorithm. In this work, we use a Convolutional Deep Neural Network that transforms the GCC between two signals into a Gaussian-shaped signal, which we call Deep Generalized Cross Correlation (DeepGCC). We combine DeepGCC estimations to create a 3D acoustic map, similarly to SRP techniques. This acoustic map can be further refined using a sparse generative model to recover the source position. Crucially, we can adapt the acoustic map to different microphone array geometries without re-training the DeepGCC network. We show that our method outperforms both classical approaches and recent Deep Learning strategies in challenging real and simulated scenarios with mismatched training and testing conditions, without requiring re-training for different sensor configurations or room environments.
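To make the pipeline described above more concrete, here is a minimal pure-Python sketch. It is our illustration, not the authors' code. It computes a classical GCC with PHAT weighting, builds the kind of Gaussian-shaped curve that DeepGCC is described as producing, and sums pairwise curves into an SRP-style map over a grid of candidate 3D points. Several choices here are our own assumptions: the PHAT weighting, the Gaussian width `sigma`, linear interpolation of fractional lags, a speed of sound of 343 m/s, the toy microphone geometry, and every function name. The trained network is not reproduced, so idealised Gaussians stand in for its outputs.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; fine for short demo frames (use an FFT in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(xf):
    """Naive inverse DFT, keeping only the real part."""
    n = len(xf)
    return [(sum(xf[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def gcc_phat(x1, x2, eps=1e-12):
    """Circular GCC with PHAT weighting, returned as {signed_lag: value}.

    A positive lag tau means x1 arrives tau samples after x2 (x1[t] ~ x2[t - tau]).
    """
    n = len(x1)
    cross = [a * b.conjugate() for a, b in zip(dft(x1), dft(x2))]
    weighted = [g / (abs(g) + eps) for g in cross]  # PHAT: keep phase, drop magnitude
    r = idft_real(weighted)
    return {(i if i < n // 2 else i - n): r[i] for i in range(n)}


def gaussian_target(center, max_lag, sigma=1.0):
    """Gaussian centred on a (possibly fractional) TDOA, sampled at integer
    lags -max_lag..max_lag. Stands in for an idealised DeepGCC output."""
    return [math.exp(-((lag - center) ** 2) / (2.0 * sigma ** 2))
            for lag in range(-max_lag, max_lag + 1)]


def tdoa_samples(p, mi, mj, fs, c=343.0):
    """Expected delay (in samples) of mic i relative to mic j for a source at p."""
    return (math.dist(p, mi) - math.dist(p, mj)) * fs / c


def interp(curve, lag, max_lag):
    """Linearly interpolate a lag-indexed curve at a fractional lag."""
    pos = lag + max_lag  # curve[0] corresponds to lag -max_lag
    if pos < 0 or pos > len(curve) - 1:
        return 0.0  # outside the modelled lag range
    k = min(math.floor(pos), len(curve) - 2)
    a = pos - k
    return (1 - a) * curve[k] + a * curve[k + 1]


def srp_map(curves, mics, grid, fs, max_lag, c=343.0):
    """SRP-style map: for each candidate point, sum every pair's curve at the
    TDOA that point would produce. Only `mics` carries the array geometry."""
    return {p: sum(interp(curve, tdoa_samples(p, mics[i], mics[j], fs, c), max_lag)
                   for (i, j), curve in curves.items())
            for p in grid}


if __name__ == "__main__":
    # 1) GCC-PHAT recovers a known circular delay of 5 samples.
    rng = random.Random(0)
    x2 = [rng.gauss(0.0, 1.0) for _ in range(64)]
    x1 = [x2[(t - 5) % 64] for t in range(64)]  # x1 = x2 delayed by 5 samples
    r = gcc_phat(x1, x2)
    print(max(r, key=r.get))  # 5

    # 2) Idealised Gaussian pair curves combined into a 3D map over a coarse grid.
    fs, max_lag = 16000, 70
    mics = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    grid = [(x, y, z) for x in (0.25, 0.75) for y in (0.25, 0.75) for z in (0.25, 0.75)]
    src = (0.25, 0.25, 0.75)
    curves = {(i, j): gaussian_target(tdoa_samples(src, mics[i], mics[j], fs), max_lag)
              for i in range(4) for j in range(i + 1, 4)}
    amap = srp_map(curves, mics, grid, fs, max_lag)
    print(max(amap, key=amap.get))  # (0.25, 0.25, 0.75)
```

In this sketch, the microphone coordinates enter only when each candidate point is converted into expected TDOAs. Moving to a different array changes `mics` but not the function that produces each pairwise curve. As we read the abstract, this is why the map can be adapted to new geometries without re-training. The sparse generative refinement step is not sketched here.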


Updated: 2021-06-05
