Improved feature extraction for CRNN-based multiple sound source localization, arXiv - CS - Sound

Improved feature extraction for CRNN-based multiple sound source localization
arXiv - CS - Sound. Pub Date: 2021-05-05, DOI: arxiv-2105.01897
Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin

In this work, we extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network (CRNN) and Ambisonics signals. We significantly improve the performance of the baseline network by changing the arrangement of convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in between, so that less information is lost across the layers, leading to better feature extraction. In parallel, we test the system's ability to localize up to three sources, in which case the improved feature extraction provides the largest boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The results show a substantial improvement in multiple sound source localization performance over the baseline network.
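
The idea of using more layers with smaller pooling sizes can be illustrated with a minimal sketch that tracks how one feature-map axis shrinks through a stack of pooling stages. The axis length (512) and both pooling layouts below are illustrative assumptions, not values taken from the paper, and the helper `pooled_sizes` is a hypothetical name introduced only for this example. It assumes non-overlapping pooling without padding.

```python
def pooled_sizes(input_size, pool_sizes):
    """Return the feature-map size after each non-overlapping pooling stage."""
    sizes = []
    size = input_size
    for p in pool_sizes:
        size //= p  # stride equals the pooling window, no padding (assumption)
        sizes.append(size)
    return sizes


# Hypothetical axis length and pooling layouts, for illustration only.
N_FREQ = 512
coarse = pooled_sizes(N_FREQ, [8, 8, 4])                # few stages, large pools
fine = pooled_sizes(N_FREQ, [2, 2, 2, 2, 2, 2, 2, 2])   # more stages, small pools

print(coarse)  # [64, 8, 2]
print(fine)    # [256, 128, 64, 32, 16, 8, 4, 2]
```

Both layouts reach the same final size, but the second one discards only half of the axis at each stage, so a convolutional layer placed before each pooling step would see a more finely resolved input than in the first layout.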

Updated: 2021-05-06
