Forward-Looking Sonar Patch Matching: Modern CNNs, Ensembling, and Uncertainty
arXiv - CS - Robotics. Pub Date: 2021-08-02, DOI: arxiv-2108.01066
Arka Mallick, Paul Plöger, Matias Valdenegro-Toro

Applications of underwater robots are on the rise. Most of them depend on sonar for underwater vision, but the lack of strong perception capabilities limits them in this task. An important issue in sonar perception is matching image patches, which can enable other techniques such as localization, change detection, and mapping. There is a rich literature on this problem for color images, but it is lacking for acoustic images, owing to the physics that produces these images. In this paper we improve on our previous results for this problem (Valdenegro-Toro et al., 2017): instead of modeling features manually, a Convolutional Neural Network (CNN) learns a similarity function and predicts whether two input sonar images are similar. To improve sonar image matching further, three state-of-the-art CNN architectures based on DenseNet and VGG, with a siamese or two-channel architecture and contrastive loss, are evaluated on the Marine Debris dataset. To ensure a fair evaluation of each network, thorough hyper-parameter optimization is performed. We find that the best-performing models are the DenseNet two-channel network with 0.955 AUC, VGG-Siamese with contrastive loss at 0.949 AUC, and DenseNet-Siamese with 0.921 AUC. By ensembling the top-performing DenseNet two-channel and DenseNet-Siamese models, the highest overall prediction performance obtained is 0.978 AUC, a large improvement over the 0.91 AUC of the previous state of the art.
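The abstract names three ingredients without spelling them out: a contrastive loss, an ensemble of two models, and AUC as the metric. The sketch below illustrates one common form of each in plain Python. It is not the authors' implementation: the label convention (1 = matching pair), the margin value, the use of simple score averaging as the ensembling rule, and all toy scores are assumptions made for illustration, and the numbers are not results from the paper.

```python
from typing import List, Sequence


def contrastive_loss(distance: float, label: int, margin: float = 1.0) -> float:
    """One common form of the contrastive loss for a single pair.

    Assumed convention: label 1 = matching pair, label 0 = non-matching pair.
    Matching pairs are penalised by their squared distance; non-matching
    pairs are penalised only while their distance is below `margin`.
    """
    if label == 1:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def average_ensemble(scores_a: Sequence[float], scores_b: Sequence[float]) -> List[float]:
    """Ensemble two models by averaging their per-pair match scores (assumed rule)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same pairs")
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P * N) form is fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy match scores for six patch pairs (made up, not taken from the paper).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.3]    # hypothetical model A
model_b = [0.6, 0.9, 0.7, 0.3, 0.65, 0.1]   # hypothetical model B

ensemble = average_ensemble(model_a, model_b)
print("AUC model A :", round(roc_auc(labels, model_a), 3))
print("AUC model B :", round(roc_auc(labels, model_b), 3))
print("AUC ensemble:", round(roc_auc(labels, ensemble), 3))
```

In this toy case each model misranks a different pair, so averaging their scores ranks every matching pair above every non-matching one. That is the general intuition behind ensembling models whose errors are not strongly correlated, though the gain on real data depends on the models involved.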

Updated: 2021-08-03