Scene Retrieval for Contextual Visual Mapping
arXiv - CS - Robotics Pub Date : 2021-02-25 , DOI: arxiv-2102.12728
William H. B. Smith, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Visual navigation localizes a query place image against a reference database of place images, also known as a `visual map'. Localization accuracy requirements for specific areas of the visual map, `scene classes', vary according to the context of the environment and task. State-of-the-art visual mapping is unable to reflect these requirements by explicitly targeting scene classes for inclusion in the map. Four different scene classes, including pedestrian crossings and stations, are identified in each of the Nordland and St. Lucia datasets. Instead of re-training separate scene classifiers, which struggle with these overlapping scene classes, we make our first contribution: defining the problem of `scene retrieval'. Scene retrieval extends image retrieval to the classification of scenes defined at test time, by associating a single query image with reference images of scene classes. Our second contribution is a triplet-trained convolutional neural network (CNN) that addresses this problem, increasing scene classification accuracy by up to 7% over state-of-the-art networks pre-trained for scene recognition. Our third contribution is an algorithm, `DMC', that combines our scene classification with distance and memorability for visual mapping. Our analysis shows that DMC includes 64% more images of our chosen scene classes in a visual map than distance-interval mapping alone. Finally, the state-of-the-art visual place descriptors AMOS-Net, Hybrid-Net and NetVLAD are used to show that DMC improves scene class localization accuracy by a mean of 3%, and localization accuracy of the remaining map images by a mean of 10%, across both datasets.
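The abstract does not detail the triplet training it mentions; as a rough illustration only, a standard triplet margin loss over embedding vectors can be sketched as follows (the function name, toy embeddings, and margin value are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: encourage the anchor embedding to lie
    closer to the positive (same scene class) than to the negative
    (different scene class) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the anchor is near the positive and far from the negative,
# so the margin constraint is already satisfied and the loss is zero.
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])
print(triplet_margin_loss(anchor, positive, negative))  # prints 0.0
```

Minimizing this loss over many (anchor, positive, negative) triples shapes the embedding space so that a single query image can be classified by its nearest scene-class reference images, which is the retrieval-as-classification idea the abstract describes.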

Updated: 2021-02-26