SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition
arXiv - CS - Robotics Pub Date : 2021-02-23 , DOI: arxiv-2102.11603
Sourav Garg, Michael Milford

Visual Place Recognition (VPR) is the task of matching current visual imagery from a camera to images stored in a reference map of the environment. While initial VPR systems used simple direct image methods or hand-crafted visual features, recent work has focused on learning more powerful visual features and further improving performance through either some form of sequential matcher / filter or a hierarchical matching process. In both cases the performance of the initial single-image based system is still far from perfect, putting significant pressure on the sequence matching or (in the case of hierarchical systems) pose refinement stages. In this paper we present a novel hybrid system that creates a high-performance initial match hypothesis generator using short learnt sequential descriptors, which enable selective control of sequential score aggregation using single-image learnt descriptors. Sequential descriptors are generated using a temporal convolutional network dubbed SeqNet, which encodes short image sequences using 1-D convolutions; these descriptors are then matched against the corresponding temporal descriptors from the reference dataset to provide an ordered list of place match hypotheses. We then perform selective sequential score aggregation using shortlisted single-image learnt descriptors from a separate pipeline to produce an overall place match hypothesis. Comprehensive experiments on challenging benchmark datasets demonstrate that the proposed method outperforms recent state-of-the-art methods using the same amount of sequential information. Source code and supplementary material can be found at https://github.com/oravus/seqNet.
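To make the pipeline described in the abstract concrete, the sketch below illustrates the core idea: a 1-D temporal convolution turns a short sequence of single-image descriptors into one sequential descriptor, which is then compared against reference sequential descriptors to shortlist candidate places. This is a minimal PyTorch sketch; the descriptor dimension, sequence length, and all class/function names are illustrative assumptions, not the authors' released SeqNet implementation.

```python
# Minimal sketch of sequence-to-one descriptor encoding via a 1-D temporal
# convolution, followed by shortlisting of reference places by similarity.
# Layer sizes, sequence length, and names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqDescriptorSketch(nn.Module):
    def __init__(self, dim=4096, seq_len=5):
        super().__init__()
        # 1-D convolution along the temporal axis of the descriptor sequence;
        # kernel_size == seq_len collapses the sequence to a single descriptor.
        self.conv = nn.Conv1d(in_channels=dim, out_channels=dim,
                              kernel_size=seq_len)

    def forward(self, img_descs):
        # img_descs: (batch, seq_len, dim) single-image descriptors
        x = img_descs.permute(0, 2, 1)       # -> (batch, dim, seq_len)
        x = self.conv(x).squeeze(-1)         # -> (batch, dim)
        return F.normalize(x, p=2, dim=-1)   # L2-normalised sequential descriptor


def shortlist_matches(query_seq_desc, ref_seq_descs, top_k=10):
    # Cosine similarity (descriptors are L2-normalised) between the query
    # sequential descriptor and all reference sequential descriptors;
    # return the indices of the top-k place match hypotheses.
    sims = ref_seq_descs @ query_seq_desc    # (num_refs,)
    return torch.topk(sims, k=min(top_k, sims.numel())).indices
```

In this sketch, the shortlist returned by `shortlist_matches` would then be re-ranked by aggregating single-image descriptor scores over the sequence, mirroring the selective sequential score aggregation stage described above.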

Updated: 2021-02-24