Leveraging Local and Global Descriptors in Parallel to Search Correspondences for Visual Localization
arXiv - CS - Computer Vision and Pattern Recognition, Pub Date: 2020-09-23, DOI: arxiv-2009.10891
Pengju Zhang, Yihong Wu, Bingxi Liu

Visual localization, which computes the 6DoF camera pose from a given image, has wide applications in robotics, virtual reality, augmented reality, and other areas. Two kinds of descriptors are important for visual localization. Global descriptors summarize an entire image in a single feature. Local descriptors encode an image patch, usually one enclosing a keypoint. A growing number of visual localization methods have two stages: first, image retrieval with global descriptors; then, 2D-3D point correspondence search with local descriptors, restricted to the retrieval results. In most methods the two stages run serially. This simple combination does not realize the full benefit of fusing local and global descriptors, because the 3D points returned by retrieval become nearest neighbor candidates of the 2D image points on the basis of global descriptors alone. (When 2D-3D point correspondences are computed, each 2D image point is also called a query local feature.) In this paper, we propose a novel parallel search framework that leverages the advantages of both local and global descriptors to obtain nearest neighbor candidates of a query local feature. Specifically, besides using deep-learning-based global descriptors, we also use local descriptors to construct random tree structures for obtaining nearest neighbor candidates of the query local feature. For constructing the random trees, we propose a new probabilistic model and a new deep-learning-based local descriptor. The loss function for the proposed local descriptor includes a weighted Hamming regularization term that preserves discriminativeness after binarization. The loss function co-trains real-valued and binary descriptors, and both results are integrated into the random trees.
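
To make the contrast with the serial pipeline concrete, the following is a minimal sketch of gathering nearest neighbor candidates for one query local feature from two sources at once. It is an illustration under stated assumptions, not the authors' implementation: the function names and data layout are hypothetical, a brute-force Hamming search stands in for the random trees, sign thresholding stands in for the learned binarization, and a plain set union stands in for the paper's probabilistic fusion, none of which the abstract specifies.

```python
import numpy as np


def global_candidates(query_global, db_globals, image_to_points, top_k=1):
    """Retrieve the top_k database images closest to the query image's global
    descriptor (L2 distance) and return the IDs of the 3D points they observe."""
    dists = np.linalg.norm(db_globals - query_global, axis=1)
    top_images = np.argsort(dists, kind="stable")[:top_k]
    candidates = set()
    for image_id in top_images:
        candidates.update(image_to_points[int(image_id)])
    return candidates


def local_candidates(query_local, point_descs, top_k=1):
    """Assumed stand-in for the random-tree search: binarize the real-valued
    query descriptor by sign, then return the IDs of the top_k 3D points whose
    binary descriptors are nearest in Hamming distance."""
    query_bits = np.asarray(query_local) > 0
    db_bits = np.asarray(point_descs).astype(bool)
    hamming = np.count_nonzero(db_bits != query_bits, axis=1)
    nearest = np.argsort(hamming, kind="stable")[:top_k]
    return {int(i) for i in nearest}


def parallel_candidates(query_global, query_local, db_globals,
                        image_to_points, point_descs, top_k=1):
    """Run both searches independently and merge their candidates.
    The union is an assumption made for illustration only."""
    from_global = global_candidates(query_global, db_globals, image_to_points, top_k)
    from_local = local_candidates(query_local, point_descs, top_k)
    return from_global | from_local


# Toy data: 3 database images, 4 3D points with 4-bit binary descriptors.
db_globals = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
image_to_points = {0: [0, 1], 1: [1, 2], 2: [3]}
point_descs = np.array([[1, 0, 1, 0],
                        [1, 1, 1, 1],
                        [0, 0, 0, 0],
                        [0, 1, 0, 1]])
query_global = np.array([0.1, 0.1])
query_local = np.array([-0.5, 0.9, -0.2, 0.7])

merged = parallel_candidates(query_global, query_local, db_globals,
                             image_to_points, point_descs)
print(sorted(merged))  # [0, 1, 3]
```

In this toy example, retrieval alone yields points 0 and 1, so a serial pipeline would never consider point 3, whose binary descriptor matches the query exactly. Searching by local descriptor in parallel recovers it.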

Updated: 2020-09-24