Leveraging Local and Global Descriptors in Parallel to Search Correspondences for Visual Localization
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-09-23, DOI: arxiv-2009.10891
Pengju Zhang, Yihong Wu, Bingxi Liu
Visual localization, which computes the 6DoF camera pose from a given image, has
wide applications in robotics, virtual reality, augmented reality, and other
areas. Two kinds of descriptors are important for visual localization. Global
descriptors summarize a whole image as a single feature. Local descriptors
extract a feature from each image patch, which usually encloses a keypoint.
A growing number of visual localization methods have two stages: first, image
retrieval with global descriptors; then, 2D-3D point correspondences
established with local descriptors among the retrieved results. In most methods
the two stages run serially. This simple combination does not fully exploit the
benefits of fusing local and global descriptors: the 3D points returned by
retrieval serve as nearest neighbor candidates of the 2D image points, yet they
are selected by global descriptors alone. Each 2D image point is also called a
query local feature when 2D-3D point correspondences are computed. In this
paper, we propose a novel parallel search framework that leverages the
advantages of both local and global descriptors to obtain nearest neighbor
candidates of a query local feature. Specifically, besides using deep learning
based global descriptors, we also use local descriptors to construct random
tree structures for obtaining nearest neighbor candidates of the query local
feature. For constructing the random trees, we propose a new probabilistic
model and a new deep learning based local descriptor. The loss function of the
proposed local descriptor includes a weighted Hamming regularization term that
preserves discriminativeness after binarization. The loss function co-trains
real-valued and binary descriptors, and the results of both are integrated
into the random trees.
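
To make the difference between serial and parallel candidate search concrete,
the following is a minimal sketch under stated assumptions, not the authors'
implementation. The toy data, all function and class names, and the
`BitBucketIndex` class are hypothetical. In particular, `BitBucketIndex` groups
binary descriptors by a few randomly chosen bits and is only a simplified
stand-in for the random trees; it does not reproduce the paper's probabilistic
model or learned descriptor.

```python
import random

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def sq_dist(u, v):
    """Squared Euclidean distance between two global descriptors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def retrieve_candidates(query_global, images, top_k):
    """Global branch: 3D points seen in the top_k most similar images."""
    ranked = sorted(images, key=lambda im: sq_dist(query_global, im["global"]))
    cands = set()
    for im in ranked[:top_k]:
        cands.update(im["points"])
    return cands

class BitBucketIndex:
    """Local branch (hypothetical stand-in for random trees): several tables,
    each bucketing 3D points by a random subset of descriptor bits."""

    def __init__(self, point_desc, n_bits, n_tables=4, bits_per_table=3, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(n_tables):
            bits = rng.sample(range(n_bits), bits_per_table)
            buckets = {}
            for pid, desc in point_desc.items():
                buckets.setdefault(self._key(desc, bits), set()).add(pid)
            self.tables.append((bits, buckets))

    @staticmethod
    def _key(desc, bits):
        return tuple((desc >> b) & 1 for b in bits)

    def candidates(self, query_desc):
        """Union of the buckets the query falls into, over all tables."""
        cands = set()
        for bits, buckets in self.tables:
            cands |= buckets.get(self._key(query_desc, bits), set())
        return cands

def match_parallel(query_local, query_global, images, point_desc, index, top_k=1):
    """Pool candidates from both branches, then pick the nearest by Hamming."""
    pool = retrieve_candidates(query_global, images, top_k)
    pool |= index.candidates(query_local)
    if not pool:
        return None
    return min(pool, key=lambda pid: (hamming(query_local, point_desc[pid]), pid))

# Toy scene (assumed data): three 3D points with 8-bit descriptors, two images.
point_desc = {0: 0b00001111, 1: 0b11110000, 2: 0b10101010}
images = [
    {"global": [0.0, 0.0], "points": [0]},
    {"global": [1.0, 1.0], "points": [1, 2]},
]
index = BitBucketIndex(point_desc, n_bits=8)

# The query looks globally like image 0, but its local feature matches point 2.
query_global, query_local = [0.1, 0.0], 0b10101010
serial_pool = retrieve_candidates(query_global, images, top_k=1)
best = match_parallel(query_local, query_global, images, point_desc, index)
print(serial_pool, best)
```

In this toy scene the serial pool built from retrieval alone does not contain
the true 3D point, because retrieval returns the wrong image. The pooled search
can still recover it, since the local branch proposes candidates independently
of the retrieval result.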
Updated: 2020-09-24