Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis
International Journal of Computer Vision (IF 19.5). Pub Date: 2020-12-23. DOI: 10.1007/s11263-020-01399-8
Zichao Zhang, Torsten Sattler, Davide Scaramuzza
Visual localization is one of the key enabling technologies for autonomous driving and augmented reality. High-quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features, which are prone to fail when images are taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to 47%) than predicted by the original reference poses. We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.
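The iterative render-match-refine loop described in the abstract can be sketched in simplified form. This is a hypothetical illustration, not the authors' implementation: "rendering" is stood in for by projecting known 3D points with the current pose estimate, feature matching is assumed to yield 2D-3D correspondences, and the pose update is a Gauss-Newton step over translation only (the real system refines full 6-DoF poses). All function names and parameters are illustrative.

```python
import numpy as np

F = 500.0  # assumed focal length in pixels


def project(points, t):
    """Pinhole projection of Nx3 points after applying translation t.

    Stands in for 'rendering the model from the current pose estimate'.
    """
    p = points + t
    return F * p[:, :2] / p[:, 2:3]


def refine_translation(points, observed, t_init, iters=20):
    """Iteratively refine a translation estimate via Gauss-Newton.

    points   : Nx3 model points (the '3D model')
    observed : Nx2 matched pixel locations in the real image
    t_init   : initial translation estimate (the 'initial pose')
    """
    t = np.asarray(t_init, dtype=float).copy()
    for _ in range(iters):
        p = points + t
        proj = F * p[:, :2] / p[:, 2:3]
        # Residuals between observed features and the current rendering,
        # interleaved as [u1, v1, u2, v2, ...].
        r = (observed - proj).ravel()
        # Jacobian of the projection w.r.t. (tx, ty, tz), two rows per point.
        J = np.zeros((2 * len(points), 3))
        z = p[:, 2]
        J[0::2, 0] = F / z
        J[0::2, 2] = -F * p[:, 0] / z**2
        J[1::2, 1] = F / z
        J[1::2, 2] = -F * p[:, 1] / z**2
        dt, *_ = np.linalg.lstsq(J, r, rcond=None)
        t += dt
        if np.linalg.norm(dt) < 1e-9:  # converged: rendering matches image
            break
    return t
```

With exact synthetic correspondences the loop recovers the true translation; in the paper's setting, matches come from learned features between a rendering and a real image, and each iteration re-renders from the updated pose.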
