Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks
Journal of Intelligent & Robotic Systems (IF 3.1), Pub Date: 2021-07-08, DOI: 10.1007/s10846-021-01439-6
Chenhao Yang, Andreas Zell, Yuyi Liu

Learning-based visual localization has become a promising research direction over the past decades. Since ground-truth pose labels are difficult to obtain, recent methods try to train pose estimation networks on pixel-perfect synthetic data. However, this introduces the problem of domain bias. In this paper, we first build the Tuebingen Buildings dataset of RGB images collected by a drone in urban scenes and create a 3D model for each scene. A large number of synthetic images are generated from these 3D models. We exploit image style transfer and cycle-consistent adversarial training to predict the relative camera poses of image pairs after training on synthetic environment data. We propose a relative camera pose estimation approach to solve the continuous localization problem for the autonomous navigation of unmanned systems. Unlike existing learning-based camera pose estimation methods that train and test within a single scene, our approach successfully estimates the relative camera poses at multiple city locations with a single trained model. We use the Tuebingen Buildings and the Cambridge Landmarks datasets to evaluate the performance of our approach in single-scene and cross-scene settings. For each dataset, we compare the performance of models trained on real images against models trained on synthetic images. We also test our model on the indoor 7Scenes dataset to demonstrate its generalization ability.
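
The abstract combines two technical ingredients: CycleGAN-style cycle-consistent adversarial training for synthetic-to-real domain adaptation, and a network that regresses the relative pose of an image pair. The sketch below (PyTorch) is not the authors' implementation; the ResNet-18 backbone, the translation-plus-quaternion pose parameterization, the loss weights `lam` and `beta`, and the generator interfaces `G_s2r`/`G_r2s` are all illustrative assumptions.

```python
# Minimal sketch of the two ingredients named in the abstract; all
# architectural choices here are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torchvision.models as models


class RelativePoseNet(nn.Module):
    """Siamese encoder; concatenated features -> 7-DoF relative pose."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # assumption: ResNet-18 (torchvision >= 0.13)
        # Drop the final fc layer; output is a (B, 512, 1, 1) feature map.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(2 * 512, 7)  # 3 translation + 4 quaternion

    def forward(self, img_a, img_b):
        f_a = self.encoder(img_a).flatten(1)
        f_b = self.encoder(img_b).flatten(1)
        out = self.fc(torch.cat([f_a, f_b], dim=1))
        t, q = out[:, :3], out[:, 3:]
        return t, q / q.norm(dim=1, keepdim=True)  # unit quaternion


def cycle_consistency_loss(G_s2r, G_r2s, synth, real, lam=10.0):
    """CycleGAN-style reconstruction term: x -> G(x) -> F(G(x)) should recover x."""
    rec_synth = G_r2s(G_s2r(synth))  # synthetic -> "real" -> synthetic
    rec_real = G_s2r(G_r2s(real))    # real -> "synthetic" -> real
    l1 = nn.functional.l1_loss
    return lam * (l1(rec_synth, synth) + l1(rec_real, real))


def pose_loss(t_pred, q_pred, t_gt, q_gt, beta=250.0):
    """PoseNet-style weighted sum of translation and rotation errors (beta is a guess)."""
    l_t = nn.functional.mse_loss(t_pred, t_gt)
    l_q = nn.functional.mse_loss(q_pred, q_gt / q_gt.norm(dim=1, keepdim=True))
    return l_t + beta * l_q


if __name__ == "__main__":
    net = RelativePoseNet()
    a, b = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
    t, q = net(a, b)
    print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```

In a full CycleGAN setup the cycle term above would be combined with adversarial losses from two discriminators, and the adapted images would then feed the pose regressor; only the reconstruction term is shown here.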




Updated: 2021-07-08