PoseGAN: A pose-to-image translation framework for camera localization, ISPRS Journal of Photogrammetry and Remote Sensing



ISPRS Journal of Photogrammetry and Remote Sensing (IF 10.6), Pub Date: 2020-06-29, DOI: 10.1016/j.isprsjprs.2020.06.010
Kanglin Liu, Qing Li, Guoping Qiu

Camera localization is a fundamental requirement in robotics and computer vision. This paper introduces a pose-to-image translation framework to tackle the camera localization problem. We present PoseGANs, a framework based on conditional generative adversarial networks (cGANs) that implements pose-to-image translation. PoseGANs feature a number of innovations, including a conditional discriminator based on a distance metric that performs camera localization, and a pose estimation technique applied to generated camera images, which acts as a stronger constraint to improve localization performance. Compared with learning-based regression methods such as PoseNet, PoseGANs achieve better performance with model sizes that are 70% smaller. In addition, PoseGANs use view synthesis to establish the correspondence between 2D images and the scene: given a pose, PoseGANs can synthesize the corresponding camera image. Furthermore, we demonstrate that PoseGANs differ in principle from both structure-based localization and learning-based regression. We show that PoseGANs exploit geometric structures to accomplish the camera localization task, and are therefore more stable than, and superior to, learning-based regression methods, which rely on local texture features instead. Beyond camera localization and view synthesis, we also demonstrate that PoseGANs can be successfully used for other applications, such as moving object elimination and frame interpolation in video sequences.
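
The abstract does not say how localization performance is measured. As a minimal sketch, assuming the convention common in camera localization work (position error as a Euclidean distance, orientation error as the angle between two rotations), the comparison between an estimated and a reference pose could look like the following. The function name `pose_error` and the (w, x, y, z) quaternion layout are assumptions made for this illustration and are not taken from the paper.

```python
import math


def pose_error(pos_a, quat_a, pos_b, quat_b):
    """Return (translation error, rotation error in degrees) between two poses.

    Hypothetical helper for illustration only. Positions are 3-vectors and
    orientations are quaternions in (w, x, y, z) order; the quaternions are
    normalized here, so they do not need to be unit length on input.
    """
    # Translation error: straight-line distance between the camera centres.
    t_err = math.dist(pos_a, pos_b)

    # Normalize both quaternions before comparing them.
    norm_a = math.sqrt(sum(c * c for c in quat_a))
    norm_b = math.sqrt(sum(c * c for c in quat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("quaternions must be non-zero")
    dot = sum(a * b for a, b in zip(quat_a, quat_b)) / (norm_a * norm_b)

    # q and -q encode the same rotation, so take the absolute value;
    # clamp to 1.0 to guard acos against floating-point overshoot.
    dot = min(1.0, abs(dot))
    r_err = math.degrees(2.0 * math.acos(dot))
    return t_err, r_err
```

For example, `pose_error((0, 0, 0), (1, 0, 0, 0), (3, 4, 0), (1, 0, 0, 0))` compares two poses with identical orientation whose camera centres differ by the vector (3, 4, 0). Taking the absolute value of the quaternion dot product is a deliberate choice so that sign-flipped quaternions are not reported as a large rotation error.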


Updated: 2020-06-29
