Camera pose estimation in multi-view environments: From virtual scenarios to the real world
Image and Vision Computing (IF 4.2), Pub Date: 2021-04-17, DOI: 10.1016/j.imavis.2021.104182
Jorge L. Charco, Angel D. Sappa, Boris X. Vintimilla, Henry O. Velesaca

This paper presents a domain adaptation strategy for efficiently training network architectures that estimate the relative camera pose in multi-view scenarios. The networks take a pair of simultaneously acquired images as input. Because large datasets of overlapping image pairs are scarce, a domain adaptation strategy is proposed to improve the accuracy of the solutions. The strategy consists of transferring the knowledge learned from synthetic images to real-world scenarios. The networks are first trained on pairs of synthetic images, each pair captured at the same time by two cameras in a virtual environment; the learned weights are then transferred to the real-world case, where the networks are retrained with a few real images. Different virtual 3D scenarios are generated to evaluate how accuracy depends on the similarity between the virtual and real scenarios, in terms of both the geometry of the objects in the scene and the relative pose between the camera and those objects. Experimental results and comparisons show that the accuracy of all the evaluated networks for estimating the camera pose improves when the proposed domain adaptation strategy is used, highlighting the importance of similarity between the virtual and real scenarios.
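
The two-stage schedule described in the abstract (train on plentiful synthetic data, then retrain on a few real samples) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and none of it comes from the paper: a one-parameter linear regressor stands in for the pose network, the "synthetic" and "real" domains are two made-up slopes (2.0 and 2.2), and the learning rate and epoch counts are arbitrary.

```python
# Toy sketch of "pretrain on synthetic, fine-tune on a few real samples".
# Hypothetical stand-in: a 1-parameter model y = w * x replaces the pose network.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr, epochs):
    """Full-batch gradient descent on the MSE, starting from weight w."""
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

# Assumed domains: synthetic slope 2.0 (many samples), real slope 2.2 (few samples).
synthetic = [(i / 10, 2.0 * (i / 10)) for i in range(-20, 21)]
real_train = [(x, 2.2 * x) for x in (-1.0, 0.5, 1.5)]
real_test = [(x, 2.2 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

# Stage 1: train on the synthetic domain only.
w_pre = train(0.0, synthetic, lr=0.1, epochs=50)

# Stage 2: transfer the learned weight and retrain briefly on the few real samples.
w_ft = train(w_pre, real_train, lr=0.1, epochs=3)

# Baseline: the same small real-data budget, but starting from scratch.
w_scratch = train(0.0, real_train, lr=0.1, epochs=3)

print("fine-tuned error on real test set:", mse(w_ft, real_test))
print("from-scratch error on real test set:", mse(w_scratch, real_test))
```

In this toy setting the fine-tuned weight starts much closer to the real-domain optimum than the from-scratch weight does, so it ends with a lower error for the same small real-data budget. The closer the assumed synthetic slope is to the real one, the smaller that remaining gap, which loosely mirrors the abstract's point about virtual-real similarity.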




Updated: 2021-04-22