Recurrent Deep Network for Estimating the Pose of Real Indoor Images from Synthetic Image Sequences
Sensors (IF 3.4) Pub Date: 2020-09-25, DOI: 10.3390/s20195492
Debaditya Acharya , Sesa Singha Roy , Kourosh Khoshelham , Stephan Winter

Recently, deep convolutional neural networks (CNN) have become popular for indoor visual localisation, where the networks learn to regress the camera pose directly from images. However, these approaches require a 3D image-based reconstruction of the indoor spaces beforehand to determine camera poses, which is a challenge for large indoor spaces. Synthetic images derived from 3D indoor models have been used to eliminate the need for 3D reconstruction. A limitation of this approach is the low accuracy that results from estimating the pose of each image frame independently. In this article, a visual localisation approach is proposed that exploits the spatio-temporal information in synthetic image sequences to improve localisation accuracy. A deep Bayesian recurrent CNN is fine-tuned using synthetic image sequences obtained from a building information model (BIM) to regress the pose of real image sequences. The results of the experiments indicate that the proposed approach estimates a smoother trajectory with smaller inter-frame error compared to existing methods. The achievable accuracy with the proposed approach is 1.6 m, an improvement of approximately thirty per cent over existing approaches. A Keras implementation can be found in our GitHub repository.
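
To make the described pipeline concrete, below is a minimal Keras sketch of a Bayesian recurrent CNN pose regressor of the kind the abstract outlines: a per-frame convolutional encoder, an LSTM over the image sequence, Monte Carlo dropout for the Bayesian uncertainty, and separate position and orientation heads. The small convolutional backbone, layer sizes, sequence length, and loss weighting are illustrative assumptions, not the authors' published architecture; consult the linked repository for the actual implementation.

```python
# Hypothetical sketch of a Bayesian recurrent CNN for camera pose regression.
# Backbone, layer sizes, and loss weights are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN, H, W = 8, 224, 224  # assumed sequence length and input resolution

# Per-frame CNN encoder (stand-in for the paper's actual backbone).
frame_in = layers.Input(shape=(H, W, 3))
x = layers.Conv2D(32, 3, strides=2, activation="relu")(frame_in)
x = layers.Conv2D(64, 3, strides=2, activation="relu")(x)
x = layers.Conv2D(128, 3, strides=2, activation="relu")(x)
frame_feat = layers.GlobalAveragePooling2D()(x)
frame_encoder = Model(frame_in, frame_feat, name="frame_encoder")

# Apply the encoder to every frame, then model temporal context with an LSTM.
frames = layers.Input(shape=(SEQ_LEN, H, W, 3), name="image_sequence")
feats = layers.TimeDistributed(frame_encoder)(frames)
hidden = layers.LSTM(256, return_sequences=True)(feats)

# Dropout kept active at inference (training=True) enables Monte Carlo sampling,
# so repeated forward passes yield a distribution over poses (Bayesian estimate).
hidden = layers.Dropout(0.5)(hidden, training=True)

# Per-frame pose heads: 3D position and quaternion orientation.
position = layers.TimeDistributed(layers.Dense(3), name="xyz")(hidden)
orientation = layers.TimeDistributed(layers.Dense(4), name="quaternion")(hidden)

model = Model(frames, [position, orientation])
model.compile(optimizer="adam",
              loss=["mse", "mse"],
              loss_weights=[1.0, 100.0])  # illustrative position/orientation weighting
```

In the approach described above, such a network would be fine-tuned on synthetic image sequences rendered from the BIM and then evaluated on real image sequences; averaging several stochastic forward passes gives the pose estimate and its uncertainty.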

Updated: 2020-09-25