ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis

Wang, Hui-Xuan; Peng, Jing-Liang; Lu, Shi-Yi; Cao, Xin; Qin, Xue-Ying; Tu, Chang-He

doi:10.1007/s11390-021-1373-1

Regular Paper
Published: 31 May 2021

Volume 36, pages 494–507, (2021)

Journal of Computer Science and Technology

Hui-Xuan Wang¹,
Jing-Liang Peng²,
Shi-Yi Lu³,
Xin Cao³,
Xue-Ying Qin³ &
Chang-He Tu¹

Abstract

Indoor visual localization, i.e., 6 degree-of-freedom (6-DoF) camera pose estimation for a query image with respect to a known scene, is gaining attention, driven by rapid progress in applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors and walls are consistently present in indoor scenes, we propose a novel system that incorporates geometric information to address the issues that arise when relying on pixel images alone. Through the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and a point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement a sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed together with a traditional convolutional neural network (CNN) descriptor to speed up candidate pose identification. Experimental results on a popular benchmark demonstrate that the proposed method outperforms the state-of-the-art approaches in terms of visual localization validity and accuracy.
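
To make the idea of a block-statistics global descriptor more concrete, the following is a minimal sketch of how block-wise mean, variance, and color statistics could be collected and used to rank database images against a query. The abstract does not define BMVC precisely, so everything here is an assumption made for illustration: the grid size, the grayscale conversion, the descriptor layout, the Euclidean distance, and the names `block_descriptor`, `l2_distance`, and `rank_candidates` are all hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def block_descriptor(image, grid=(2, 2)):
    """Build a flat descriptor of per-block statistics.

    image: list of rows, each row a list of (r, g, b) tuples.
    grid:  (rows, cols) number of blocks; an assumed parameter.
    For every block the sketch stores: mean intensity, intensity
    variance, and the mean of each color channel (5 values per block).
    """
    height, width = len(image), len(image[0])
    rows, cols = grid
    descriptor = []
    for by in range(rows):
        y0, y1 = by * height // rows, (by + 1) * height // rows
        for bx in range(cols):
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Simple channel average as intensity (an assumption).
            gray = [(r + g + b) / 3 for r, g, b in pixels]
            descriptor.append(fmean(gray))
            descriptor.append(pvariance(gray))
            for channel in range(3):
                descriptor.append(fmean([p[channel] for p in pixels]))
    return descriptor


def l2_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def rank_candidates(query, database):
    """Return database keys sorted from most to least similar to the query.

    database: dict mapping an image name to its descriptor.
    """
    return sorted(database, key=lambda name: l2_distance(query, database[name]))
```

A cheap statistic of this kind could act as a fast pre-filter that shortlists candidate poses before a more expensive CNN descriptor comparison; how the two descriptors are actually combined in ReLoc is described in the full paper, not here.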

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
Hui-Xuan Wang & Chang-He Tu
School of Information Science and Engineering, University of Jinan, Jinan, 250022, China
Jing-Liang Peng
School of Software, Shandong University, Jinan, 250101, China
Shi-Yi Lu, Xin Cao & Xue-Ying Qin

Corresponding author

Correspondence to Chang-He Tu.

Supplementary Information

ESM 1 (PDF 910 kb)

About this article

Cite this article

Wang, HX., Peng, JL., Lu, SY. et al. ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis. J. Comput. Sci. Technol. 36, 494–507 (2021). https://doi.org/10.1007/s11390-021-1373-1

Received: 15 February 2021
Accepted: 26 April 2021
Published: 31 May 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11390-021-1373-1
