
ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Indoor visual localization, i.e., 6-degree-of-freedom (6-DoF) camera pose estimation for a query image with respect to a known scene, is gaining increasing attention, driven by rapid progress in applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors and walls are ubiquitous in indoor scenes, we propose a novel system that incorporates geometric information to overcome the limitations of approaches that rely on image pixels alone. As part of the system, we contribute a hierarchical structure, the sitemap, consisting of pre-scanned images and a point cloud, together with a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement a sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed alongside a conventional convolutional neural network (CNN) descriptor to speed up candidate pose identification. Experimental results on a popular benchmark demonstrate that the proposed method outperforms state-of-the-art approaches in terms of visual localization validity and accuracy.
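The abstract does not give the exact definition of BMVC, so the following is only a hypothetical reading of "block mean, variance, and color", written to illustrate how a cheap block-statistics descriptor can rank database images before pose estimation. The function names `bmvc_descriptor` and `rank_candidates`, the 4 x 4 grid, the BT.601 luma weights, cosine similarity, and the linear fusion weight `alpha` with a CNN descriptor are all assumptions, not the paper's published formulation.

```python
import numpy as np


def bmvc_descriptor(image: np.ndarray, grid: int = 4) -> np.ndarray:
    """Hypothetical block mean/variance/color descriptor (an assumption).

    image: H x W x 3 RGB array with values in [0, 1].
    The image is split into a grid x grid layout of blocks; each block
    contributes its luminance mean, luminance variance, and mean RGB color,
    giving 5 values per block. The result is L2-normalized.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    h, w = img.shape[:2]
    if h < grid or w < grid:
        raise ValueError("image is smaller than the block grid")
    # BT.601 luma weights as an assumed luminance proxy.
    luma = img @ np.array([0.299, 0.587, 0.114])
    ys = np.linspace(0, h, grid + 1).astype(int)
    xs = np.linspace(0, w, grid + 1).astype(int)
    feats = []
    for i in range(grid):
        for j in range(grid):
            block_l = luma[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            block_c = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            feats.append(block_l.mean())                      # block mean
            feats.append(block_l.var())                       # block variance
            feats.extend(block_c.reshape(-1, 3).mean(axis=0))  # block color
    desc = np.array(feats)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc


def rank_candidates(query_bmvc, db_bmvc, query_cnn=None, db_cnn=None,
                    alpha=0.5, k=5):
    """Return indices of the k most similar database images.

    All descriptors are assumed L2-normalized, so dot products are cosine
    similarities. If CNN descriptors are supplied, the scores are fused as
    alpha * s_bmvc + (1 - alpha) * s_cnn (a hypothetical fusion rule).
    """
    scores = db_bmvc @ query_bmvc
    if query_cnn is not None and db_cnn is not None:
        scores = alpha * scores + (1.0 - alpha) * (db_cnn @ query_cnn)
    k = min(k, len(scores))
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    # Toy usage with random images standing in for a pre-scanned dataset.
    rng = np.random.default_rng(0)
    db_images = [rng.random((48, 64, 3)) for _ in range(10)]
    db = np.stack([bmvc_descriptor(im) for im in db_images])
    noisy = np.clip(db_images[3] + rng.normal(0, 0.01, db_images[3].shape), 0, 1)
    print(rank_candidates(bmvc_descriptor(noisy), db, k=3))
```

The per-block statistics take a single pass over the pixels, which is why a descriptor of this kind is plausibly cheaper than a CNN forward pass. A cascade (BMVC to shortlist candidates, CNN to re-rank them) would be an equally reasonable reading of how the two descriptors are combined; the linear fusion above is just one option.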



Author information

Corresponding author

Correspondence to Chang-He Tu.

Supplementary Information

ESM 1

(PDF 910 kb)


About this article


Cite this article

Wang, HX., Peng, JL., Lu, SY. et al. ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis. J. Comput. Sci. Technol. 36, 494–507 (2021). https://doi.org/10.1007/s11390-021-1373-1


  • DOI: https://doi.org/10.1007/s11390-021-1373-1
