Abstract
Consider the geo-localization task of finding the pose of a camera in a large 3D scene from a single image. Most existing CNN-based methods take textured images as input. We aim to explore experimentally whether texture and correlation between nearby images are necessary in a CNN-based solution to the geo-localization task. To do so, we consider lean images: textureless projections of a simple 3D model of a city. They contain only information related to the geometry of the viewed scene (edges, faces, and relative depth). The main contributions of this paper are: (i) to demonstrate the ability of CNNs to recover camera pose using lean images; and (ii) to provide insight into the role of geometry in the CNN learning process.
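Lean images are, at their core, perspective projections of 3D city geometry seen from a given camera pose. As a rough illustration only (not the authors' rendering pipeline), the sketch below projects 3D model vertices into a pinhole camera and rescales their depths to a relative [0, 1] range. The pose convention (x_cam = R x_world + t, camera looking along +z), the intrinsics f, cx, cy, and the helper names project_points and relative_depth are assumptions made for illustration; a full renderer would also rasterize faces and edges using a depth buffer.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Projected = Optional[Tuple[float, float, float]]  # (u, v, depth) or None


def project_points(points: Sequence[Point3],
                   R: Sequence[Sequence[float]],
                   t: Sequence[float],
                   f: float, cx: float, cy: float) -> List[Projected]:
    """Project world points with an assumed pinhole camera model.

    Convention (assumed): x_cam = R @ x_world + t, camera looks along +z.
    Points at or behind the camera plane are returned as None.
    """
    out: List[Projected] = []
    for p in points:
        # World -> camera coordinates.
        xc = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        if xc[2] <= 0:
            out.append(None)  # not in front of the camera
            continue
        # Perspective division, then shift by the principal point.
        u = f * xc[0] / xc[2] + cx
        v = f * xc[1] / xc[2] + cy
        out.append((u, v, xc[2]))
    return out


def relative_depth(projected: Sequence[Projected]) -> List[Optional[float]]:
    """Rescale visible depths to [0, 1] (nearest -> 0, farthest -> 1)."""
    depths = [p[2] for p in projected if p is not None]
    if not depths:
        return [None for _ in projected]
    lo, hi = min(depths), max(depths)
    span = (hi - lo) or 1.0  # avoid division by zero when all depths are equal
    return [None if p is None else (p[2] - lo) / span for p in projected]
```

For example, with an identity rotation, zero translation, f = 100 and (cx, cy) = (50, 50), the point (1, 0, 10) lands at pixel (60, 50) with depth 10, while a point with negative z is reported as not visible. Only geometry enters the computation; no texture or color is involved, which is the property lean images are built around.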
Author information
Additional information
Moti Kadosh received his M.Sc. degree with honours from the Department of Electrical Engineering at Tel Aviv University. He works as a machine learning researcher at a medical AI company specializing in the interpretation of medical images. His main fields of interest are computer vision, machine learning, deep learning, and computer graphics.
Yael Moses is a professor in the Efi Arazi School of Computer Science at the Interdisciplinary Center, Herzliya, which she joined in 1999. She received her Ph.D. degree from the Weizmann Institute of Science in 1993. She was a post-doctoral fellow with the Robotics Group of Oxford University in 1993–1994, and at the Weizmann Institute in 1994–1997. Her research interests include multi-camera systems, visual surveillance, analyzing CrowdCam images, and music transcription from video.
Ariel Shamir is Dean of the Efi Arazi School of Computer Science at the Interdisciplinary Center in Israel. He received his Ph.D. degree in computer science from the Hebrew University of Jerusalem in 2000 and spent two years as a postdoc at the University of Texas at Austin. Prof. Shamir has numerous publications and a number of patents, and was named to the Thomson Reuters list of most highly cited researchers in 2015. He has broad commercial experience consulting for various companies, including Disney Research, Mitsubishi Electric, PrimeSense (now Apple), and Verisk, among others. Prof. Shamir specializes in geometric modeling, computer graphics, image processing, and machine learning. He is a member of the ACM SIGGRAPH, IEEE Computer Society, Asiagraphics, and Eurographics associations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kadosh, M., Moses, Y. & Shamir, A. On the role of geometry in geo-localization. Comp. Visual Media 7, 103–113 (2021). https://doi.org/10.1007/s41095-020-0196-2
DOI: https://doi.org/10.1007/s41095-020-0196-2