Abstract
A major focus of current research on place recognition is visual localization for autonomous driving. In this scenario, because cameras operate continuously, it is realistic to expect videos as input to visual localization algorithms, as opposed to the single-image querying approach used in other visual localization works. In this paper, we show that exploiting temporal continuity in the testing sequence significantly improves visual localization, both qualitatively and quantitatively. Although intuitive, this idea has not been fully explored in recent works. To this end, we propose two filtering approaches that exploit the temporal smoothness of image sequences: (i) filtering in the discrete domain with a hidden Markov model, and (ii) filtering in the continuous domain with Monte Carlo-based visual localization. Both approaches rely on local features aggregated by an encoding technique that represents each image as a single vector. Experimental results on synthetic and real datasets show that our proposed methods outperform the state of the art (i.e., deep learning-based pose regression approaches) on visual localization under significant appearance change. Our synthetic dataset and source code are publicly available (https://sites.google.com/view/g2d-software/home; https://github.com/dadung/Visual-Localization-Filtering).
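The discrete-domain filter described in the abstract can be illustrated with the standard HMM forward algorithm. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: reference places are ordered along the route, and each has an L2-normalized global descriptor (for example, an encoding of local features into a single vector). The banded transition model, the Gaussian kernel on descriptor distance, and all function names and parameter values (`transition_matrix`, `hmm_forward_filter`, `max_step`, `eps`, `sigma`) are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1."""
    s = sum(v)
    return [x / s for x in v]


def transition_matrix(n: int, max_step: int = 2, eps: float = 1e-3) -> List[List[float]]:
    """Hypothetical banded motion model over n places ordered along the route.

    From place i the vehicle is assumed to stay or advance by up to max_step
    places; every other jump gets a small weight eps so the filter can recover
    from a wrong estimate. Rows are normalized to sum to 1.
    """
    return [normalize([1.0 if 0 <= j - i <= max_step else eps for j in range(n)])
            for i in range(n)]


def likelihood(query: Sequence[float], ref: Sequence[float], sigma: float = 0.25) -> float:
    """Assumed observation model: Gaussian kernel on descriptor distance."""
    d2 = sum((q - r) ** 2 for q, r in zip(query, ref))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def hmm_forward_filter(queries, refs, max_step=2, eps=1e-3, sigma=0.25):
    """Forward algorithm: return the most probable place index after each frame."""
    n = len(refs)
    T = transition_matrix(n, max_step, eps)
    belief = None
    estimates = []
    for q in queries:
        if belief is None:
            # First frame: uniform prior, since the start position is unknown.
            predicted = [1.0 / n] * n
        else:
            # Prediction: propagate the previous belief through the motion model.
            predicted = [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each place by how well its descriptor matches the query.
        belief = normalize([predicted[j] * likelihood(q, refs[j], sigma) for j in range(n)])
        estimates.append(max(range(n), key=lambda j: belief[j]))
    return estimates


# Toy route: place 4 has exactly the same descriptor as place 1 (perceptual aliasing).
route = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, 0.6], [0.0, 1.0]]
print(hmm_forward_filter(route, route))  # [0, 1, 2, 3, 4]
```

On the last frame, single-image retrieval cannot distinguish place 4 from place 1 (their likelihoods are identical), whereas the belief carried over from earlier frames resolves the ambiguity. On long routes with sharp likelihoods, a practical implementation would work in log space to avoid numerical underflow.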
Notes
In more “localized” operations such as parking, where highly accurate 6 DoF estimation is required, it is probably better to rely on the INS.
More fundamentally, the car is a nonholonomic system [1].
On uneven or hilly roads, accelerometers can be used to estimate the vertical motion; hence, VL can focus on map-scale navigation.
The method of [6] will give ambiguous results on noninformative trajectories, e.g., largely straight routes. Hence, VL is still crucial.
Based on a machine with an Intel i7-6700 CPU @ 3.40 GHz, 16 GB RAM, and an NVIDIA GeForce GTX 1080 Ti GPU, running GTA V at its highest graphics configuration.
References
[1] Wikipedia (2020) Nonholonomic system. https://en.wikipedia.org/wiki/Nonholonomic_system
[2] Arandjelovic R, Gronat P, Torii A, Pajdla T, Sivic J (2016) NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5297–5307
[3] Arandjelovic R, Zisserman A (2012) Three things everyone should know to improve object retrieval. In: CVPR
[4] Brachmann E, Krull A, Nowozin S, Shotton J, Michel F, Gumhold S, Rother C (2017) DSAC: differentiable RANSAC for camera localization. In: CVPR
[5] Brahmbhatt S, Gu J, Kim K, Hays J, Kautz J (2018) Geometry-aware learning of maps for camera localization. In: CVPR
[6] Brubaker MA, Geiger A, Urtasun R (2013) Lost! Leveraging the crowd for probabilistic visual self-localization. In: CVPR
[7] Bustos AP, Chin TJ, Eriksson A, Reid I (2019) Visual SLAM: why bundle adjust? In: ICRA
[8] Churchill W, Newman P (2013) Experience-based navigation for long-term localisation. Int J Robotics Res 32:1645
[9] Do TT, Tran QD, Cheung NM (2015) FAemb: a function approximation-based embedding method for image retrieval. In: CVPR
[10] Doan AD, Jawaid AM, Do TT, Chin TJ (2018) G2D: from GTA to Data. arXiv preprint arXiv:1806.07381, pp 1–9
[11] Doan AD, Latif Y, Chin TJ, Liu Y, Do TT, Reid I (2019) Scalable place recognition under appearance change for autonomous driving. In: ICCV
[12] He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR
[13] Jégou H, Chum O (2012) Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening. In: ECCV
[14] Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: CVPR
[15] Jégou H, Zisserman A (2014) Triangulation embedding and democratic aggregation for image search. In: CVPR
[16] Junkins JL, Schaub H (2009) Analytical mechanics of space systems. American Institute of Aeronautics and Astronautics, Reston
[17] Kendall A, Cipolla R (2016) Modelling uncertainty in deep learning for camera relocalization. In: ICRA
[18] Kendall A, Cipolla R, et al. (2017) Geometric loss functions for camera pose regression with deep learning. In: CVPR
[19] Kendall A, Grimes M, Cipolla R (2015) PoseNet: a convolutional network for real-time 6-DoF camera relocalization. In: ICCV
[20] Ko J, Fox D (2009) GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Auton Robots 27:75
[21] Krähenbühl P (2018) Free supervision from video games. In: CVPR
[22] Lepetit V, Moreno-Noguer F, Fua P (2009) EPnP: an accurate O(n) solution to the PnP problem. IJCV 81:155
[23] Maddern W, Pascoe G, Linegar C, Newman P (2017) 1 year, 1000 km: the Oxford RobotCar dataset. Int J Robotics Res 36:3
[24] Markley FL, Cheng Y, Crassidis JL, Oshman Y (2007) Averaging quaternions. J Guid Control Dyn 30:1193
[25] Menegatti E, Zoccarato M, Pagello E, Ishiguro H (2004) Image-based Monte Carlo localisation with omnidirectional images. Robotics Auton Syst 48:17
[26] Milford MJ, Wyeth GF (2012) SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights. In: ICRA
[27] Murray N, Perronnin F (2014) Generalized max pooling. In: CVPR
[28] Richter SR, Hayder Z, Koltun V (2017) Playing for benchmarks. In: ICCV
[29] Rubino C, Del Bue A, Chin TJ (2018) Practical motion segmentation for urban street view scenes. In: ICRA
[30] Sattler T, Leibe B, Kobbelt L (2017) Efficient & effective prioritized matching for large-scale image-based localization. TPAMI 39:1744–1756
[31] Sattler T, Maddern W, Toft C, Torii A, Hammarstrand L, Stenborg E, Safari D, Okutomi M, Pollefeys M, Sivic J, et al. (2018) Benchmarking 6DOF outdoor visual localization in changing conditions. In: CVPR
[32] Schönberger JL, Frahm JM (2016) Structure-from-motion revisited. In: CVPR
[33] Schönberger JL, Pollefeys M, Geiger A, Sattler T (2018) Semantic visual localization. In: CVPR
[34] Sünderhauf N, Neubert P, Protzel P (2013) Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In: ICRA workshop on long-term autonomy
[35] Torii A, Arandjelovic R, Sivic J, Okutomi M, Pajdla T (2015) 24/7 place recognition by view synthesis. In: CVPR
[36] Tran NT, Le Tan DK, Doan AD, Do TT, Bui TA, Tan M, Cheung NM (2019) On-device scalable image-based localization via prioritized cascade search and fast one-many RANSAC. TIP 28:1675
[37] Tremblay J, Prakash A, Acuna D, Brophy M, Jampani V, Anil C, To T, Cameracci E, Boochoon S, Birchfield S (2018) Training deep networks with synthetic data: bridging the reality gap by domain randomization. In: CVPR workshop on autonomous driving
[38] Walch F, Hazirbas C, Leal-Taixe L, Sattler T, Hilsenbeck S, Cremers D (2017) Image-based localization using LSTMs for structured feature correlation. In: ICCV
[39] Wang P, Huang X, Cheng X, Zhou D, Geng Q, Yang R (2019) The ApolloScape open dataset for autonomous driving and its application. TPAMI 42:2702–2719
[40] Whitley D (1994) A genetic algorithm tutorial. Stat Comput 4:65
[41] Wolf J, Burgard W, Burkhardt H (2002) Robust vision-based localization for mobile robots using an image retrieval system based on invariant features. In: ICRA
[42] Wolf J, Burgard W, Burkhardt H (2005) Robust vision-based localization by combining an image-retrieval system with Monte Carlo localization. IEEE Trans Robotics 21:208
[43] Yu K, Zhang T (2010) Improved local coordinate coding using local tangents. In: ICML
Ethics declarations
Conflict of interest
We declare that there is no potential conflict of interest for this work.
About this article
Cite this article
Doan, AD., Latif, Y., Chin, TJ. et al. Visual localization under appearance change: filtering approaches. Neural Comput & Applic 33, 7325–7338 (2021). https://doi.org/10.1007/s00521-020-05339-y
DOI: https://doi.org/10.1007/s00521-020-05339-y