Abstract
Scene reconstruction and visual localization in dynamic environments such as street scenes are challenging due to the lack of distinctive, stable keypoints. While learned convolutional features have proven robust to changes in viewing conditions, handcrafted features still offer advantages in distinctiveness and accuracy when applied to structure from motion. For collaborative reconstruction of road sections by a car fleet, we propose multimodal domain adaptation as a preprocessing step that aligns images in appearance and enhances keypoint matching across viewing conditions while preserving the advantages of handcrafted features. Training a generative adversarial network to translate between different illumination and weather conditions, we evaluate qualitative and quantitative aspects of domain adaptation and its impact on feature correspondences. Combined with a multi-feature discriminator, the model is optimized to synthesize images that not only improve feature matching but also exhibit high visual quality. Experiments on a challenging multi-domain dataset recorded in various road scenes over multiple test drives show that our approach outperforms other traditional and learning-based methods, improving the completeness or accuracy of structure from motion with multimodal two-domain image collections in eight out of ten test scenes.
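The keypoint-matching step whose improvement the abstract measures is conventionally Lowe's ratio test over handcrafted descriptors such as SIFT: a correspondence is kept only if its nearest-neighbour distance is clearly smaller than the second-nearest. A minimal pure-Python sketch, using toy 2-D descriptors in place of real 128-D SIFT vectors (the function name and data are illustrative, not from the paper):

```python
import math

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches that pass Lowe's ratio test (the nearest distance
    must be clearly smaller than the second-nearest)."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    matches = []
    for i, d in enumerate(desc_a):
        # Distances from this descriptor to every candidate in the other image.
        dists = sorted((dist(d, e), j) for j, e in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy example: descriptor 0 matches unambiguously; descriptor 1 has two
# near-identical candidates and is rejected by the ratio test.
a = [[0.0, 0.0], [5.0, 5.0]]
b = [[9.0, 9.0], [0.1, 0.0], [5.0, 5.2], [5.2, 5.0]]
print(ratio_test_match(a, b))  # → [(0, 1)]
```

The intuition behind the paper's preprocessing step is that translating images into a common illumination/weather domain makes genuine correspondences pass this test more often, because descriptor distances across domains shrink relative to distractors.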
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
Supplementary material 1 (mp4, 30871 KB)
Cite this article
Venator, M., Aklanoglu, S., Bruns, E. et al. Enhancing collaborative road scene reconstruction with unsupervised domain alignment. Machine Vision and Applications 32, 13 (2021). https://doi.org/10.1007/s00138-020-01144-8