
A comprehensive overview of dynamic visual SLAM and deep learning: concepts, methods and challenges

  • Original Paper
  • Published in Machine Vision and Applications (2022)

Abstract

Visual SLAM (vSLAM) is a research topic that has developed rapidly in recent years, especially with the renewed interest in machine learning and, more particularly, deep-learning-based approaches. Current research focuses mainly on improving accuracy and robustness in complex and dynamic environments, and this very active topic has now reached a significant level of maturity. This paper presents a detailed yet accessible survey of vSLAM in the context of deep learning. It attempts to meet this challenge by organizing the literature, explaining the basic concepts and tools, and presenting current trends. The contributions of this study can be summarized in three essential points. The first is to present the state of the art incrementally, following the classical processing pipeline of vSLAM-based systems. The second is to give our short- and medium-term view of the development of this very active and evolving field. Finally, we share our opinions on the subject and its interactions with new trends, in particular the deep learning paradigm. We believe that this contribution provides both an overview and, more importantly, a critical and detailed vision that can serve as a roadmap for the field of vSLAM, in terms of models and concepts as well as associated technologies.
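
To make the "classical processing pipeline of vSLAM-based systems" mentioned above concrete, the sketch below outlines the typical stages of a feature-based vSLAM system: per-frame tracking, keyframe-based local mapping, and loop closing. It is a minimal illustrative sketch only; all class and function names are hypothetical placeholders and do not come from the paper or from any specific vSLAM library.

```python
# Minimal sketch of a classical feature-based vSLAM pipeline (illustrative only;
# all names are hypothetical placeholders, not code from the surveyed systems).

from dataclasses import dataclass, field


@dataclass
class SparseMap:
    keyframes: list = field(default_factory=list)   # selected frames with poses
    landmarks: list = field(default_factory=list)   # triangulated 3-D points


def track(frame, sparse_map):
    """Stage 1: estimate the camera pose of `frame` against the current map."""
    return {"frame": frame, "pose": "T_world_camera"}  # placeholder pose


def is_keyframe(tracked):
    """Heuristic: does this frame add enough new information to the map?"""
    return True  # placeholder decision


def local_mapping(sparse_map, tracked):
    """Stage 2: insert a keyframe, triangulate landmarks, refine the local map."""
    sparse_map.keyframes.append(tracked)


def close_loop(sparse_map, tracked):
    """Stage 3: place recognition against past keyframes; correct drift if a loop is found."""
    return False  # placeholder: no loop detected


def run_vslam(frames):
    sparse_map = SparseMap()
    trajectory = []
    for frame in frames:
        tracked = track(frame, sparse_map)
        trajectory.append(tracked["pose"])
        if is_keyframe(tracked):
            local_mapping(sparse_map, tracked)
            close_loop(sparse_map, tracked)
    return trajectory, sparse_map
```

The dynamic-scene variants discussed in the survey typically add a step before tracking that suppresses features on moving objects, for instance via semantic segmentation or optical flow.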

Author information

Corresponding author

Correspondence to Ayman Beghdadi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Beghdadi, A., Mallem, M. A comprehensive overview of dynamic visual SLAM and deep learning: concepts, methods and challenges. Machine Vision and Applications 33, 54 (2022). https://doi.org/10.1007/s00138-022-01306-w

Keywords

Navigation