Online Object Detection and Localization on Stereo Visual SLAM System

Journal of Intelligent & Robotic Systems

Abstract

To navigate an unknown environment, an autonomous robot must build a map of its surroundings while simultaneously estimating its own position within it. This problem is known as Simultaneous Localization and Mapping (SLAM). We propose a SLAM system for stereo cameras that builds a map of the objects in a scene. The system combines the stereo SLAM method S-PTAM with an object detection module. The object detection module uses deep learning to perform online detection and to estimate the 3D pose of the objects present in an input image, while S-PTAM estimates the camera pose in real time. The system was tested in a real-world environment, achieving good object localization results.
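As an illustration of how the two estimates described above could be combined, the following minimal sketch composes a camera pose in the map frame (the kind of output a SLAM front end such as S-PTAM provides) with an object pose expressed in the camera frame (the kind of output a detection module provides). This is not the authors' implementation: the function names, the 4x4 homogeneous-matrix representation, and all numeric values are assumptions chosen for illustration only.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def object_pose_in_world(T_world_camera, T_camera_object):
    """Express an object pose given in the camera frame in the world (map) frame."""
    return T_world_camera @ T_camera_object


# Hypothetical camera pose: at (1, 2, 0) in the map, rotated 90 degrees about z.
T_world_camera = make_pose(rot_z(np.pi / 2), [1.0, 2.0, 0.0])

# Hypothetical detection: object 3 units along the camera x axis, same orientation.
T_camera_object = make_pose(np.eye(3), [3.0, 0.0, 0.0])

T_world_object = object_pose_in_world(T_world_camera, T_camera_object)
object_position = T_world_object[:3, 3]
print(object_position)
```

Under these assumed values the object lands at approximately (1, 5, 0) in the map frame. In a running system, the world-frame poses from successive frames would still need to be associated with existing map objects and fused, which this sketch does not attempt.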

Acknowledgements

This work is part of the "Development of a weed remotion mobile robot" project at CIFASIS (CONICET-UNR). We thank Erica Vidal for her valuable work on the system experiments.

Author information

Corresponding author

Correspondence to Taihú Pire.

About this article

Cite this article

Pire, T., Corti, J. & Grinblat, G. Online Object Detection and Localization on Stereo Visual SLAM System. J Intell Robot Syst 98, 377–386 (2020). https://doi.org/10.1007/s10846-019-01074-2

  • DOI: https://doi.org/10.1007/s10846-019-01074-2
