Abstract
To navigate an unknown environment, an autonomous robot must build a map of its surroundings while simultaneously estimating its own position within it. This problem is known as Simultaneous Localization and Mapping (SLAM). We propose a SLAM system for stereo cameras that builds a map of the objects in a scene. The system combines the stereo SLAM method S-PTAM with an object detection module. The object detection module uses deep learning to detect objects online and to estimate the 3D pose of each object present in an input image, while S-PTAM estimates the camera pose in real time. The system was tested in a real-world environment, achieving good object localization results.
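The abstract does not spell out how the two estimates are combined. As a minimal sketch, assuming both the camera pose and the detected object pose are available as 4x4 homogeneous rigid transforms, an object can be placed in the map frame by composing the two. The function names (`pose`, `object_in_map`), the yaw-only rotation, and the numeric values below are hypothetical and serve only as illustration; they are not taken from S-PTAM or from the authors' implementation.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose(yaw, tx, ty, tz):
    """Build a 4x4 rigid transform: rotation by `yaw` (radians) about z,
    followed by translation (tx, ty, tz). Yaw-only rotation is a
    simplification chosen for this sketch."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def object_in_map(T_map_cam, T_cam_obj):
    """Compose the camera pose in the map frame (assumed to come from the
    SLAM module) with the object pose in the camera frame (assumed to come
    from the detector) to obtain the object pose in the map frame."""
    return matmul4(T_map_cam, T_cam_obj)


# Hypothetical values: camera at (1, 2, 0) rotated 90 degrees about z,
# object detected 3 units along the camera's x axis.
T_map_cam = pose(math.pi / 2, 1.0, 2.0, 0.0)
T_cam_obj = pose(0.0, 3.0, 0.0, 0.0)

T_map_obj = object_in_map(T_map_cam, T_cam_obj)
position = [row[3] for row in T_map_obj[:3]]
print(position)  # approximately [1.0, 5.0, 0.0], up to floating-point rounding
```

In this sketch the map-frame position is read from the last column of the composed transform. A real system would also have to associate repeated detections of the same object across frames, which is outside the scope of this example.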
Acknowledgements
This work is part of the Development of a weed remotion mobile robot project at CIFASIS (CONICET-UNR). We thank Erica Vidal for her valuable work on the system experiments.
About this article
Cite this article
Pire, T., Corti, J. & Grinblat, G. Online Object Detection and Localization on Stereo Visual SLAM System. J Intell Robot Syst 98, 377–386 (2020). https://doi.org/10.1007/s10846-019-01074-2
DOI: https://doi.org/10.1007/s10846-019-01074-2