Fine semantic mapping based on dense segmentation network

  • Original Research Paper
  • Published in: Intelligent Service Robotics

Abstract

This paper proposes a fine semantic mapping method that uses a dense segmentation network (DS-Net) to improve the fusion of semantic predictions into a dense map. First, RGB and depth images are used to build a dense indoor scene map with ElasticFusion, a state-of-the-art dense SLAM system. Then, DS-Net, which is built on the dense connections of DenseNet, performs precise semantic segmentation of the input RGB image. Finally, long-term correspondences between the indoor scene map and the landmarks are established from consecutive frames in both visual odometry and loop closure detection, and the final semantic map is obtained by fusing the indoor scene map with the semantic predictions for RGB-D video frames taken from multiple viewpoints. Experiments were performed on the NYUv2, PASCAL VOC 2012, and CIFAR-10 datasets and in our laboratory environments. The results show that the method can reduce the error in dense map construction and achieves good semantic segmentation performance.
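The dense connection that DS-Net borrows from DenseNet can be illustrated with a minimal NumPy sketch. It is not the authors' network: the abstract does not give the layer configuration, so the function names `dense_block` and `make_weights`, the use of 1x1 convolutions, and the omission of batch normalization are all simplifying assumptions. The sketch shows only the defining property of a dense block, namely that each layer receives the concatenation of all earlier feature maps.

```python
import numpy as np


def dense_block(x, weights):
    """Apply a DenseNet-style block to a feature map x of shape (C, H, W).

    Hypothetical helper for illustration. Each entry of `weights` has shape
    (growth, C_in), where C_in is the number of channels produced so far,
    and acts as a 1x1 convolution (a simplification of real DenseNet layers).
    """
    features = [x]
    for w in weights:
        # Dense connection: the layer input is every earlier feature map.
        stacked = np.concatenate(features, axis=0)  # (C_in, H, W)
        # A 1x1 convolution is a linear map over channels at each pixel.
        new = np.einsum("oc,chw->ohw", w, stacked)
        features.append(np.maximum(new, 0.0))  # ReLU
    # The block output keeps the input and every layer's new features.
    return np.concatenate(features, axis=0)


def make_weights(in_channels, growth, num_layers, seed=0):
    """Random weights whose input width grows by `growth` per layer."""
    rng = np.random.default_rng(seed)
    return [
        rng.standard_normal((growth, in_channels + i * growth)) * 0.1
        for i in range(num_layers)
    ]


x = np.ones((3, 4, 4))  # toy 3-channel, 4x4 input
weights = make_weights(in_channels=3, growth=2, num_layers=3)
out = dense_block(x, weights)
print(out.shape)  # (9, 4, 4): 3 input channels + 3 layers * growth 2
```

The channel count grows linearly with depth (input channels plus layers times growth rate), which is why dense blocks are usually kept narrow and separated by transition layers in practice.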

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61873008) and the Beijing Natural Science Foundation (4182008, 4192010).

Author information

Corresponding author

Correspondence to Guoyu Zuo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zuo, G., Zheng, T., Liu, Y. et al. Fine semantic mapping based on dense segmentation network. Intel Serv Robotics 14, 47–60 (2021). https://doi.org/10.1007/s11370-020-00341-8

  • DOI: https://doi.org/10.1007/s11370-020-00341-8
