Fine semantic mapping based on dense segmentation network

Zuo, Guoyu; Zheng, Tao; Liu, Yuelei; Xu, Zichen; Gong, Daoxiong; Yu, Jianjun

doi:10.1007/s11370-020-00341-8

Original Research Paper
Published: 16 November 2020

Volume 14, pages 47–60, (2021)

Intelligent Service Robotics

Guoyu Zuo^1,2 (ORCID: orcid.org/0000-0002-7624-4728),
Tao Zheng^1,2,
Yuelei Liu^1,2,
Zichen Xu^1,2,
Daoxiong Gong^1,2 &
Jianjun Yu^1,2

Abstract

This paper proposes a fine semantic mapping method that uses a dense segmentation network (DS-Net) to achieve good performance in semantic map fusion. First, the RGB image and the depth image are used to generate a dense indoor scene map via a state-of-the-art dense SLAM system (ElasticFusion). Then, the DS-Net is constructed based on DenseNet's dense connections to perform precise semantic segmentation of the input RGB image. Finally, long-term correspondences are established between the indoor scene map and the landmarks using continuous frames in both visual odometry and loop detection, and the final semantic map is obtained by fusing the indoor scene map with the semantic predictions of RGB-D video frames taken from multiple angles. Experiments were performed on the NYUv2, PASCAL VOC 2012, and CIFAR10 datasets and in our laboratory environments. The results show that our method can reduce the error in dense map construction and obtain good semantic segmentation performance.
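
The abstract does not state the rule used to fuse per-frame semantic predictions into the map. One common choice in dense semantic mapping is a recursive Bayesian update, in which each map element stores a class distribution that is multiplied by each new frame's prediction and then renormalized. The sketch below illustrates that idea only, under this assumption; the function name `fuse_label` and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import List


def fuse_label(prior: List[float], likelihood: List[float]) -> List[float]:
    """Multiply a map element's class distribution by one frame's
    prediction and renormalize (assumed fusion rule, not the paper's)."""
    if len(prior) != len(likelihood):
        raise ValueError("class counts differ")
    product = [p * q for p, q in zip(prior, likelihood)]
    total = sum(product)
    if total == 0.0:
        # Fully contradictory evidence: keep the prior instead of dividing by zero.
        return list(prior)
    return [v / total for v in product]


# Toy example: one map element, three classes, uniform initial belief.
belief = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical network outputs for the pixel that projects onto this
# element in three frames seen from different angles.
frames = [
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
]
for prediction in frames:
    belief = fuse_label(belief, prediction)

# Index of the most probable class after all frames are fused.
best_class = max(range(len(belief)), key=belief.__getitem__)
```

In this toy case, repeated agreement across views sharpens the belief toward the first class, which is the intuition behind fusing predictions from multiple angles rather than trusting any single frame.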

Acknowledgements

This work is the result of research projects funded by the National Natural Science Foundation of China (61873008) and the Beijing Natural Science Foundation (4182008, 4192010).

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Guoyu Zuo, Tao Zheng, Yuelei Liu, Zichen Xu, Daoxiong Gong & Jianjun Yu
Beijing Key Laboratory of Computing Intelligence and Intelligent Systems, Beijing, 100124, China
Guoyu Zuo, Tao Zheng, Yuelei Liu, Zichen Xu, Daoxiong Gong & Jianjun Yu

Corresponding author

Correspondence to Guoyu Zuo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zuo, G., Zheng, T., Liu, Y. et al. Fine semantic mapping based on dense segmentation network. Intel Serv Robotics 14, 47–60 (2021). https://doi.org/10.1007/s11370-020-00341-8

Received: 05 December 2019
Accepted: 19 October 2020
Published: 16 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11370-020-00341-8
