Integration of top-down and bottom-up visual processing using a recurrent convolutional–deconvolutional neural network for semantic segmentation

Kim, Byung Wan; Park, Youngbin; Suh, Il Hong

doi:10.1007/s11370-019-00296-5

Original Research Paper
Published: 09 October 2019

Intelligent Service Robotics, volume 13, pages 87–97 (2020)

Byung Wan Kim¹,
Youngbin Park¹ &
Il Hong Suh²

Abstract

Semantic segmentation has a wide array of applications, such as scene understanding, autonomous driving, and robot manipulation tasks. While existing segmentation models have achieved good performance using bottom-up deep neural processing, this paper describes a novel deep learning architecture that integrates top-down and bottom-up processing. The resulting model achieves higher accuracy at a relatively low computational cost. In the proposed model, higher-level top-down information is transmitted to the lower layers through recurrent connections in an encoder and a decoder, and the recurrent connection weights are trained using backpropagation. Experiments on the CamVid, SUN-RGBD, and PASCAL VOC 2012 benchmark datasets demonstrate that this use of top-down information improves the mean intersection over union by more than 3% compared with a state-of-the-art bottom-up-only network. Additionally, the proposed model is successfully applied to a dataset designed for robotic grasping tasks.
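The abstract reports its results as mean intersection over union (mIoU) but does not spell out how the metric is computed. As a minimal sketch of the commonly used definition (per-class IoU = TP / (TP + FP + FN), averaged over the classes that occur), the following may help in reading those numbers. The function names, the `ignore_label` parameter, and the toy labels are illustrative assumptions and are not taken from the paper, whose evaluation protocol may differ, for example in how void pixels or absent classes are handled.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=None):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:  # assumption: skip unlabeled/void pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average TP / (TP + FP + FN) over classes that appear at least once."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c pixels missed
        fp = sum(matrix[r][c] for r in range(n)) - tp  # other pixels labeled c
        denom = tp + fp + fn
        if denom > 0:  # classes absent from both label maps are skipped
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three classes (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(round(score, 3))  # per-class IoUs are 1/3, 2/3, 1/2, so the mean is 0.5
```

In practice the confusion matrix is usually accumulated over the whole test set before the per-class ratios are taken, rather than averaging per-image scores.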

Acknowledgements

This work was supported by the Technology Innovation Industrial Program funded by the Ministry of Trade (MI, South Korea) [10073161, Technology Innovation Program], as well as by Institute for Information and Communications Technology Promotion (IITP) Grant funded by MSIT (No. 2018-0-00622).

Author information

Authors and Affiliations

Intelligence and Control for Robots Laboratory, Hanyang University, Seoul, Korea
Byung Wan Kim & Youngbin Park
Department of Electronic and Computer Engineering, Hanyang University, Seoul, Korea
Il Hong Suh

Corresponding author

Correspondence to Il Hong Suh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kim, B.W., Park, Y. & Suh, I.H. Integration of top-down and bottom-up visual processing using a recurrent convolutional–deconvolutional neural network for semantic segmentation. Intel Serv Robotics 13, 87–97 (2020). https://doi.org/10.1007/s11370-019-00296-5

Received: 07 September 2018
Accepted: 23 September 2019
Published: 09 October 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11370-019-00296-5
