Two-branch encoding and iterative attention decoding network for semantic segmentation

Zhu, Hegui; Zhang, Min; Zhang, Xiangde; Zhang, Libo

doi:10.1007/s00521-020-05312-9

Original Article
Published: 01 September 2020

Neural Computing and Applications, volume 33, pages 5151–5166 (2021)

Hegui Zhu¹, Min Zhang¹, Xiangde Zhang¹ (ORCID: orcid.org/0000-0003-4378-5381) & Libo Zhang²

Abstract

Deep convolutional neural networks (DCNNs) have shown outstanding performance in semantic image segmentation. In this paper, we propose a two-branch encoding and iterative attention decoding semantic segmentation model. In the encoding stage, an improved PeleeNet is used as the backbone branch to extract dense image features, and a spatial branch is used to preserve fine-grained information. In the decoding stage, iterative attention decoding is employed to refine the segmentation results with multi-scale features. Furthermore, we propose a channel position attention module and a boundary residual attention module to learn position and boundary features, which enrich the boundary position information of the target. We use SegNet as the basic network and conduct experiments on the CamVid dataset to evaluate the effect of each component of the proposed model in terms of accuracy and mean intersection over union (mIOU). We then verify the segmentation performance of the proposed model with comparative experiments on the CamVid, Cityscapes and PASCAL VOC 2012 datasets. In particular, the model achieves 91.7% segmentation accuracy and 67.1% mIOU on the CamVid dataset, which verifies the effectiveness of the proposed model. In the future, target detection could be combined with semantic segmentation to further improve the segmentation of small objects. We also hope to further optimize the model structure and reduce its time complexity and number of parameters without sacrificing effectiveness.
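The two metrics reported in the abstract, pixel accuracy and mIOU, can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions; the function names and the toy labels are hypothetical, and the abstract does not state the authors' exact evaluation protocol (for example, how unlabeled pixels are handled), so this should not be read as their implementation.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN).

    Assumption: classes that appear in neither the labels nor the
    predictions are skipped rather than counted as zero.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: six flattened pixels, three classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

In this toy case four of six pixels are correct, and the per-class IoU values are 1/3, 2/3 and 1/2. mIOU weights every class equally, so it is usually lower than pixel accuracy when small or rare classes are segmented poorly, which is consistent with the gap between the 91.7% and 67.1% figures above.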

Acknowledgements

This study was funded by the National Key R&D Program of China (No. 2017YFF0108800), the Natural Science Foundation of Liaoning Province (No. 2020-MS-080), the Fundamental Research Funds for the Central Universities (No. N2005032), the Special Foundation of Military Logistics Science and Technology of China (CLB8C050), and the Key Projects of the Natural Science Foundation of Liaoning Province (No. 2017012074-301).

Author information

Authors and Affiliations

College of Sciences, Northeastern University, Shenyang, 110819, China
Hegui Zhu, Min Zhang & Xiangde Zhang
Department of Radiology, The General Hospital of Northern Theater Command PLA, Shenyang, 110016, China
Libo Zhang

Corresponding author

Correspondence to Xiangde Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, H., Zhang, M., Zhang, X. et al. Two-branch encoding and iterative attention decoding network for semantic segmentation. Neural Comput & Applic 33, 5151–5166 (2021). https://doi.org/10.1007/s00521-020-05312-9

Received: 09 March 2020
Accepted: 19 August 2020
Published: 01 September 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s00521-020-05312-9
