Two-branch encoding and iterative attention decoding network for semantic segmentation

  • Original Article
Neural Computing and Applications

Abstract

Deep convolutional neural networks (DCNNs) have shown outstanding performance in semantic image segmentation. In this paper, we propose a semantic segmentation model based on two-branch encoding and iterative attention decoding. In the encoding stage, an improved PeleeNet serves as the backbone branch to extract dense image features, while a spatial branch preserves fine-grained information. In the decoding stage, iterative attention decoding refines the segmentation results using multi-scale features. We further propose a channel position attention module and a boundary residual attention module, which learn position and boundary features respectively and enrich the boundary position information of the targets. Using SegNet as the baseline network, we conduct experiments on the CamVid dataset to evaluate the effect of each component of the proposed model in terms of accuracy and mIoU. We then verify the segmentation performance of the proposed model through comparative experiments on the CamVid, Cityscapes and PASCAL VOC 2012 datasets. In particular, the model achieves 91.7% segmentation accuracy and 67.1% mIoU on the CamVid dataset, which verifies its effectiveness. In future work, object detection could be combined with semantic segmentation to further improve the segmentation of small objects. We also hope to further optimize the model structure and reduce its time complexity and number of parameters while preserving its effectiveness.
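For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how pixel accuracy and mIoU are commonly computed from a confusion matrix. It follows the standard definitions only; the evaluation protocol actually used in the paper (for example, how unlabeled pixels are handled) is not stated in the abstract, and the function names and toy labels here are hypothetical.

```python
from typing import List


def confusion_matrix(truth: List[int], pred: List[int], num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class equals the ground truth."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    return correct / total if total else 0.0


def mean_iou(matrix: List[List[int]]) -> float:
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three classes (hypothetical data).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print(f"accuracy = {pixel_accuracy(cm):.3f}, mIoU = {mean_iou(cm):.3f}")
```

On the toy labels, four of six pixels are correct, while the per-class IoU values are 1/3, 2/3 and 1/2. This illustrates why mIoU is typically lower than pixel accuracy, as in the CamVid results reported above: every class counts equally, regardless of how many pixels it covers.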

Acknowledgements

This study was funded by the National Key R&D Program of China (No. 2017YFF0108800), the Natural Science Foundation of Liaoning Province (No. 2020-MS-080), the Fundamental Research Funds for the Central Universities (No. N2005032), the Special Foundation of Military Logistics Science and Technology of China (No. CLB8C050), and the Key Projects of the Natural Science Foundation of Liaoning Province (No. 2017012074-301).

Author information

Corresponding author

Correspondence to Xiangde Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, H., Zhang, M., Zhang, X. et al. Two-branch encoding and iterative attention decoding network for semantic segmentation. Neural Comput & Applic 33, 5151–5166 (2021). https://doi.org/10.1007/s00521-020-05312-9

  • DOI: https://doi.org/10.1007/s00521-020-05312-9
