
Efficient Residual Neural Network for Semantic Segmentation

  • MATHEMATICAL THEORY OF IMAGES AND SIGNALS REPRESENTING, PROCESSING, ANALYSIS, RECOGNITION AND UNDERSTANDING
  • Published in Pattern Recognition and Image Analysis

Abstract

In this paper, we present an improved Efficient Neural Network (ENet) for semantic segmentation, which we name the Efficient Residual Neural Network (ERNet). ERNet contains two processing streams: a pooling stream, which captures high-dimensional semantic information, and a residual stream, which preserves low-dimensional boundary information. The network has five stages, each consisting of several bottleneck modules, and the output of every bottleneck is fed into the residual stream. From the second stage onward, the pooling stream and the residual stream are concatenated and used as the input to each down-sampling or up-sampling bottleneck. The identity mapping of the residual stream shortens the path between the input and output of each stage, alleviating the vanishing-gradient problem, strengthening the propagation of low-dimensional boundary features, and encouraging their reuse. We evaluated ERNet on the CamVid, Cityscapes, and SUN RGB-D datasets: its segmentation speed is close to that of ENet, while its segmentation accuracy is higher.
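Since this preview contains no source code, the PyTorch sketch below is only our illustration of the two-stream wiring the abstract describes: the pooling and residual streams are concatenated before a down-sampling bottleneck, and the bottleneck output is fed back into the residual stream through an identity (additive) connection. The Bottleneck internals, channel widths, stream resolutions, and the TwoStreamStage module itself are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Bottleneck(nn.Module):
        """Simplified ENet-style bottleneck: 1x1 reduce, 3x3 conv, 1x1 expand.
        The internal layout here is an assumption, not the paper's design."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            mid = max(out_ch // 4, 1)
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.PReLU(),
                nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.PReLU(),
                nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch), nn.PReLU(),
            )

        def forward(self, x):
            return self.block(x)

    class TwoStreamStage(nn.Module):
        """Hypothetical ERNet-style stage: the pooling stream is down-sampled
        for semantic context, while the full-resolution residual stream carries
        boundary features forward through an identity mapping."""
        def __init__(self, pool_ch, res_ch, out_ch):
            super().__init__()
            self.pool = nn.MaxPool2d(2)
            # The down-sampling bottleneck takes the concatenated streams as input.
            self.down = Bottleneck(pool_ch + res_ch, out_ch)
            # 1x1 projection of the bottleneck output into the residual stream width.
            self.to_res = nn.Conv2d(out_ch, res_ch, 1, bias=False)

        def forward(self, pool_feat, res_feat):
            # Resize the residual stream to the pooling stream's resolution and
            # concatenate the two streams before the down-sampling bottleneck.
            res_small = F.interpolate(res_feat, size=pool_feat.shape[2:],
                                      mode="bilinear", align_corners=False)
            x = self.down(self.pool(torch.cat([pool_feat, res_small], dim=1)))
            # Feed the bottleneck output back into the residual stream via an
            # additive identity connection, preserving low-level boundary detail.
            up = F.interpolate(self.to_res(x), size=res_feat.shape[2:],
                               mode="bilinear", align_corners=False)
            return x, res_feat + up

    # Toy shapes: half-resolution pooling stream, full-resolution residual stream.
    pool_feat = torch.randn(1, 16, 128, 128)
    res_feat = torch.randn(1, 32, 256, 256)
    pool_out, res_out = TwoStreamStage(16, 32, 64)(pool_feat, res_feat)
    print(pool_out.shape, res_out.shape)  # (1, 64, 64, 64) and (1, 32, 256, 256)

In this sketch the residual stream stays at full resolution, so boundary features reach later stages along a short additive path, which is the mechanism the abstract credits for easing vanishing gradients and encouraging feature reuse.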




Funding

This work was supported by the Natural Science Foundation Project of the Science and Technology Department of Jilin Province under grant no. 20200201165JC.

Author information


Corresponding author

Correspondence to Bin Li.

Ethics declarations

COMPLIANCE WITH ETHICAL STANDARDS

This article does not contain any studies involving animals performed by any of the authors.

This article does not contain any studies involving human participants performed by any of the authors.

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Bin Li received his M.S. and Ph.D. degrees from the School of Computer Science and Technology at Jilin University, China, in 2011 and 2015, respectively. He is currently an Associate Professor with the School of Computer Science at Northeast Electric Power University. His research interests include image processing, computer vision, and pattern recognition.

Junyue Zang received her bachelor's degree from Qingdao University of Technology in 2017. She is currently a graduate student with the School of Computer Science at Northeast Electric Power University. Her research interests include computer vision, image processing, and deep learning.

Jie Cao received her Ph.D. degree in Computer Science and Technology from Jilin University, Changchun, China, in 2017. She is currently an Associate Professor and master's supervisor with the School of Computer Science at Northeast Electric Power University, Jilin. Her research interests include computer networks, machine learning, and power grid stability and security.


About this article


Cite this article

Li, B., Zang, J. & Cao, J. Efficient Residual Neural Network for Semantic Segmentation. Pattern Recognit. Image Anal. 31, 212–220 (2021). https://doi.org/10.1134/S1054661821020103
