Abstract
In this paper, we present an improved Efficient Neural Network (ENet) for semantic segmentation, which we name the Efficient Residual Neural Network (ERNet). ERNet contains two processing streams: a pooling stream, which captures high-dimensional semantic information, and a residual stream, which preserves low-dimensional boundary information. ERNet has five stages, each containing several bottleneck modules, and the output of each bottleneck is fed into the residual stream. From the second stage onward, the concatenation of the pooling stream and the residual stream serves as the input to each down-sampling or up-sampling bottleneck. The identity mapping of the residual stream shortens the path between the input and output of each stage, alleviates the vanishing-gradient problem, strengthens the propagation of low-dimensional boundary features, and encourages their reuse. We tested ERNet on the CamVid, Cityscapes, and SUN RGB-D datasets. The segmentation speed of ERNet is close to that of ENet, while its segmentation accuracy is higher.
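As a minimal sketch of the fusion step described above, the snippet below shows how pooling-stream and residual-stream feature maps might be concatenated along the channel axis before a down-sampling or up-sampling bottleneck. It uses NumPy arrays as stand-ins for feature tensors; the function name, shapes, and channel counts are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def concat_streams(pooling_feat, residual_feat):
    """Concatenate pooling-stream and residual-stream feature maps
    along the channel axis (channels-first layout assumed)."""
    # Spatial dimensions must match before concatenation; the residual
    # stream is assumed to have been rescaled to the pooling stream's
    # resolution at this stage.
    assert pooling_feat.shape[1:] == residual_feat.shape[1:]
    return np.concatenate([pooling_feat, residual_feat], axis=0)

# Hypothetical shapes: (channels, height, width)
pooling = np.zeros((64, 32, 32))   # high-dimensional semantic features
residual = np.zeros((16, 32, 32))  # low-dimensional boundary features
fused = concat_streams(pooling, residual)
print(fused.shape)  # (80, 32, 32)
```

The fused tensor then feeds the next bottleneck, so each stage sees both semantic context and boundary detail.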
Funding
This work was supported by the Natural Science Foundation Project of the Science and Technology Department of Jilin Province under grant no. 20200201165JC.
Ethics declarations
COMPLIANCE WITH ETHICAL STANDARDS
This article does not contain any studies involving animals performed by any of the authors.
This article does not contain any studies involving human participants performed by any of the authors.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Bin Li received his M.S. and Ph.D. degrees from the School of Computer Science and Technology at Jilin University, China, in 2011 and 2015, respectively. He is currently an Associate Professor with the School of Computer Science at Northeast Electric Power University. His research interests include image processing, computer vision, and pattern recognition.
Junyue Zang received her bachelor's degree from Qingdao University of Technology in 2017. She is currently a graduate student with the School of Computer Science at Northeast Electric Power University. Her research interests include computer vision, image processing, and deep learning.
Jie Cao received her Ph.D. degree in Computer Science and Technology from Jilin University, Changchun, China, in 2017. She is currently an Associate Professor and master's supervisor with the School of Computer Science, Northeast Electric Power University, Jilin. Her research interests include computer networks, machine learning, and power grid stability and security.
About this article
Cite this article
Li, B., Zang, J. & Cao, J. Efficient Residual Neural Network for Semantic Segmentation. Pattern Recognit. Image Anal. 31, 212–220 (2021). https://doi.org/10.1134/S1054661821020103