
Efficient use of recent progresses for Real-time Semantic segmentation

  • Original Paper
  • Machine Vision and Applications

Abstract

Various approaches have been proposed for designing deep CNNs for semantic segmentation. They are usually built on an encoder–decoder architecture and require computationally expensive operations on high-resolution activation maps. Because computational cost is critical for real-time segmentation, efficient approaches sacrifice spatial information to reach real-time speed, at the price of a considerable drop in accuracy. We introduce a new module based on depthwise separable, shuffled and grouped convolutions that optimizes up-sampling operations by using a sizeable receptive field and preserving spatial information. We then design an efficient network based on dense connectivity that achieves a remarkable trade-off between accuracy and speed. We show through a set of experiments that, even when up-sampling with a lightweight decoder, our architecture scores 69.5% mean IoU on the Cityscapes test set with \(1024\times 512\) inputs at 95.2 FPS.
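To make the three convolution types named in the abstract concrete, the following is a minimal sketch of why they are cheaper than a standard convolution and of what a channel shuffle does. It is not the module proposed in the paper: the function names, the layer sizes (128 channels, 3×3 kernels) and the group count are assumptions chosen purely for illustration, and the shuffle follows the usual ShuffleNet-style reshape, transpose and flatten pattern.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count (no bias) of a k x k convolution split into `groups` groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one k x k filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # mixes channels, no spatial extent
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channels across groups so the next grouped layer sees all groups."""
    assert len(channels) % groups == 0
    per_group = len(channels) // groups
    # View the channels as a (groups, per_group) grid and read it column by column.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


if __name__ == "__main__":
    c_in, c_out, k = 128, 128, 3  # illustrative sizes, not taken from the paper
    print("standard:           ", conv_params(c_in, c_out, k))
    print("grouped (4 groups): ", conv_params(c_in, c_out, k, groups=4))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
    print("shuffled channels:  ", channel_shuffle(list(range(8)), groups=2))
```

With these assumed sizes, the grouped layer uses a quarter of the weights of the standard one and the depthwise separable pair uses roughly an eighth. The shuffle is what lets information cross group boundaries between consecutive grouped layers.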



Author information

Corresponding author

Correspondence to Safae El Houfi.


About this article


Cite this article

El Houfi, S., Majda, A. Efficient use of recent progresses for Real-time Semantic segmentation. Machine Vision and Applications 31, 45 (2020). https://doi.org/10.1007/s00138-020-01095-0


  • DOI: https://doi.org/10.1007/s00138-020-01095-0
