
Semantic Edge Detection with Diverse Deep Supervision

Published in: International Journal of Computer Vision

Abstract

Semantic edge detection (SED), which aims to jointly extract edges and their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition. SED naturally entails two distinct supervision targets: locating fine, detailed edges and identifying high-level semantics. Our motivation comes from the hypothesis that these distinct targets prevent state-of-the-art SED methods from effectively using deep supervision to improve results. To this end, we propose a novel fully convolutional neural network with diverse deep supervision within a multi-task framework, in which bottom layers aim at generating category-agnostic edges while top layers are responsible for detecting category-aware semantic edges. To overcome the hypothesized supervision challenge, a novel information converter unit is introduced, whose effectiveness has been extensively evaluated on the SBD and Cityscapes datasets.
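The two supervision targets described in the abstract can be sketched in code. The following is a minimal NumPy illustration, not the paper's actual loss or architecture: the function names, the construction of the category-agnostic target as a union over per-class edge maps, and the class-balanced cross-entropy weighting (common in deep edge-detection losses such as HED/CASENet) are assumptions made for illustration only.

```python
import numpy as np


def binary_edge_target(sem_edges):
    """Category-agnostic edge map: a pixel counts as an edge if it is an
    edge for ANY of the K semantic classes (union over the class axis).
    sem_edges: (K, H, W) binary array of per-class edge labels."""
    return (sem_edges.sum(axis=0) > 0).astype(np.float32)


def weighted_bce(logits, target, eps=1e-7):
    """Class-balanced sigmoid cross-entropy, the usual edge-detection loss:
    non-edge pixels vastly outnumber edge pixels, so each term is weighted
    by the frequency of the opposite class."""
    p = 1.0 / (1.0 + np.exp(-logits))          # sigmoid
    n_pos = target.sum()
    beta = (target.size - n_pos) / target.size  # fraction of negatives
    loss = -(beta * target * np.log(p + eps)
             + (1.0 - beta) * (1.0 - target) * np.log(1.0 - p + eps))
    return loss.sum()


def diverse_supervision_loss(side_logits, sem_logits, sem_edges):
    """Total multi-task loss: bottom side outputs are supervised with the
    category-agnostic target; the top output is supervised per class.
    side_logits: list of (H, W) bottom-layer predictions.
    sem_logits:  (K, H, W) top-layer per-class predictions."""
    bin_target = binary_edge_target(sem_edges)
    loss = sum(weighted_bce(s, bin_target) for s in side_logits)
    for k in range(sem_edges.shape[0]):
        loss += weighted_bce(sem_logits[k], sem_edges[k].astype(np.float32))
    return loss
```

The key point this sketch captures is that the category-agnostic target is derivable from the semantic labels themselves, so both supervision signals come from a single annotation, while each group of layers receives only the target appropriate to its level of abstraction.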



Author information

Correspondence to Ming-Ming Cheng.

Additional information

Communicated by Yuri Boykov.

This research was supported by the National Key Research and Development Program of China (Grant No. 2018AAA0100400) and NSFC (No. 61922046).


Cite this article

Liu, Y., Cheng, MM., Fan, DP. et al. Semantic Edge Detection with Diverse Deep Supervision. Int J Comput Vis 130, 179–198 (2022). https://doi.org/10.1007/s11263-021-01539-8
