Abstract
Semantic edge detection (SED), which aims to jointly extract edges and their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition. SED naturally requires achieving two distinct supervision targets: locating finely detailed edges and identifying high-level semantics. Our motivation comes from the hypothesis that such distinct targets prevent state-of-the-art SED methods from effectively using deep supervision to improve results. To this end, we propose a novel fully convolutional neural network using diverse deep supervision within a multi-task framework where bottom layers aim at generating category-agnostic edges, while top layers are responsible for the detection of category-aware semantic edges. To overcome the hypothesized supervision challenge, a novel information converter unit is introduced, whose effectiveness has been extensively evaluated on the SBD and Cityscapes datasets.
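The two supervision targets named above can be made concrete with a small sketch. It assumes the category-aware ground truth is stored as one binary edge map per category, and that a category-agnostic target is obtained by taking the pixel-wise union of those maps. Both the storage layout and the function name `category_agnostic_target` are illustrative assumptions, not the authors' implementation.

```python
from typing import List

EdgeMap = List[List[int]]  # H x W binary map, 1 marks an edge pixel


def category_agnostic_target(semantic_edges: List[EdgeMap]) -> EdgeMap:
    """Collapse K per-category edge maps into one binary edge map.

    Assumption: a pixel is a category-agnostic edge if it is an edge
    of at least one category (pixel-wise logical OR).
    """
    if not semantic_edges:
        raise ValueError("need at least one category map")
    height = len(semantic_edges[0])
    width = len(semantic_edges[0][0])
    return [
        [int(any(m[y][x] for m in semantic_edges)) for x in range(width)]
        for y in range(height)
    ]


# Toy 2 x 3 example with two hypothetical categories.
person = [[0, 1, 0],
          [0, 1, 0]]
car = [[0, 0, 0],
       [1, 1, 0]]

# Category-aware target: the per-category maps, e.g. for the top layers.
category_aware = [person, car]
# Category-agnostic target: their union, e.g. for the bottom layers.
category_agnostic = category_agnostic_target(category_aware)
```

In this toy case the pixel at row 1, column 1 belongs to both categories, which the category-aware target preserves and the category-agnostic target reduces to a single edge label.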
Additional information
Communicated by Yuri Boykov.
This research was supported by the National Key Research and Development Program of China (Grant No. 2018AAA0100400) and NSFC (No. 61922046).
Cite this article
Liu, Y., Cheng, MM., Fan, DP. et al. Semantic Edge Detection with Diverse Deep Supervision. Int J Comput Vis 130, 179–198 (2022). https://doi.org/10.1007/s11263-021-01539-8
DOI: https://doi.org/10.1007/s11263-021-01539-8