
Image content-dependent steerable kernels

Original article · The Visual Computer

Abstract

Attention mechanisms play an essential role in many tasks such as image classification, object detection, and instance segmentation. However, existing methods typically assign attention weights to the feature maps of the previous layer; the kernels in the current layer remain static during the inference stage. To explicitly model the dependency of individual kernel weights on image content at inference time, this work proposes the attention weight block (AWB), which makes kernels steerable by the content of a test image. Specifically, AWB computes a set of on-the-fly coefficients from the feature maps of the previous layer and applies these coefficients to the kernels in the current layer, making them steerable. AWB kernels emphasize or suppress the weights of certain kernels depending on the content of the input sample and hence significantly improve the feature representation ability of deep neural networks. The proposed AWB is evaluated on various datasets, and experimental results show that the steerable kernels in AWB outperform state-of-the-art attention approaches when embedded in architectures for classification, object detection, and semantic segmentation. AWB outperforms ECA by 1.1% and 1.0% on the CIFAR-100 and Tiny ImageNet datasets, respectively, for image classification; outperforms CornerNet-Lite by 1.5% on the COCO2017 dataset for object detection; and outperforms FCN8s by 1.2% on the SBU shadow dataset for semantic segmentation.
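
The abstract's description, computing per-kernel coefficients from the previous layer's feature maps and rescaling the current layer's kernels with them, can be made concrete with a short sketch. The following PyTorch code is a minimal illustration of that idea, not the paper's exact AWB design: it assumes the coefficients come from global average pooling followed by a small bottleneck MLP with a sigmoid (one coefficient per output kernel), and it uses a grouped-convolution trick so each sample in a batch gets its own steered kernels. The names AWBConv2d, coeff, and reduction are illustrative, not taken from the paper.

```python
# Minimal sketch of a content-dependent (steerable) convolution in the spirit
# of AWB. The coefficient generator and the way coefficients are applied to
# the kernels are assumptions, not the published design. Bias omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AWBConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, reduction=16):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(out_channels, in_channels, kernel_size, kernel_size))
        nn.init.kaiming_normal_(self.weight)
        self.stride, self.padding = stride, padding
        # Coefficient generator: squeeze the incoming feature maps spatially,
        # then predict one coefficient in (0, 1) per output kernel.
        hidden = max(in_channels // reduction, 4)
        self.coeff = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, out_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b = x.size(0)
        a = self.coeff(x)                                # (B, C_out), per image
        # Steer the kernels: scale each output filter by its coefficient,
        # giving every sample in the batch its own kernel set.
        w = self.weight.unsqueeze(0) * a.view(b, -1, 1, 1, 1)
        w = w.reshape(-1, *self.weight.shape[1:])        # (B*C_out, C_in, k, k)
        # Grouped-conv trick: fold the batch into channels so each group
        # (one per sample) is convolved with that sample's steered kernels.
        y = F.conv2d(x.reshape(1, -1, *x.shape[2:]), w,
                     stride=self.stride, padding=self.padding, groups=b)
        return y.reshape(b, -1, *y.shape[2:])
```

For example, AWBConv2d(64, 128, kernel_size=3, padding=1) applied to a (2, 64, 56, 56) batch yields a (2, 128, 56, 56) output, with each image's 128 filters scaled by coefficients computed from that image's own features.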


References

  1. Sharma, P.K., Basavaraju, S., Sur, A.: Deep learning-based image de-raining using discrete Fourier transformation. Vis. Comput., pp. 1–14 (2020)

  2. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

  3. Li, Y., Ye, X., Li, Y.: Image quality assessment using deep convolutional networks. AIP Adv. 7, 125324 (2017)

  4. Xie, S., Girshick, R.B., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995 (2017)

  5. Zhang, S., He, F.: DRCDN: learning deep residual convolutional dehazing networks. Vis. Comput. 36(9), 1797–1808 (2020)

  6. Zagoruyko, S., Komodakis, N.: Wide residual networks (2016). arxiv:1605.07146

  7. Huang, G., Liu, Z., Weinberger, K.Q.: Densely connected convolutional networks (2016). arxiv:1608.06993

  8. Wang, D., Hu, G., Lyu, C.: FRNet: an end-to-end feature refinement neural network for medical image segmentation. Vis. Comput., pp. 1–12 (2020)

  9. Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification (2017). arxiv:1704.06904

  10. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks (2017). arxiv:1709.01507

  11. Cai, J., Hu, J.: 3D RANs: 3D residual attention networks for action recognition. Vis. Comput. 36(6), 1261–1270 (2020)

  12. Liu, Z., Duan, Q., Shi, S., Zhao, P.: Multi-level progressive parallel attention guided salient object detection for RGB-D images. Vis. Comput., pp. 1–12 (2020)

  13. Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., Glocker, B.: Attention U-net: learning where to look for the pancreas (2018). arxiv:1804.03999

  14. Schlemper, J., Oktay, O., Schaap, M., Heinrich, M., Kainz, B., Glocker, B., Rueckert, D.: Attention gated networks: learning to leverage salient regions in medical images. Med. Image Anal. 53, 197–207 (2018)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916 (2015)

  16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)

  17. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision-ECCV 2016, pp. 21–37. Springer, Cham (2016)

  18. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2015)

  19. Law, H., Deng, J.: Cornernet: detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)

  20. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)

  21. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2015)

  22. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks (2015). arxiv:1505.00387

  23. Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: Deconvolutional single shot detector (2017). arxiv:1701.06659

  24. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation (2016). arxiv:1603.06937

  25. Li, Z., Zhou, F.: FSSD: feature fusion single shot multibox detector (2017). arxiv:1712.00960

  26. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection (2016). arxiv:1612.03144

  27. Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., Ling, H.: M2det: A single-shot object detector based on multi-level feature pyramid network (2018). arxiv:1811.04533

  28. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention (2015). arxiv:1502.03044

  29. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module (2018). arxiv:1807.06521

  30. Roy, A.G., Navab, N., Wachinger, C.: Concurrent spatial and channel squeeze and excitation in fully convolutional networks (2018). arxiv:1803.02579

  31. Sun, M., Yuan, Y., Zhou, F., Ding, E.: Multi-attention multi-class constraint for fine-grained image recognition (2018). arxiv:1806.05372

  32. Ma, J., Li, X., Li, H., Menze, B.H., Liang, S., Zhang, R., Zheng, W.S.: Group-attention single-shot detector (GA-SSD): finding pulmonary nodules in large-scale CT images (2018). arxiv:1812.07166

  33. Wang, L., Wu, Z., Karanam, S., Peng, K.C., Singh, R.V., Liu, B., Metaxas, D.N.: Reducing visual confusion with discriminative attention (2018). arxiv:1811.07484

  34. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention (2014). arxiv:1406.6247

  35. Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020)

  36. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks (2018). arxiv:1807.02758

  37. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pp. 448–456. JMLR.org (2015)

  38. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report (2009)

  39. Yao, L., Miller, J.: Tiny imagenet classification with convolutional neural networks. CS 231N 2(5), 8 (2015)

  40. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)

  41. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html

  42. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding (2016). arxiv:1604.01685

  43. Vicente, T.F.Y., Hou, L., Yu, C.-P., Hoai, M., Samaras, D.: Large-scale training of shadow detectors with noisily-annotated shadow examples. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision–ECCV 2016, pp. 816–832. Springer, Cham (2016)

  44. Law, H., Teng, Y., Russakovsky, O., Deng, J.: CornerNet-Lite: efficient keypoint-based object detection (2019). arxiv:1904.08900

  45. Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training ImageNet in 1 hour (2017). arxiv:1706.02677

  46. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection (2020). arxiv:2004.10934

  47. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation (2014). arxiv:1411.4038

  48. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network (2016). arxiv:1612.01105

  49. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017). arxiv:1706.05587

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62071060) and the Beijing Key Laboratory of Work Safety and Intelligent Monitoring Foundation. We would like to thank Huachun Tan for helpful discussions and Zihang He for proofreading the manuscript.

Author information

Corresponding author

Correspondence to Yong Li.

Ethics declarations

Conflict of interest

The authors declare that no competing financial or non-financial interests exist.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ye, X., Wang, H. & Li, Y. Image content-dependent steerable kernels. Vis Comput 38, 2527–2538 (2022). https://doi.org/10.1007/s00371-021-02128-z
