Image content-dependent steerable kernels

Ye, Xiang; Wang, Heng; Li, Yong

doi:10.1007/s00371-021-02128-z

Original article
Published: 26 April 2021

Volume 38, pages 2527–2538, (2022)
The Visual Computer

Abstract

The attention mechanism plays an essential role in many tasks such as image classification, object detection, and instance segmentation. However, existing methods typically assign attention weights to the feature maps of the previous layer, while the kernels in the current layer remain static during the inference stage. To explicitly model the dependency of individual kernel weights on image content at the inference stage, this work proposes the attention weight block (AWB), which makes kernels steerable to the content of a test image. Specifically, AWB computes a set of on-the-fly coefficients from the feature maps of the previous layer and applies the coefficients to the kernels in the current layer, which makes them steerable. AWB kernels emphasize or suppress the weights of certain kernels depending on the content of the input sample and hence significantly improve the feature representation ability of deep neural networks. The proposed AWB is evaluated on various datasets, and experimental results show that the steerable kernels in AWB outperform state-of-the-art attention approaches when embedded in architectures for classification, object detection, and semantic segmentation tasks. AWB outperforms ECA by 1.1% and 1.0% on the CIFAR-100 and Tiny ImageNet datasets, respectively, for image classification; outperforms CornerNet-Lite by 1.5% on the COCO2017 dataset for object detection; and outperforms FCN8s by 1.2% on the SBUshadow dataset for semantic segmentation.
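The idea of applying content-dependent coefficients to kernels rather than to feature maps can be illustrated with a minimal pure-Python sketch. This is not the authors' AWB implementation: the abstract does not specify how the coefficients are computed or at what granularity they are applied, so the global average pooling, the linear projection `proj`, the sigmoid, the one-coefficient-per-kernel scaling, and all function names below are assumptions made only for illustration.

```python
import math

def channel_means(x):
    """Global average pooling (assumed): one scalar per input channel of x[C_in][H][W]."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def kernel_coefficients(x, proj):
    """One coefficient in (0, 1) per kernel, computed on the fly from the input features.

    proj[C_out][C_in] is a hypothetical learned projection; sigmoid is an assumed gating.
    """
    summary = channel_means(x)
    return [1.0 / (1.0 + math.exp(-sum(w * s for w, s in zip(row, summary))))
            for row in proj]

def steer_kernels(kernels, coeffs):
    """Scale every weight of kernel o by coeffs[o]; kernels is [C_out][C_in][k][k]."""
    return [[[[c * w for w in row] for row in plane] for plane in kern]
            for kern, c in zip(kernels, coeffs)]

def conv2d_valid(x, kernels):
    """Plain 'valid' cross-correlation with stride 1 and no padding."""
    k = len(kernels[0][0])
    h, w = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [[[sum(kern[c][a][b] * x[c][i + a][j + b]
                  for c in range(len(x)) for a in range(k) for b in range(k))
              for j in range(w)] for i in range(h)] for kern in kernels]

def awb_like_conv(x, kernels, proj):
    """Compute coefficients from x, steer the kernels, then convolve x with them."""
    coeffs = kernel_coefficients(x, proj)
    return conv2d_valid(x, steer_kernels(kernels, coeffs)), coeffs

# Toy usage: one 3x3 input channel, two 2x2 kernels.
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
kernels = [[[[1.0, 0.0], [0.0, 1.0]]], [[[0.0, 1.0], [1.0, 0.0]]]]
proj = [[0.0], [1.0]]
out, coeffs = awb_like_conv(x, kernels, proj)
```

In this toy setup the stored kernels never change; only the coefficients, and therefore the effective kernels used for a given input, depend on the image content at inference time.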

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62071060) and the Beijing Key Laboratory of Work Safety and Intelligent Monitoring Foundation. We would like to thank Huachun Tan for helpful discussions and Zihang He for proofreading the manuscript.

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Xiang Ye, Heng Wang & Yong Li

Corresponding author

Correspondence to Yong Li.

Ethics declarations

Conflict of interest

No competing financial and non-financial interests exist.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ye, X., Wang, H. & Li, Y. Image content-dependent steerable kernels. Vis Comput 38, 2527–2538 (2022). https://doi.org/10.1007/s00371-021-02128-z

Accepted: 28 March 2021
Published: 26 April 2021
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00371-021-02128-z
