
A Decomposable Winograd Method for N–D Convolution Acceleration in Video Analysis

Published in: International Journal of Computer Vision

Abstract

Winograd’s minimal filtering algorithm has been widely used in 2-D Convolutional Neural Networks (CNNs) to reduce the number of multiplications and thus speed up processing. However, it is only effective on convolutions with kernel size \(3\) and stride 1: it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than \(3\), and it fails on convolutions with stride larger than 1. Worse still, extending it to N–D convolution intensifies the numerical accuracy problem. These limitations severely obstruct the application of Winograd’s minimal filtering algorithm to video analysis. In this paper, we propose a novel Decomposable Winograd Method (DWM) for N–D convolution acceleration, which extends the original Winograd’s minimal filtering algorithm to more general convolutions. DWM decomposes kernels with large size or stride larger than 1 into several small kernels with stride 1, to which the Winograd algorithm can then be applied, so that DWM reduces the number of multiplications while preserving numerical accuracy. It enables the fast exploration of larger kernel sizes, larger stride values, and higher dimensions in CNNs for high performance and accuracy, and even opens the potential for new CNN designs. Compared with the original Winograd algorithm, the proposed DWM supports all kinds of N–D convolutions with a speedup of \(1.44\times \) to \(3.38\times \), without affecting the numerical accuracy.
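
To make the two ideas above concrete, the following is a minimal NumPy sketch in one dimension, not the authors' implementation: it first checks Winograd's \(F(2,3)\) minimal filtering algorithm, which produces two outputs of a size-3, stride-1 convolution with 4 multiplications instead of the direct method's 6, and then checks a DWM-style decomposition of a size-3, stride-2 convolution into stride-1 sub-convolutions over the even and odd input phases. All function names are illustrative.

# Illustrative sketch only; not the paper's implementation.
import numpy as np

def conv1d(d, g, stride=1):
    # Direct 1-D cross-correlation (the CNN convention).
    n = (len(d) - len(g)) // stride + 1
    return np.array([np.dot(d[i * stride:i * stride + len(g)], g)
                     for i in range(n)])

def winograd_f23(d, g):
    # Winograd F(2,3): two outputs of a size-3, stride-1 convolution
    # using 4 multiplications instead of the direct method's 6.
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

rng = np.random.default_rng(0)
g = rng.standard_normal(3)

# 1) Minimal filtering: F(2,3) matches the direct stride-1 result.
d = rng.standard_normal(4)
assert np.allclose(winograd_f23(d, g), conv1d(d, g))

# 2) DWM-style stride decomposition: a size-3, stride-2 convolution
# equals the sum of stride-1 convolutions over the even/odd input
# phases, each small enough for Winograd-style transforms.
x = rng.standard_normal(11)
y_direct = conv1d(x, g, stride=2)
even, odd = x[0::2], x[1::2]
y_even = conv1d(even, g[[0, 2]])  # taps g[0], g[2] act on even samples
y_odd = conv1d(odd, g[[1]])       # tap g[1] acts on odd samples
n = len(y_direct)
assert np.allclose(y_direct, y_even[:n] + y_odd[:n])

As the abstract describes, the same phase-splitting idea extends along every dimension of an N–D kernel, so arbitrary kernel sizes and strides reduce to sums of small stride-1 convolutions that the standard Winograd transforms can accelerate.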



Notes

  1. https://docs.nvidia.com/cuda/profiler-users-guide/index.html.

  2. https://github.com/Lyken17/pytorch-OpCounter.

  3. https://github.com/Coldog2333/pytoflow.

  4. https://github.com/sniklaus/pytorch-spynet.

  5. https://github.com/facebookresearch/SlowFast.

  6. https://github.com/facebookresearch/SlowFast/issues/256.

  7. https://github.com/zhshi0816/GDConvNet.

  8. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html.


Acknowledgements

This work is partially supported by the Beijing Natural Science Foundation (JQ18013), the NSF of China (under Grants 61925208, 61906179, 62002338, 61732007, 61732002, U19B2019, U20A20227), Strategic Priority Research Program of Chinese Academy of Science (XDB 32050200, XDC05010300), Beijing Academy of Artificial Intelligence (BAAI) and Beijing Nova Program of Science and Technology (Z191100001119093), Youth Innovation Promotion Association CAS and Xplore Prize.

Author information

Corresponding author

Correspondence to Rui Zhang.

Additional information

Communicated by Dong Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, D., Zhang, R., Zhang, X. et al. A Decomposable Winograd Method for N–D Convolution Acceleration in Video Analysis. Int J Comput Vis 129, 2806–2826 (2021). https://doi.org/10.1007/s11263-021-01500-9

