
A Decomposable Winograd Method for N–D Convolution Acceleration in Video Analysis

Published in: International Journal of Computer Vision

Abstract

Winograd’s minimal filtering algorithm has been widely used in 2-D Convolutional Neural Networks (CNNs) to reduce the number of multiplications and thus speed up processing. However, it is only effective on convolutions with kernel size \(3\) and stride 1: it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than \(3\), and it fails on convolutions with stride larger than 1. Worse still, extending it to N–D convolution intensifies the numerical accuracy problem. These limitations severely obstruct the application of Winograd’s minimal filtering algorithm to video analysis. In this paper, we propose a novel Decomposable Winograd Method (DWM) for N–D convolution acceleration, which extends the original Winograd’s minimal filtering algorithm to more general convolutions. DWM decomposes kernels with large size or stride larger than 1 into several small kernels with stride 1, to which the Winograd algorithm can then be applied, so that DWM reduces the number of multiplications while preserving numerical accuracy. It enables the fast exploration of larger kernel sizes, larger stride values, and higher dimensions in CNNs for high performance and accuracy, and even opens the potential for new CNN designs. Compared with the original Winograd algorithm, the proposed DWM supports all kinds of N–D convolutions with a speedup of \(1.44\times \) to \(3.38\times \), without affecting the numerical accuracy.
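
To make the two ideas above concrete, the following is a minimal NumPy sketch in one dimension, not the authors' implementation: it first checks Winograd's \(F(2,3)\) minimal filtering algorithm, which produces two outputs of a size-3, stride-1 convolution with 4 multiplications instead of the direct method's 6, and then checks a DWM-style decomposition of a size-3, stride-2 convolution into stride-1 sub-convolutions over the even and odd input phases. All function names are illustrative.

# Illustrative sketch only; not the paper's implementation.
import numpy as np

def conv1d(d, g, stride=1):
    # Direct 1-D cross-correlation (the CNN convention).
    n = (len(d) - len(g)) // stride + 1
    return np.array([np.dot(d[i * stride:i * stride + len(g)], g)
                     for i in range(n)])

def winograd_f23(d, g):
    # Winograd F(2,3): two outputs of a size-3, stride-1 convolution
    # using 4 multiplications instead of the direct method's 6.
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

rng = np.random.default_rng(0)
g = rng.standard_normal(3)

# 1) Minimal filtering: F(2,3) matches the direct stride-1 result.
d = rng.standard_normal(4)
assert np.allclose(winograd_f23(d, g), conv1d(d, g))

# 2) DWM-style stride decomposition: a size-3, stride-2 convolution
# equals the sum of stride-1 convolutions over the even/odd input
# phases, each small enough for Winograd-style transforms.
x = rng.standard_normal(11)
y_direct = conv1d(x, g, stride=2)
even, odd = x[0::2], x[1::2]
y_even = conv1d(even, g[[0, 2]])  # taps g[0], g[2] act on even samples
y_odd = conv1d(odd, g[[1]])       # tap g[1] acts on odd samples
n = len(y_direct)
assert np.allclose(y_direct, y_even[:n] + y_odd[:n])

As the abstract describes, the same phase-splitting idea extends along every dimension of an N–D kernel, so arbitrary kernel sizes and strides reduce to sums of small stride-1 convolutions that the standard Winograd transforms can accelerate.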



Notes

  1. https://docs.nvidia.com/cuda/profiler-users-guide/index.html.

  2. https://github.com/Lyken17/pytorch-OpCounter.

  3. https://github.com/Coldog2333/pytoflow.

  4. https://github.com/sniklaus/pytorch-spynet.

  5. https://github.com/facebookresearch/SlowFast.

  6. https://github.com/facebookresearch/SlowFast/issues/256.

  7. https://github.com/zhshi0816/GDConvNet.

  8. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html.


Acknowledgements

This work is partially supported by the Beijing Natural Science Foundation (JQ18013), the NSF of China (under Grants 61925208, 61906179, 62002338, 61732007, 61732002, U19B2019, U20A20227), Strategic Priority Research Program of Chinese Academy of Science (XDB 32050200, XDC05010300), Beijing Academy of Artificial Intelligence (BAAI) and Beijing Nova Program of Science and Technology (Z191100001119093), Youth Innovation Promotion Association CAS and Xplore Prize.

Author information

Corresponding author

Correspondence to Rui Zhang.

Additional information

Communicated by Dong Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, D., Zhang, R., Zhang, X. et al. A Decomposable Winograd Method for N–D Convolution Acceleration in Video Analysis. Int J Comput Vis 129, 2806–2826 (2021). https://doi.org/10.1007/s11263-021-01500-9

