Make $\ell_1$ regularization effective in training sparse CNN

He, Juncai; Jia, Xiaodong; Xu, Jinchao; Zhang, Lian; Zhao, Liang

doi:10.1007/s10589-020-00202-1

Published: 04 July 2020

Volume 77, pages 163–182, (2020)

Computational Optimization and Applications

Juncai He¹,
Xiaodong Jia²,
Jinchao Xu¹,
Lian Zhang¹ (ORCID: orcid.org/0000-0001-5659-0167) &
Liang Zhao³

Abstract

Compressed sensing using $\ell_1$ regularization is among the most powerful and popular sparsification techniques in many applications, so why has it not been used to obtain sparse deep learning models such as convolutional neural networks (CNNs)? This paper aims to answer this question and to show how to make it work. Following Xiao (J Mach Learn Res 11(Oct):2543–2596, 2010), we first demonstrate that the commonly used stochastic gradient descent (SGD) training algorithm and its variants are not an appropriate match for $\ell_1$ regularization, and we then replace them with a different training algorithm based on the regularized dual averaging (RDA) method. The RDA method of Xiao (J Mach Learn Res 11(Oct):2543–2596, 2010) was originally designed specifically for convex problems, but with new theoretical insight and algorithmic modifications (proper initialization and adaptivity), we have made it an effective match for $\ell_1$ regularization, achieving state-of-the-art sparsity for highly non-convex CNNs compared with other weight pruning methods without compromising accuracy (for example, 95% sparsity for ResNet-18 on CIFAR-10).
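To illustrate the mismatch the abstract describes, here is a minimal sketch on a convex toy problem (online least squares), not the paper's modified RDA for CNNs, whose initialization and adaptivity details are not given in this abstract. It contrasts an SGD step that uses a subgradient of the $\ell_1$ term with the basic closed-form $\ell_1$-RDA step of Xiao (2010). The function names `sgd_l1_step`, `rda_l1_step`, and `demo`, and the parameter values (`lr`, `lam`, `gamma`, problem size) are illustrative assumptions.

```python
import math
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sgd_l1_step(w, grad, lr, lam):
    """One SGD step on loss + lam * ||w||_1, using a subgradient of the l1 term.

    The l1 term only nudges each weight toward zero by lr * lam, so weights
    tend to hover around zero instead of landing on it exactly.
    """
    return [wi - lr * (gi + lam * sign(wi)) for wi, gi in zip(w, grad)]


def rda_l1_step(g_bar, t, lam, gamma):
    """Basic closed-form l1-RDA step (Xiao, 2010), assumed convex setting.

    g_bar is the running average of all loss (sub)gradients up to step t.
    Any coordinate with |g_bar_i| <= lam is set to exactly zero.
    """
    scale = math.sqrt(t) / gamma
    return [0.0 if abs(gi) <= lam else -scale * (gi - lam * sign(gi))
            for gi in g_bar]


def demo(steps=2000, dim=10, lam=0.1, seed=0):
    """Run both methods on a synthetic sparse regression stream.

    Returns (number of exact zeros under SGD, number under RDA).
    """
    rng = random.Random(seed)
    w_true = [1.0, -2.0, 0.5] + [0.0] * (dim - 3)  # hypothetical sparse target
    w_sgd = [0.0] * dim
    w_rda = [0.0] * dim
    g_bar = [0.0] * dim
    for t in range(1, steps + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(a * b for a, b in zip(w_true, x)) + 0.1 * rng.gauss(0.0, 1.0)

        # Gradient of 0.5 * (<w, x> - y)^2 is (<w, x> - y) * x.
        r_sgd = sum(a * b for a, b in zip(w_sgd, x)) - y
        w_sgd = sgd_l1_step(w_sgd, [r_sgd * xi for xi in x], lr=0.01, lam=lam)

        r_rda = sum(a * b for a, b in zip(w_rda, x)) - y
        g = [r_rda * xi for xi in x]
        g_bar = [gb + (gi - gb) / t for gb, gi in zip(g_bar, g)]  # running mean
        w_rda = rda_l1_step(g_bar, t, lam, gamma=5.0)
    return w_sgd.count(0.0), w_rda.count(0.0)


if __name__ == "__main__":
    sgd_zeros, rda_zeros = demo()
    print("exact zeros with SGD:", sgd_zeros)
    print("exact zeros with RDA:", rda_zeros)
```

The design point is the thresholding in `rda_l1_step`: because the decision is made on the averaged gradient rather than on the current weight, a coordinate can be set to exactly zero in one step, whereas the SGD step with a small weight simply overshoots zero.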

Notes

In the original paper [38] (Xiao, 2010), RDA is proposed as an online learning algorithm that processes one input at a time.
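For reference, the online $\ell_1$-RDA step in the form given by Xiao (2010) for convex problems can be written as below. Here $g_t$ is a subgradient of the loss at the single input seen at time $t$, $\bar g_t$ is the running average of these subgradients, $\lambda$ is the regularization weight, and $\gamma > 0$ is a scaling parameter. This is the original convex-setting update, not the modified algorithm of this paper.

```latex
\bar g_t = \frac{t-1}{t}\,\bar g_{t-1} + \frac{1}{t}\,g_t,
\qquad
w_{t+1} = \arg\min_{w} \Big\{ \langle \bar g_t, w \rangle
          + \lambda \|w\|_1
          + \frac{\gamma}{2\sqrt{t}}\,\|w\|_2^2 \Big\}
```

In mini-batch training one would presumably replace $g_t$ by a mini-batch gradient; that is an assumption here, since the note does not spell it out.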

References

Alvarez, J.M., Salzmann, M.: Learning the number of neurons in deep networks. In: Advances in Neural Information Processing Systems, pp. 2270–2278 (2016)
Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math. Program. 129, 163 (2011)
Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks (2017). arXiv preprint arXiv:1710.09282
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10(Dec), 2899–2934 (2009)
Eldar, Y.C., Kutyniok, G.: Compressed Sensing: Theory and Applications. Cambridge University Press, Cambridge (2012)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding (2015). arXiv preprint arXiv:1510.00149
Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135–1143 (2015)
Hassibi, B., Stork, D.G.: Second order derivatives for network pruning: optimal brain surgeon. In: Advances in Neural Information Processing Systems, pp. 164–171 (1993)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
He, Y., Zhang, X., Sun, J.: Channel pruning for accelerating very deep neural networks. In: International Conference on Computer Vision (ICCV), vol. 2 (2017)
Hu, H., Peng, R., Tai, Y.-W., Tang, C.-K.: Network trimming: a data-driven neuron pruning approach towards efficient deep architectures (2016). arXiv preprint arXiv:1607.03250
Huang, Z., Wang, N.: Data-driven sparse structure selection for deep neural networks (2017). arXiv preprint arXiv:1707.01213
Langford, J., Li, L., Zhang, T.: Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10(2), 777–801 (2009)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Advances in Neural Information Processing Systems, pp. 598–605 (1990)
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient backprop. In: Montavon, G., Orr, G., Müller, K.R. (eds.) Neural Networks: Tricks of the Trade, pp. 9–48. Springer, Berlin (2012)
Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets (2016). arXiv preprint arXiv:1608.08710
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2755–2763. IEEE (2017)
Liu, Z., Sun, M., Zhou, T., Huang, G., Darrell, T.: Rethinking the value of network pruning (2018). arXiv preprint arXiv:1810.05270
Luo, J.-H., Wu, J., Lin, W.: Thinet: a filter level pruning method for deep neural network compression (2017). arXiv preprint arXiv:1707.06342
Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 58(6), 1182–1195 (2007)
McMahan, B.: Follow-the-regularized-leader and mirror descent: equivalence theorems and l1 regularization. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 525–533 (2011)
McMahan, H.B.: A survey of algorithms and analysis for adaptive online learning. J. Mach. Learn. Res. 18(1), 3117–3166 (2017)
Mine, H., Fukushima, M.: A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33(1), 9–23 (1981)
Mittal, D., Bhardwaj, S., Khapra, M.M., Ravindran, B.: Recovering from random pruning: on the plasticity of deep convolutional neural networks (2018)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)
Pascanu, R., Mikolov, T., Bengio, Y.: Understanding the exploding gradient problem (2012). arXiv preprint arXiv:1211.5063
Pratt, L.Y.: Comparing biases for minimal network construction with back-propagation. In: International Conference on Neural Information Processing Systems, pp. 177–185 (1988)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H.: Learning structured sparsity in deep neural networks. In: Advances in Neural Information Processing Systems, pp. 2074–2082 (2016)
Xiao, L.: Dual averaging method for regularized stochastic learning and online optimization. In: Advances in Neural Information Processing Systems, pp. 2116–2124 (2009)
Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11(Oct), 2543–2596 (2010)
Zhu, M., Gupta, S.: To prune, or not to prune: exploring the efficacy of pruning for model compression (2017). arXiv preprint arXiv:1710.01878

Acknowledgements

This work was partially supported by the Penn State and Peking University Joint Center for Computational Mathematics and Applications, the Beijing International Center for Mathematical Research from Peking University, and the Verne M. William Professorship Fund from Penn State University. The research of L. Zhao and L. Zhang was also supported by the China Scholarship Council (for visiting Penn State) and by HKUST16301218 Hong Kong RGC Competitive Earmarked Research Grant (for visiting Penn State), respectively. The authors wish to thank Drs. Lin Xiao and Liang Yang for helpful suggestions and discussions.

Author information

Authors and Affiliations

Department of Mathematics, Pennsylvania State University, University Park, PA, 16802, USA
Juncai He, Jinchao Xu & Lian Zhang
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
Xiaodong Jia
State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, 100190, China
Liang Zhao

Corresponding author

Correspondence to Jinchao Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, J., Jia, X., Xu, J. et al. Make $\ell _1$ regularization effective in training sparse CNN. Comput Optim Appl 77, 163–182 (2020). https://doi.org/10.1007/s10589-020-00202-1

Received: 24 August 2019
Published: 04 July 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10589-020-00202-1
