Convergence of Stochastic Gradient Descent in Deep Neural Network

Acta Mathematicae Applicatae Sinica, English Series

Abstract

Stochastic gradient descent (SGD) is one of the most common optimization algorithms in pattern recognition and machine learning. SGD and its variants are the preferred methods for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous convergence analyses of these algorithms rely on traditional assumptions from optimization theory; however, deep neural networks have distinctive properties, and some of these assumptions do not hold in the actual optimization of such models. In this paper, we modify the assumptions so that they are more consistent with the actual optimization process of deep neural networks. Based on the new assumptions, we study the convergence and convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a common network architecture, on the MNIST data set to verify the reasonableness of our assumptions.
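
The two SGD variants studied in the paper are not named in the abstract above; judging from the cited works of Polyak [12] and Nesterov [11], a plausible reading is heavy-ball momentum and Nesterov's accelerated gradient. The sketch below only illustrates those three standard update rules on a toy least-squares problem; it is not the paper's algorithm, assumptions, or analysis, and every name in it (stochastic_grad, heavy_ball, and so on) is hypothetical.

import numpy as np

# Toy least-squares objective f(w) = (1/2n) ||A w - b||^2 (illustrative only,
# not from the paper); mini-batches of rows provide the stochastic gradients.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
b = A @ w_true + 0.1 * rng.normal(size=1000)

def stochastic_grad(w, batch_size=32):
    idx = rng.integers(0, A.shape[0], size=batch_size)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ w - bb) / batch_size

def sgd(steps=2000, lr=0.01):
    # Plain SGD: w_{k+1} = w_k - eta * g_k
    w = np.zeros(20)
    for _ in range(steps):
        w -= lr * stochastic_grad(w)
    return w

def heavy_ball(steps=2000, lr=0.01, beta=0.9):
    # Polyak momentum: v_{k+1} = beta*v_k - eta*g(w_k);  w_{k+1} = w_k + v_{k+1}
    w, v = np.zeros(20), np.zeros(20)
    for _ in range(steps):
        v = beta * v - lr * stochastic_grad(w)
        w = w + v
    return w

def nesterov(steps=2000, lr=0.01, beta=0.9):
    # Nesterov momentum: the gradient is evaluated at the look-ahead point w_k + beta*v_k
    w, v = np.zeros(20), np.zeros(20)
    for _ in range(steps):
        v = beta * v - lr * stochastic_grad(w + beta * v)
        w = w + v
    return w

for name, opt in [("SGD", sgd), ("heavy ball", heavy_ball), ("Nesterov", nesterov)]:
    print(f"{name:10s} distance to w*: {np.linalg.norm(opt() - w_true):.4f}")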

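The numerical experiments described above use LeNet-5 on MNIST, but no code or hyper-parameters appear on this page. The following PyTorch sketch of a LeNet-5-style network trained with torch.optim.SGD (optionally with momentum or Nesterov momentum) is therefore only an assumed setup for illustration: the architecture details, learning rate, batch size, and the helper train_one_epoch are hypothetical and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class LeNet5(nn.Module):
    """LeNet-5-style CNN for 28x28 MNIST digits."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)   # 28x28 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)             # 14x14 -> 10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(torch.relu(self.conv1(x)), 2)           # -> 6 x 14 x 14
        x = F.max_pool2d(torch.relu(self.conv2(x)), 2)           # -> 16 x 5 x 5
        x = x.flatten(1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

def train_one_epoch(optimizer_kwargs):
    # Hypothetical training loop; hyper-parameters are illustrative defaults.
    data = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=64, shuffle=True)
    model = LeNet5()
    opt = torch.optim.SGD(model.parameters(), **optimizer_kwargs)
    for images, labels in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        opt.step()
    return model

# Plain SGD, heavy-ball momentum, and Nesterov momentum, respectively.
train_one_epoch({"lr": 0.01})
train_one_epoch({"lr": 0.01, "momentum": 0.9})
train_one_epoch({"lr": 0.01, "momentum": 0.9, "nesterov": True})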

References

  1. Allen-Zhu, Z., Hazan, E. Optimal black-box reductions between optimization objectives. In: Proceedings of the Advances in Neural Information Processing Systems, 2016, 1614–1622

  2. Allen-Zhu, Z., Li, Y. Neon2: Finding local minima via first-order oracles. In: Proceedings of the Advances in Neural Information Processing Systems, 2018, 3716–3726

  3. Bottou, L., Bousquet, O. The tradeoffs of large scale learning. In: Proceedings of the Advances in Neural Information Processing Systems, 2008, 161–168

  4. Bottou, L. Large-scale machine learning with stochastic gradient descent. In: Proceedings of the International Conference on Computational Statistics, 2010, 177–186

  5. Bottou, L., Curtis, F.E., Nocedal, J. Optimization methods for large-scale machine learning. SIAM Review, 60(2): 223–311 (2018)

  6. Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine, 29(6): 141–142 (2012)

  7. Greff, K., Srivastava, R.K., Koutník, J., et al. LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10): 2222–2232 (2016)

  8. He, K., Gkioxari, G., Dollár, P., et al. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 2017, 2980–2988

  9. Lecun, Y., Bottou, L., Bengio, Y., et al. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11): 2278–2324 (1998)

  10. Lecun, Y., Bengio, Y., Hinton, G. Deep learning. Nature, 521(7553): 436–444 (2015)

  11. Nesterov, Y.E. A method for solving the convex programming problem with convergence rate O(1/k^2). Doklady Akademii Nauk SSSR, 269: 543–547 (1983)

  12. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5): 1–17 (1964)

  13. Robbins, H., Monro, S. A stochastic approximation method. Herbert Robbins Selected Papers, 1985, 102–109

  14. Ge, R., Huang, F., Jin, C., et al. Escaping from saddle points: online stochastic gradient for tensor decomposition. In: Proceedings of the Conference on Learning Theory, 2015, 797–842

  15. Sainath, T.N., Kingsbury, B., Saon, G., et al. Deep Convolutional Neural Networks for large-scale speech tasks. Neural Networks, 64: 39–48 (2015)

Acknowledgments

The authors thank the anonymous reviewers and the editors for their constructive suggestions, which have improved the paper.

Author information

Corresponding author

Correspondence to Cong-ying Han.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 11731013, U19B2040, 11991022) and by the Leading Project of the Chinese Academy of Sciences (Nos. XDA27010102, XDA27010302).

About this article

Cite this article

Zhou, Bc., Han, Cy. & Guo, Td. Convergence of Stochastic Gradient Descent in Deep Neural Network. Acta Math. Appl. Sin. Engl. Ser. 37, 126–136 (2021). https://doi.org/10.1007/s10255-021-0991-2

