Convergence of Stochastic Gradient Descent in Deep Neural Network
Acta Mathematicae Applicatae Sinica, English Series (IF 0.9). Pub Date: 2021-01-01. DOI: 10.1007/s10255-021-0991-2
Bai-cun Zhou, Cong-ying Han, Tian-de Guo

Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. This algorithm and its variants are the preferred choice for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous studies of the convergence of these algorithms relied on traditional assumptions from optimization theory. However, deep neural networks have unique properties, and some of these assumptions are inappropriate for the actual optimization process of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization of deep neural networks. Based on the new assumptions, we study the convergence and convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a common network architecture, on the MNIST data set to verify the rationality of our assumptions.
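For reference, a minimal sketch of the plain SGD iteration discussed in the abstract, written in its standard textbook form (the specific variants and the modified assumptions analyzed in the paper are not reproduced here):

$$
w_{t+1} = w_t - \eta_t \,\nabla f_{i_t}(w_t), \qquad i_t \sim \mathrm{Unif}\{1,\dots,n\},
$$

where $f_i$ is the loss on the $i$-th training sample, $F = \tfrac{1}{n}\sum_{i=1}^{n} f_i$ is the empirical risk, and $\eta_t$ is the step size. Classical convergence analyses typically assume, for example, an $L$-Lipschitz gradient of $F$ and bounded gradient variance $\mathbb{E}\,\|\nabla f_{i}(w) - \nabla F(w)\|^2 \le \sigma^2$; the paper's stated contribution is to replace such traditional assumptions with ones better matched to the optimization of deep networks.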
