Journal of Complexity (IF 1.8). Pub Date: 2019-09-27. DOI: 10.1016/j.jco.2019.101438. Arnulf Jentzen, Philippe von Wurstemberger
The stochastic gradient descent (SGD) optimization algorithm is one of the central tools used to approximate solutions of stochastic optimization problems arising in machine learning and, in particular, deep learning applications. It is therefore important to analyze the convergence behavior of SGD. In this article we consider a simple quadratic stochastic optimization problem and establish for every γ, ν ∈ (0, ∞) essentially matching lower and upper bounds for the mean square error of the associated SGD process with learning rates (γ/n^ν), n ∈ ℕ. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the choice of the learning rates.
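The setting described above can be illustrated with a minimal sketch. This is an assumed instance, not the paper's exact problem: we minimize the quadratic objective f(θ) = E[(θ − X)²]/2 with X standard normal (minimizer θ* = 0), using SGD with polynomially decaying learning rates γ/n^ν. The names `sgd_quadratic` and `mse` are ours.

```python
import random

def sgd_quadratic(theta0, gamma, nu, n_steps, seed=0):
    """SGD on f(theta) = E[(theta - X)^2] / 2 with X ~ N(0, 1),
    so the minimizer is theta* = E[X] = 0. Uses the polynomially
    decaying learning rates gamma_n = gamma / n**nu from the abstract."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_steps + 1):
        x = rng.gauss(0.0, 1.0)          # one sample of X
        grad = theta - x                 # unbiased estimate of f'(theta)
        theta -= (gamma / n ** nu) * grad
    return theta

def mse(gamma, nu, n_steps, trials=200):
    """Monte Carlo estimate of the mean square error E[(theta_N - theta*)^2]."""
    return sum(sgd_quadratic(1.0, gamma, nu, n_steps, seed=t) ** 2
               for t in range(trials)) / trials
```

For example, with γ = 1 and ν = 1 the iterate θ_n is the running average of the samples, so the estimated mean square error shrinks roughly like 1/n as `n_steps` grows; varying ν then shows how the decay speed of the learning rates governs the convergence rate.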
Title: Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates