Strong error analysis for stochastic gradient descent optimization algorithms
IMA Journal of Numerical Analysis (IF 2.3). Pub Date: 2020-05-20. DOI: 10.1093/imanum/drz055
Arnulf Jentzen, Benno Kuckuck, Ariel Neufeld, Philippe von Wurstemberger

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $1/2-\varepsilon$ to the global minimum of the objective function of the considered stochastic optimization problem, under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve, for every arbitrarily large $p \in (0,\infty)$, strong $L^p$-convergence rates.
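To make the rate concrete, here is a minimal numerical sketch, not taken from the paper: SGD with step sizes $\gamma_n = \gamma_0/n$ applied to the strongly convex quadratic objective $f(\theta) = \mathbb{E}[(\theta - X)^2/2]$ with Gaussian samples $X$, whose global minimum is the mean of $X$. The objective, noise model, step-size schedule, and all parameter values below are illustrative assumptions; the Monte Carlo estimate of the strong $L^p$ error should decay roughly like $N^{-1/2}$, consistent with the order $1/2-\varepsilon$ established in the article.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative stochastic optimization problem (an assumption, not the
# paper's setting): minimize f(theta) = E[(theta - X)^2 / 2] with
# X ~ N(mu, sigma^2); the unique global minimum is theta* = mu.
mu, sigma = 1.0, 2.0
theta_star = mu

def sgd_run(n_steps, gamma0=1.0):
    """One SGD trajectory with step sizes gamma_n = gamma0 / n."""
    theta = 0.0
    for n in range(1, n_steps + 1):
        x = rng.normal(mu, sigma)   # one sample of X
        grad = theta - x            # unbiased estimate of f'(theta)
        theta -= (gamma0 / n) * grad
    return theta

# Monte Carlo estimate of the strong L^p error E[|theta_N - theta*|^p]^(1/p)
p = 4.0
n_runs = 2000
for n_steps in (10, 100, 1000):
    errs = np.array([abs(sgd_run(n_steps) - theta_star) for _ in range(n_runs)])
    lp_err = (errs ** p).mean() ** (1.0 / p)
    print(f"N = {n_steps:5d}:  L^{p:g} error ~ {lp_err:.4f}")
# The printed errors decay roughly like N^(-1/2), matching the strong
# convergence order 1/2 - epsilon proved in the article.

For this particular choice of objective and step sizes, the iterate after $N$ steps is exactly the sample mean of the first $N$ samples, so the $N^{-1/2}$ decay of the $L^p$ error can also be read off directly from the central limit theorem.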
