Weak error analysis for stochastic gradient descent optimization algorithms
arXiv - CS - Numerical Analysis. Pub Date: 2020-07-03, DOI: arxiv-2007.02723. Aritz Bercher, Lukas Gonon, Arnulf Jentzen, Diyora Salimova
Stochastic gradient descent (SGD) type optimization schemes are fundamental
ingredients in a large number of machine learning based algorithms. In
particular, SGD type optimization schemes are frequently employed in
applications involving natural language processing, object and face
recognition, fraud detection, computational advertisement, and numerical
approximations of partial differential equations. Mathematical convergence
results for SGD type optimization schemes in the scientific literature usually
study two types of error criteria: the error in the strong sense and the error
with respect to the objective function. In applications one
is often not only interested in the size of the error with respect to the
objective function but also in the size of the error with respect to a test
function which is possibly different from the objective function. The analysis
of the size of this error is the subject of this article. In particular, the
main result of this article proves under suitable assumptions that the size of
this error decays at the same speed as in the special case where the test
function coincides with the objective function.
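The distinction between the error criteria can be illustrated with a minimal numerical sketch. The following is not code from the paper; the objective, test function, step sizes, and all names are illustrative assumptions. It runs SGD on a stochastic quadratic objective f(theta) = E[(theta - X)^2]/2 with X uniform on [-1, 1] (minimizer theta* = 0) and estimates the strong error, the error with respect to the objective function, and the weak error with respect to a test function phi that differs from f:

```python
import math
import random

# Illustrative sketch (assumed setup, not from the paper): SGD for the
# stochastic quadratic objective f(theta) = E[(theta - X)^2] / 2 with
# X uniform on [-1, 1], whose minimizer is theta* = E[X] = 0.  We
# estimate three error criteria for the final SGD iterate Theta_N:
#   strong error:          E[|Theta_N - theta*|]
#   objective-fn error:    E[f(Theta_N)] - f(theta*) = E[Theta_N^2] / 2
#   weak (test-fn) error:  |E[phi(Theta_N)] - phi(theta*)| for a test
#                          function phi chosen different from f.
random.seed(0)

def run_sgd(num_steps, theta0=1.0):
    theta = theta0
    for n in range(1, num_steps + 1):
        x = random.uniform(-1.0, 1.0)   # one sample of X
        grad = theta - x                # unbiased estimate of f'(theta)
        theta -= grad / n               # classical 1/n step sizes
    return theta

def phi(t):                             # test function, deliberately != f
    return math.cos(t)

M, N = 2000, 1000                       # Monte Carlo runs, SGD steps
finals = [run_sgd(N) for _ in range(M)]
theta_star = 0.0

strong_err = sum(abs(t - theta_star) for t in finals) / M
objective_err = sum(0.5 * t * t for t in finals) / M
weak_err = abs(sum(phi(t) for t in finals) / M - phi(theta_star))

print(strong_err, objective_err, weak_err)
```

In this toy setting all three errors are small after N steps, and the weak error for the smooth test function decays at a speed comparable to the error with respect to the objective function, in line with the flavor of the article's main result.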
Updated: 2020-07-22
随机梯度下降 (SGD) 类型的优化方案是大量基于机器学习的算法的基本要素。特别是,SGD 类型的优化方案经常用于涉及自然语言处理、对象和人脸识别、欺诈检测、计算广告和偏微分方程数值近似的应用中。在 SGD 类型优化方案的数学收敛结果中,科学文献中研究的误差准则通常有两种,即强意义误差和相对于目标函数的误差。在应用中,人们通常不仅对目标函数的误差大小感兴趣,而且对可能与目标函数不同的测试函数的误差大小感兴趣。分析这个误差的大小是本文的主题。特别是,本文的主要结果在适当的假设下证明了该误差的大小以与测试函数与目标函数重合的特殊情况相同的速度衰减。