Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments
Applied Mathematics and Optimization (IF 1.8) Pub Date: 2020-09-06, DOI: 10.1007/s00245-020-09718-8
Tuyen Trung Truong , Hang-Tuan Nguyen

In this paper, we provide new results and algorithms (including backtracking versions of Nesterov accelerated gradient and Momentum) which are more applicable to large scale optimisation, as in Deep Neural Networks. We also demonstrate that Backtracking Gradient Descent (Backtracking GD) can obtain good upper bound estimates for local Lipschitz constants of the gradient, and that the convergence rate of Backtracking GD is similar to that in the classical work of Armijo. Experiments with the CIFAR10 and CIFAR100 datasets on various popular architectures verify a heuristic argument that, in the mini-batch practice, Backtracking GD stabilises to a finite union of sequences constructed from Standard GD, and show that our new algorithms (while automatically fine-tuning learning rates) perform better than current state-of-the-art methods such as Adam, Adagrad, Adadelta, RMSProp, Momentum and Nesterov accelerated gradient. To help readers avoid confusion between heuristics and more rigorously justified algorithms, we also review the current state of convergence results for gradient descent methods. Accompanying source code is available on GitHub.
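As a concrete illustration of the Armijo-type backtracking rule that underlies Backtracking GD, the following is a minimal sketch in Python. It is not the authors' released code: the objective f, its gradient grad_f, and the parameter names alpha, beta, delta0 are illustrative assumptions; the point is only that the learning rate is shrunk at each step until a sufficient-decrease condition holds, so no global Lipschitz constant for the gradient is required.

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, alpha=0.5, beta=0.7, delta0=1.0,
                    max_iter=1000, tol=1e-8):
    """Gradient descent with an Armijo-style backtracking line search.

    At each iteration the step size delta is multiplied by beta until
        f(x - delta * g) - f(x) <= -alpha * delta * ||g||^2,
    i.e. until the step yields sufficient decrease.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        gnorm2 = float(np.dot(g, g))
        if gnorm2 < tol ** 2:          # gradient small enough: stop
            break
        delta = delta0
        # Backtracking: shrink the step until sufficient decrease holds.
        while f(x - delta * g) - f(x) > -alpha * delta * gnorm2:
            delta *= beta
            if delta < 1e-12:          # safeguard against endless shrinking
                break
        x = x - delta * g
    return x

# Usage sketch: minimise the quadratic f(x) = ||x||^2 / 2.
if __name__ == "__main__":
    f = lambda x: 0.5 * float(np.dot(x, x))
    grad_f = lambda x: x
    x_star = backtracking_gd(f, grad_f, x0=np.array([3.0, -4.0]))
    print(x_star)  # close to [0, 0]
```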




Updated: 2020-09-06