A fully stochastic second-order trust region method, Optimization Methods & Software

A fully stochastic second-order trust region method
Optimization Methods & Software (IF 2.2), Pub Date: 2020-11-25, DOI: 10.1080/10556788.2020.1852403
Frank E. Curtis, Rui Shi

ABSTRACT

A stochastic second-order trust region method is proposed, which can be viewed as an extension of the trust-region-ish (TRish) algorithm proposed by Curtis et al. [A stochastic trust region algorithm based on careful step normalization. INFORMS J. Optim. 1(3) 200–220, 2019]. In each iteration, a search direction is computed by (approximately) solving a subproblem defined by stochastic gradient and Hessian estimates. The algorithm has convergence guarantees in the fully stochastic regime, i.e., when each stochastic gradient is merely an unbiased estimate of the gradient with bounded variance and the stochastic Hessian estimates are bounded. This framework covers a variety of implementations, for example those in which the stochastic Hessians are defined by sampled second-order derivatives, or by diagonal matrices as in RMSprop, Adagrad, Adam and other popular algorithms. The proposed algorithm also has a worst-case complexity guarantee in the nearly deterministic regime, i.e., when the stochastic gradients and Hessians are close in expectation to the true gradients and Hessians. The results of numerical experiments for training CNNs for image classification and an RNN for time series forecasting are presented. These results show that the algorithm can outperform both a stochastic gradient algorithm and the first-order TRish algorithm.
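To make the idea of a subproblem "defined by stochastic gradient and Hessian estimates" concrete, here is a minimal Python sketch. It is not the authors' algorithm. The function `trish_step` follows the step-normalization rule of the first-order TRish method, with `alpha` as the step size and `gamma1 > gamma2 > 0` as the normalization constants. The function `second_order_step` is a hypothetical illustration: it assumes a diagonal Hessian estimate `h_diag`, assumes the trust region radius equals the TRish step length, and solves the quadratic subproblem only approximately, at the Cauchy point along the negative gradient estimate.

```python
import math


def trish_radius(norm, alpha, gamma1, gamma2):
    """Length of the first-order TRish step for a gradient estimate of given norm.

    Assumes gamma1 > gamma2 > 0, so that 1/gamma1 < 1/gamma2.
    """
    if norm < 1.0 / gamma1:
        return gamma1 * alpha * norm   # small estimate: scaled gradient step
    if norm <= 1.0 / gamma2:
        return alpha                   # mid-range estimate: normalized step
    return gamma2 * alpha * norm       # large estimate: scaled gradient step


def trish_step(g, alpha, gamma1, gamma2):
    """First-order TRish step along the negative stochastic gradient g."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0] * len(g)
    radius = trish_radius(norm, alpha, gamma1, gamma2)
    return [-radius * gi / norm for gi in g]


def second_order_step(g, h_diag, alpha, gamma1, gamma2):
    """Cauchy-point solution of: min g^T s + 0.5 s^T diag(h_diag) s, ||s|| <= delta.

    Assumption (not taken from the paper): delta is the TRish step length,
    and the subproblem is minimized only along the direction -g.
    """
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0] * len(g)
    delta = trish_radius(norm, alpha, gamma1, gamma2)
    curvature = sum(hi * gi * gi for hi, gi in zip(h_diag, g))  # g^T H g
    t_max = delta / norm               # largest step multiple inside the region
    if curvature > 0.0:
        t = min(norm * norm / curvature, t_max)  # unconstrained minimizer, clipped
    else:
        t = t_max                      # nonpositive curvature: go to the boundary
    return [-t * gi for gi in g]
```

Under these assumptions, a zero Hessian estimate makes `second_order_step` coincide with `trish_step`, while a large positive curvature estimate shortens the step below the trust region radius.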

Updated: 2020-11-25
