Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points
Foundations of Computational Mathematics (IF 2.5). Pub Date: 2021-03-19. DOI: 10.1007/s10208-021-09499-8
Krishnakumar Balasubramanian, Saeed Ghadimi

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on constrained optimization, high-dimensional settings, and saddle-point avoidance. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm that achieve rates similar to the standard stochastic gradient algorithm using only zeroth-order information. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon whereby the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand simply by varying the step size, and (ii) we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality. We next focus on avoiding saddle points in the nonconvex setting. Toward that end, we interpret the Gaussian smoothing technique for estimating gradients from zeroth-order information as an instantiation of the first-order Stein's identity. Based on this, we provide a novel linear (in dimension) time estimator of the Hessian matrix of a function using only zeroth-order information, built on the second-order Stein's identity. We then provide a zeroth-order variant of the cubic-regularized Newton method for avoiding saddle points and discuss its rate of convergence to local minima.
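The two Stein-identity estimators mentioned in the abstract can be illustrated with a minimal Monte Carlo sketch. The snippet below is not the authors' exact algorithm (function names, smoothing parameter `mu`, and sample counts are illustrative choices): the gradient estimator averages `(f(x + mu*u) - f(x)) / mu * u` over Gaussian directions `u` (first-order Stein's identity), and the Hessian estimator averages the central second difference times `(u u^T - I)` (second-order Stein's identity). Both query only function values, never derivatives.

```python
import random

def zo_gradient(f, x, mu=1e-4, n_samples=10_000, rng=None):
    """Gaussian-smoothing gradient estimate via the first-order Stein identity.

    E[(f(x + mu*u) - f(x)) / mu * u] with u ~ N(0, I) approximates grad f(x).
    Uses only zeroth-order (function-value) queries.
    """
    rng = rng or random.Random(0)
    d = len(x)
    g = [0.0] * d
    fx = f(x)
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        diff = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for i in range(d):
            g[i] += diff * u[i] / n_samples
    return g

def zo_hessian(f, x, mu=1e-3, n_samples=20_000, rng=None):
    """Hessian estimate via the second-order Stein identity.

    E[(f(x+mu*u) + f(x-mu*u) - 2 f(x)) / (2 mu^2) * (u u^T - I)]
    with u ~ N(0, I) approximates the Hessian of f at x.
    """
    rng = rng or random.Random(1)
    d = len(x)
    H = [[0.0] * d for _ in range(d)]
    fx = f(x)
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        xp = [xi + mu * ui for xi, ui in zip(x, u)]
        xm = [xi - mu * ui for xi, ui in zip(x, u)]
        c = (f(xp) + f(xm) - 2.0 * fx) / (2.0 * mu * mu)
        for i in range(d):
            for j in range(d):
                H[i][j] += c * (u[i] * u[j] - (1.0 if i == j else 0.0)) / n_samples
    return H

# Sanity check on f(x) = x^2, where grad f(1) = 2 and f''(1) = 2.
g = zo_gradient(lambda v: v[0] ** 2, [1.0])
H = zo_hessian(lambda v: v[0] ** 2, [1.0])
```

Note the "linear time" remark in the abstract: each sample of the Hessian estimator needs only O(d) function-query work (two evaluations plus a rank-one update against a vector), which is what makes a zeroth-order cubic-regularized Newton step tractable in moderate dimensions.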




Updated: 2021-03-21