Stochastic quasi-gradient methods: variance reduction via Jacobian sketching
Mathematical Programming (IF 2.2), Pub Date: 2020-05-12, DOI: 10.1007/s10107-020-01506-0
Robert M. Gower, Peter Richtárik, Francis Bach

We develop a new family of variance-reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method, JacSketch, is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the stochastic average gradient (SAGA) method, and to several of its existing and many new minibatch, reduced-memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al. (2015). The rates we obtain for minibatch SAGA are also superior to existing rates.
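
The sketch-and-project update and the resulting gradient estimator are easiest to see for unit-coordinate sketches, the special case in which JacSketch reduces to SAGA. Below is a minimal NumPy sketch of that special case; the least-squares objective, step size, and iteration count are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of JacSketch with unit-coordinate sketches S_k = e_i
# (uniform sampling), in which case the iteration reduces to SAGA.
# The toy problem and hyperparameters below are illustrative assumptions.
import numpy as np

def jacsketch_saga(grad_i, x0, n, step_size, num_iters, rng=None):
    """Minimize (1/n) * sum_i f_i(x); grad_i(i, x) returns the gradient of f_i at x."""
    rng = np.random.default_rng(rng)
    x = x0.copy()
    # Jacobian estimate J with columns J[:, i] ~ grad f_i(x); initialized at the true Jacobian at x0.
    J = np.column_stack([grad_i(i, x) for i in range(n)])
    J_mean = J.mean(axis=1)                      # running average (1/n) J e

    for _ in range(num_iters):
        i = rng.integers(n)                      # cheap linear measurement: column i of the true Jacobian
        g_new = grad_i(i, x)
        # Variance-reduced unbiased gradient estimator (SAGA form of JacSketch).
        g = J_mean + (g_new - J[:, i])
        # Sketch-and-project update: replace column i of J, keep the rest.
        J_mean += (g_new - J[:, i]) / n
        J[:, i] = g_new
        x -= step_size * g
    return x

if __name__ == "__main__":
    # Toy least-squares instance: f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    A = rng.standard_normal((n, d))
    b = rng.standard_normal(n)
    grad_i = lambda i, x: A[i] * (A[i] @ x - b[i])
    x_star = np.linalg.lstsq(A, b, rcond=None)[0]
    x = jacsketch_saga(grad_i, np.zeros(d), n, step_size=0.01, num_iters=50000, rng=1)
    print("distance to least-squares solution:", np.linalg.norm(x - x_star))
```

For a general sketching matrix S_k, the column replacement above becomes the projection J + (∇F(x) − J) S_k (S_kᵀ S_k)† S_kᵀ and the estimator carries a bias-correcting weight; the unit-coordinate case sampled here keeps both steps at O(d) cost per iteration.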

Updated: 2020-05-12