Understanding the acceleration phenomenon via high-resolution differential equations
Mathematical Programming (IF 2.2), Pub Date: 2021-07-06, DOI: 10.1007/s10107-021-01681-8
Bin Shi, Simon S. Du, Michael I. Jordan, Weijie J. Su

Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Existing ODEs, however, do not distinguish between two fundamentally different algorithms: Nesterov’s accelerated gradient method for strongly convex functions (NAG-SC) and Polyak’s heavy-ball method. Motivated by this fact, we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak’s heavy-ball method, but also allow the identification of a term that we refer to as the “gradient correction,” which is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence between the two methods. We further use the high-resolution ODE framework to study Nesterov’s accelerated gradient method for (non-strongly) convex functions (NAG-C), uncovering a hitherto unknown result: NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
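
For context, a brief sketch of the central objects, reconstructed from the paper; here s denotes the step size and \mu the strong-convexity parameter, following the paper's notation as I recall it, and the exact coefficients should be verified against the published version. In the usual low-resolution limit, NAG-SC and the heavy-ball method collapse to the same ODE,

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \nabla f(X(t)) = 0,

whereas the high-resolution ODEs retain terms of order \sqrt{s} and therefore differ. For Polyak's heavy-ball method,

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + (1 + \sqrt{\mu s})\,\nabla f(X(t)) = 0,

and for NAG-SC,

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \sqrt{s}\,\nabla^{2} f(X(t))\,\dot{X}(t) + (1 + \sqrt{\mu s})\,\nabla f(X(t)) = 0.

The Hessian-driven term \sqrt{s}\,\nabla^{2} f(X(t))\,\dot{X}(t) is the gradient correction referred to in the abstract; it captures, at order \sqrt{s}, the difference of gradients at consecutive iterates that appears in the NAG-SC update but not in the heavy-ball update, and it vanishes in the low-resolution limit s \to 0.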


