An \(O(s^r)\)-resolution ODE framework for understanding discrete-time algorithms and applications to the linear convergence of minimax problems
Mathematical Programming ( IF 2.7 ) Pub Date : 2021-06-29 , DOI: 10.1007/s10107-021-01669-4
Haihao Lu

There has been a long history of using ordinary differential equations (ODEs) to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, two fundamental questions remain unanswered: (i) it is unclear how to obtain a suitable ODE from a given DTA, and (ii) the connection between the convergence of a DTA and that of its corresponding ODEs is unclear. In this paper, we propose a new machinery, an \(O(s^r)\)-resolution ODE framework, for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: 1. To obtain a suitable ODE from a given DTA, we define a hierarchy of \(O(s^r)\)-resolution ODEs of a DTA parameterized by the degree \(r\), where \(s\) is the step-size of the DTA. We present a principled approach to construct the unique \(O(s^r)\)-resolution ODE from a DTA; 2. To analyze the resulting ODE, we propose the \(O(s^r)\)-linear-convergence condition of a DTA with respect to an energy function, under which the \(O(s^r)\)-resolution ODE converges linearly to an optimal solution; 3. To bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that the linear convergence of the \(O(s^r)\)-resolution ODE with respect to a proper energy function automatically guarantees the linear convergence of the DTA. To better illustrate this machinery, we use it to study three classic algorithms, gradient descent ascent (GDA), the proximal point method (PPM) and the extra-gradient method (EGM), for solving the unconstrained minimax problem \(\min _{x\in \mathbb {R}^n} \max _{y\in \mathbb {R}^m} L(x,y)\). Their \(O(s)\)-resolution ODEs explain the puzzling convergent/divergent behaviors of GDA, PPM and EGM when \(L(x,y)\) is a bilinear function, and show that the interaction terms help the convergence of PPM/EGM but hurt the convergence of GDA. Furthermore, their \(O(s)\)-linear-convergence conditions not only unify the known scenarios in which PPM and EGM converge linearly, but also show that these two algorithms exhibit linear convergence in much broader contexts, including when solving a class of nonconvex-nonconcave minimax problems. Finally, we show how this ODE framework can help design new optimization algorithms for minimax problems, by studying the difference between the \(O(s)\)-resolution ODE of GDA and that of PPM/EGM.
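The convergent/divergent behavior on bilinear problems mentioned in the abstract can be seen in a small numerical sketch. The example below is not taken from the paper: it assumes the simplest bilinear instance \(L(x,y) = xy\) with scalar \(x\) and \(y\), whose unique saddle point is the origin, and the function names, step-size and iteration count are illustrative choices. It applies the standard GDA, PPM and EGM updates and reports the distance to the saddle point after a fixed number of iterations.

```python
import math

# Illustrative assumption: L(x, y) = x * y, so dL/dx = y and dL/dy = x.
# The unique saddle point is (0, 0).

def gda_step(x, y, s):
    """Gradient descent ascent: descend in x, ascend in y."""
    return x - s * y, y + s * x

def ppm_step(x, y, s):
    """Proximal point method: the implicit update
    x+ = x - s * y+,  y+ = y + s * x+
    is a 2x2 linear system, solved here in closed form."""
    d = 1.0 + s * s
    return (x - s * y) / d, (y + s * x) / d

def egm_step(x, y, s):
    """Extra-gradient method: take a GDA step to a midpoint,
    then update the original point with the midpoint's gradient."""
    xm, ym = gda_step(x, y, s)
    return x - s * ym, y + s * xm

def distance_after(step, s=0.1, iters=200, start=(1.0, 1.0)):
    """Run `iters` iterations of `step` and return the distance to (0, 0)."""
    x, y = start
    for _ in range(iters):
        x, y = step(x, y, s)
    return math.hypot(x, y)

if __name__ == "__main__":
    print("start:", math.hypot(1.0, 1.0))
    for name, step in [("GDA", gda_step), ("PPM", ppm_step), ("EGM", egm_step)]:
        print(name, distance_after(step))
```

On this instance each GDA step multiplies the squared distance to the origin by \(1 + s^2\), each PPM step by \(1/(1 + s^2)\), and each EGM step by \(1 - s^2 + s^4\), so for \(0 < s < 1\) GDA spirals outward while PPM and EGM contract linearly.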




Updated: 2021-06-29