The Complexity of Constrained Min-Max Optimization
arXiv - CS - Computational Complexity Pub Date : 2020-09-21 , DOI: arxiv-2009.09623
Constantinos Daskalakis and Stratis Skoulakis and Manolis Zampetakis

Despite its important applications in Machine Learning, min-max optimization of nonconvex-nonconcave objectives remains elusive. Not only are there no known first-order methods converging even to approximate local min-max points, but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity of the problem, as well as of the limitations of first-order methods in constrained min-max optimization problems with nonconvex-nonconcave objectives and linear constraints. As a warm-up, we show that, even when the objective is a Lipschitz and smooth differentiable function, deciding whether a min-max point exists, in fact even deciding whether an approximate min-max point exists, is NP-hard. More importantly, we show that an approximate local min-max point of large enough approximation is guaranteed to exist, but finding one such point is PPAD-complete. The same is true of computing an approximate fixed point of Gradient Descent/Ascent. An important byproduct of our proof is to establish an unconditional hardness result in the Nemirovsky-Yudin model. We show that, given oracle access to some function $f : P \to [-1, 1]$ and its gradient $\nabla f$, where $P \subseteq [0, 1]^d$ is a known convex polytope, every algorithm that finds an $\varepsilon$-approximate local min-max point needs to make a number of queries that is exponential in at least one of $1/\varepsilon$, $L$, $G$, or $d$, where $L$ and $G$ are respectively the smoothness and Lipschitzness of $f$ and $d$ is the dimension. This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using $O(L/\varepsilon)$ many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems.
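To make the objects in the abstract concrete, here is a minimal sketch, not taken from the paper, of the projected Gradient Descent/Ascent update whose approximate fixed points the authors study. The toy objective f(x, y) = (x - 0.5)(y - 0.5) on the box [0, 1]^2 is bilinear, hence convex-concave and much simpler than the paper's nonconvex-nonconcave setting, but it already shows why GDA need not settle down: the iterates circle the stationary point (0.5, 0.5) rather than converging to it, and a point where the projected update barely moves is exactly an approximate fixed point of GDA.

```python
import numpy as np

def grad_f(x, y):
    # Toy bilinear objective f(x, y) = (x - 0.5) * (y - 0.5) on [0, 1]^2.
    # Convex-concave (simpler than the paper's nonconvex-nonconcave setting),
    # but GDA already cycles around (0.5, 0.5) instead of converging.
    return np.array([y - 0.5, x - 0.5])  # (df/dx, df/dy)

def projected_gda(z0, eta=0.1, steps=201):
    """One possible projected Gradient Descent/Ascent loop (illustrative only):
    descend in x, ascend in y, then project back onto the box [0, 1]^2.
    A point where this update barely moves is an approximate GDA fixed point,
    the kind of solution whose computation the paper studies."""
    z = np.array(z0, dtype=float)
    for t in range(steps):
        gx, gy = grad_f(*z)
        z_new = np.clip([z[0] - eta * gx, z[1] + eta * gy], 0.0, 1.0)
        move = np.linalg.norm(z_new - z)  # how far the projected update moved
        z = z_new
        if t % 50 == 0:
            print(f"step {t:3d}: z = ({z[0]:.3f}, {z[1]:.3f}), |update| = {move:.4f}")
    return z

if __name__ == "__main__":
    projected_gda([0.9, 0.1])  # starts away from the stationary point (0.5, 0.5)
```

Running the sketch shows the update norm staying bounded away from zero for this step size, in contrast to projected gradient descent on a pure minimization problem, where the same kind of loop drives the update norm to zero with roughly O(L/ε) iterations.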

Updated: 2020-09-22