On finite termination of an inexact Proximal Point algorithm
Introduction
Several decades ago the Proximal Point Algorithm (PPA) started to gain much attention from both the abstract operator theory and numerical optimization communities. Even in modern applications, where large-scale nonsmooth optimization recurrently arises, practitioners still take inspiration from proximal minimization theory to design scalable algorithmic techniques that overcome nonsmoothness. The powerful PPA iteration consists mainly of the recursive evaluation of the proximal operator associated with the objective function. The proximal operator is based on the infimal convolution with a smooth metric function, commonly chosen to be the squared Euclidean norm [1]. The Proximal Point recursion became famous in the optimization community when [1], [2] revealed its connection to various multiplier methods for constrained minimization. Remarkable works have shown growth regularity and sharpness to be key factors in the iteration complexity of PPA [2], [3], [4].
Let $f$ be a closed convex function and $X^*$ its optimal set of minimizers. The optimal set is weakly sharp (WSM) if there exists $\sigma > 0$ such that: $f(x) - f^* \ge \sigma \, \mathrm{dist}(x, X^*)$ for all $x$, where the distance to the optimal set is defined below. The finite convergence of the exact PPA under the WSM property was proved by [5], [6], [7]. Furthermore, in [2] an extensive convergence analysis of the exact PPA and the Augmented Lagrangian algorithm under general Hölderian growth can be found. Although the results and analysis are of remarkable generality, they have an asymptotic nature (see [8]). A preliminary nonasymptotic complexity is given in [8], where the equivalence between a Dual Augmented Lagrangian algorithm and a variable-stepsize PPA is established. The authors analyze sparse learning models of the form $\min_x f(Ax) + \psi(x)$, where $f$ is twice differentiable with Lipschitz continuous gradient, $A$ is a linear operator and $\psi$ is a convex nonsmooth regularizer. Under a particular parameterized Hölderian growth, they show a nonasymptotic superlinear convergence rate of the exact PPA with exponentially increasing stepsize. For the inexact variant they further show a slightly weaker superlinear convergence. The simple arguments used to improve [1] towards nonasymptotic estimates are remarkable. However, a convergence rate of inexact PPA (IPPA) could become irrelevant without quantifying the otherwise considerable local computational effort spent to compute each iteration.
The generous amount of previous work, such as [2], [3], [4], [9], that focused on different variants of IPPA gave less attention to the class of sharp objective functions, which makes our result, to our knowledge, a first attempt to claim finite convergence for this case. As a second contribution, we evaluate the computational complexity when each inner subproblem is solved using a pure Projected Subgradient Method and provide bounds on the total number of projected subgradient iterations necessary to reach distance $\epsilon$ to the optimal set.
Preliminaries and notations. For $x, y \in \mathbb{R}^n$ we denote the scalar product and Euclidean norm by $\langle x, y \rangle$ and $\|x\|$. The projection operator onto a closed set $Q$ is denoted by $\pi_Q$ and the distance from $x$ to the set $Q$ is denoted $\mathrm{dist}(x, Q)$. We use $\partial f(x)$ for the subdifferential set and $g(x) \in \partial f(x)$ for a subgradient of $f$ at $x$. In the differentiable case, $\nabla f(x)$ is the gradient of $f$. The problem of interest in this paper is: $\min_{x \in \mathbb{R}^n} f(x)$ (1), where $f$ is a closed convex function. By $X^*$ we denote the optimal set associated to (1). We denote the Moreau envelope of $f$ with $f_\mu$ and its proximal operator with $\mathrm{prox}_{\mu f}$, defined as $f_\mu(x) = \min_z f(z) + \frac{1}{2\mu}\|z - x\|^2$ and $\mathrm{prox}_{\mu f}(x) = \arg\min_z f(z) + \frac{1}{2\mu}\|z - x\|^2$, respectively, see [1], [2]. The envelope $f_\mu$ is a smooth approximation of $f$, having Lipschitz continuous gradient with constant $\frac{1}{\mu}$ [1], [2]. However, as reflected by the following lemma, $f_\mu$ locally inherits some of the growth properties of $f$.
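The definitions above can be illustrated on the scalar sharp function $f(z) = |z|$, whose proximal operator is the well-known soft-thresholding map and whose Moreau envelope has a closed form (the Huber function). A minimal sketch (function names are ours):

```python
import numpy as np

def prox_abs(x, mu):
    """Proximal operator of f(z) = |z| with parameter mu: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def moreau_env_abs(x, mu):
    """Moreau envelope f_mu(x) = min_z |z| + (z - x)^2 / (2*mu),
    evaluated via the prox (the minimizer of the inner problem)."""
    z = prox_abs(x, mu)
    return np.abs(z) + (z - x) ** 2 / (2.0 * mu)

# Inside the "tube" |x| <= mu the envelope is quadratic: x^2 / (2*mu);
# outside it is linear: |x| - mu / 2.
mu = 0.5
print(moreau_env_abs(0.2, mu))  # 0.2**2 / (2*0.5) = 0.04
print(moreau_env_abs(2.0, mu))  # 2.0 - 0.5/2   = 1.75
```

Note how the envelope is differentiable everywhere, even though $|z|$ is not: its gradient vanishes smoothly as $x$ approaches the minimizer $0$.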
Lemma 1 Let $f$ be a closed convex function and let WSM hold with modulus $\sigma$. Then the Moreau envelope satisfies the relation: $f_\mu(x) - f^* \ge H_{\sigma,\mu}(\mathrm{dist}(x, X^*))$, where $H_{\sigma,\mu}(t) = \frac{t^2}{2\mu}$ if $t \le \sigma\mu$ and $H_{\sigma,\mu}(t) = \sigma t - \frac{\sigma^2 \mu}{2}$ otherwise is the Huber function.
Proof The proof can be found in [10]. □
Outside a certain tube $\{x : \mathrm{dist}(x, X^*) \le \sigma\mu\}$ around the optimal set, the Moreau envelope grows sharply. Inside of it, $f_\mu$ grows quadratically, which, unlike the objective function $f$, allows the gradient to become small near the optimal set. This separation of growth regimes suggests that first-order algorithms minimizing $f_\mu$ would reach the tube very quickly, allowing large steps in the first phase and subsequently slowing down in the vicinity of the optimal set.
Inexact Proximal Point algorithm
The basic exact PPA iteration is shortly described as $x_{k+1} = \mathrm{prox}_{\mu_k f}(x_k)$. However, since $\mathrm{prox}_{\mu f}$ is not always explicit or easily computable, it is more realistic to rely on an approximation with respect to a certain criterion. Given $\epsilon > 0$, we consider $z$ an $\epsilon$-approximation of $\mathrm{prox}_{\mu f}(x)$ if it satisfies $\|z - \mathrm{prox}_{\mu f}(x)\| \le \epsilon$. Previous works such as [1], [3], [4], [11] analyze similar approximation measures for inexact first-order methods.
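The iteration above can be sketched on the sharp function $f(x) = \|x\|_1$, whose exact prox is soft-thresholding; the inner-solver error is simulated here by adding a perturbation of norm at most $\epsilon_k$ (all names and the summable error schedule are our illustrative choices):

```python
import numpy as np

def inexact_prox_l1(x, mu, eps, rng):
    """eps-approximation of prox_{mu ||.||_1}(x): the exact soft-thresholding
    answer plus an error of norm at most eps (mimicking an inner solver
    stopped early)."""
    exact = np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)
    err = rng.standard_normal(x.shape)
    err *= eps / max(np.linalg.norm(err), 1e-16)
    return exact + err

def ippa(x0, mu, eps_k, iters, rng):
    """Inexact PPA: x_{k+1} is an eps_k(k)-approximation of prox_{mu f}(x_k)."""
    x = x0.copy()
    for k in range(iters):
        x = inexact_prox_l1(x, mu, eps_k(k), rng)
    return x

rng = np.random.default_rng(0)
x = ippa(np.array([3.0, -2.0, 0.5]), mu=1.0,
         eps_k=lambda k: 0.5 ** k, iters=30, rng=rng)
print(np.linalg.norm(x))  # tiny: f is sharp with minimizer x* = 0
```

With a geometrically decreasing inner accuracy, the iterates settle at the optimal set up to the last perturbation, reflecting the finite-termination behavior the paper studies.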
Several remarkable references [12], [13] analyzed the perturbed and
Projected subgradient routine
Let $f = g + \iota_Q$, where $g$ is closed convex with bounded subgradients and $\iota_Q$ is the indicator function of a closed convex set $Q$. Although the influence of the growth modulus on the behavior of IPPA is obvious, all complexity estimates derived earlier assume the existence of an oracle computing an approximate proximal mapping. In most situations this computational burden is considerable and a fast routine is needed to compute $\mathrm{prox}_{\mu f}(x)$. For instance, in [8] a
Numerical simulations
Graph Support Vector Machine (SVM) extends the usual SVM by adding a graph-guided lasso regularization to the standard SVM hinge-loss objective. The regularization enforces the underlying graph dependencies expressed by the subspace spanned by the columns of a weighted graph adjacency matrix $F$, through a penalty term $\|Fw\|_1$ added to the hinge loss. When $F = I$ we recover the Sparse $\ell_1$-SVM formulation. In particular, the Lipschitz constant of the objective function is given by
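A minimal sketch of this objective (the averaging over samples, the single weight $\lambda$ on the penalty, and all names are our assumptions; the paper's precise formulation may differ):

```python
import numpy as np

def graph_svm_objective(w, X, y, F, lam):
    """Hinge loss plus graph-guided lasso penalty lam * ||F w||_1,
    where F is built from a weighted graph adjacency matrix.
    With F = I this reduces to a sparse l1-SVM."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w)).mean()
    return hinge + lam * np.abs(F @ w).sum()

def graph_svm_subgrad(w, X, y, F, lam):
    """One valid subgradient of the (nonsmooth) objective above."""
    active = (1.0 - y * (X @ w)) > 0.0            # samples with positive margin loss
    hinge_part = -(X[active] * y[active, None]).sum(axis=0) / len(y)
    return hinge_part + lam * (F.T @ np.sign(F @ w))

# Tiny example: two samples, identity graph matrix.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
F = np.eye(2)
print(graph_svm_objective(np.zeros(2), X, y, F, 0.1))  # 1.0 (pure hinge at w = 0)
```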
References (15)
- R.T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. (1976)
- D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods (1989)
- M.V. Solodov, B.F. Svaiter, et al., A unified framework for some inexact proximal point algorithms, Numer. Funct. Anal. Optim. (2001)
- O. Güler, New proximal point algorithms for convex minimization, SIAM J. Optim. (1992)
- J.V. Burke, M.C. Ferris, et al., Weak sharp minima in mathematical programming, SIAM J. Control Optim. (1993)
- M.C. Ferris, Finite termination of the proximal point algorithm, Math. Program. (1991)
- On finite convergence of processes to a sharp minimum and to a smooth minimum with a sharp derivative, Differential Equations (1994)
☆ The authors of this work were supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS/CCCDI - UEFISCDI, project number PN-III-P2-2.1-SOL-2021-0036, within PNCDI III. Andrei Pătraşcu was also supported by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P1-1.1-PD-2019-1123, within PNCDI III. Paul Irofti was also supported by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P1-1.1-PD-2019-0825, within PNCDI III.