On finite termination of an inexact Proximal Point algorithm

https://doi.org/10.1016/j.aml.2022.108348

Abstract

The presence of sharp minima in nondifferentiable optimization models has been exploited, over the last decades, to the benefit of various subgradient and proximal methods. One of the long-standing general proximal schemes of choice for minimizing nonsmooth functions is the Proximal Point Algorithm (PPA). For the basic PPA, several well-known works proved finite convergence towards weak sharp minima under the assumption that each iteration is computed exactly. In this letter, however, we show finite convergence of a common inexact version of PPA (IPPA) under sufficiently small but persistent perturbations of the proximal operator. Moreover, when a simple Subgradient Method is recurrently called as an inner routine for computing each IPPA iterate, a suboptimal minimizer of the original problem lying at distance $\epsilon$ from the optimal set is obtained after a total of $\mathcal{O}(\log(1/\epsilon))$ subgradient evaluations. Our preliminary numerical tests show improvements over existing restarted versions of the Subgradient Method.

Introduction

Several decades ago, the Proximal Point Algorithm (PPA) started to gain much attention from both the abstract operator theory and the numerical optimization communities. Even in modern applications, where large-scale nonsmooth optimization recurrently arises, practitioners still take inspiration from proximal minimization theory to design scalable algorithmic techniques that overcome nonsmoothness. The powerful PPA iteration consists mainly of the recursive evaluation of the proximal operator associated with the objective function. The proximal operator is based on the infimal convolution with a smooth metric function, commonly chosen to be the squared Euclidean norm [1]. The Proximal Point recursion became famous in the optimization community when [1], [2] revealed its connection to various multiplier methods for constrained minimization. Remarkable works have shown growth regularity and sharpness to be a key factor in the iteration complexity of PPA [2], [3], [4].

Let $F:\mathbb{R}^n \to (-\infty, +\infty]$ be a closed convex function, $F^* = \min_x F(x)$ and $X^*$ its optimal set of minimizers. The optimal set $X^*$ is weakly sharp (WSM) if there exists $\sigma_F > 0$ such that: $$\text{(WSM)}: \qquad F(x) - F^* \ge \sigma_F \, \mathrm{dist}_{X^*}(x) \qquad \forall x \in \mathrm{dom}\, F,$$ where the distance to the optimal set $\mathrm{dist}_{X^*}(\cdot)$ is defined below. For instance, $F(x) = \|x\|$ is weakly sharp with $\sigma_F = 1$, while the quadratic $\|x\|^2$ is not. The finite convergence of the exact PPA under the WSM property was proved in [5], [6], [7]. Furthermore, an extensive convergence analysis of the exact PPA and the Augmented Lagrangian algorithm under general Hölderian growth can be found in [2]. Although the results and analysis are of remarkable generality, they have an asymptotic nature (see [8]). A preliminary nonasymptotic complexity is given in [8], where the equivalence between a Dual Augmented Lagrangian algorithm and a variable-stepsize PPA is established. The authors analyze sparse learning models of the form $\min_{x \in \mathbb{R}^n} f(Ax) + \psi(x)$, where $f$ is twice differentiable with Lipschitz continuous gradient, $A$ is a linear operator and $\psi$ is a convex nonsmooth regularizer. Under a particular parameterized Hölderian growth, they show a nonasymptotic superlinear convergence rate of the exact PPA with exponentially increasing stepsize. For the inexact variant they further show a slightly weaker superlinear convergence. The simple arguments used to improve [1] towards nonasymptotic estimates are remarkable. However, a convergence rate of inexact PPA (IPPA) could become irrelevant without quantifying the otherwise considerable local computational effort spent to compute each iteration.

The numerous previous works, such as [2], [3], [4], [9], that focused on different variants of IPPA gave less attention to the class of sharp objective functions, which makes our result, to the best of our knowledge, a first attempt to claim finite convergence for this case. As a second contribution, we evaluate the computational complexity when each inner subproblem is solved using a pure Projected Subgradient Method and provide $\mathcal{O}(\log(1/\epsilon))$ bounds on the total number of projected subgradient iterations necessary to reach $\epsilon$ distance to the optimal set.

Preliminaries and notations. For $x, y \in \mathbb{R}^n$ we denote the scalar product $\langle x, y \rangle = x^T y$ and the Euclidean norm $\|x\| = \sqrt{x^T x}$. The projection operator onto a closed set $X \subseteq \mathbb{R}^n$ is denoted by $\pi_X$ and the distance from $x$ to the set $X$ by $\mathrm{dist}_X(x) = \min_{z \in X} \|x - z\|$. We use $\partial h(x)$ for the subdifferential set and $h'(x)$ for a subgradient of $h$ at $x$; in the differentiable case, $h'$ is the gradient of $h$. The problem of interest in this paper is: $$F^* = \min_{x \in \mathbb{R}^n} F(x), \tag{1}$$ where $F:\mathbb{R}^n \to (-\infty, +\infty]$ is a closed convex function. By $X^*$ we denote the optimal set associated to (1). We denote the Moreau envelope of $F$ by $F_\mu$ and its proximal operator by $\mathrm{prox}_{\mu F}(x)$, defined as $$F_\mu(x) \coloneqq \min_z F(z) + \frac{1}{2\mu}\|z - x\|^2 \quad \text{and} \quad \mathrm{prox}_{\mu F}(x) \coloneqq \arg\min_z F(z) + \frac{1}{2\mu}\|z - x\|^2,$$ respectively; see [1], [2]. The envelope $F_\mu$ is a smooth approximation of $F$ having Lipschitz continuous gradient with constant $\frac{1}{\mu}$ [1], [2]. However, as reflected by the following lemma, $F_\mu$ locally inherits growth properties similar to those of $F$.
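To make these two definitions concrete, consider $F(x) = |x|$, whose proximal operator is the classical soft-thresholding map and whose Moreau envelope is the Huber function appearing in Lemma 1 below. The following minimal sketch (our illustration, not part of the original text; it assumes only NumPy) evaluates both in closed form:

```python
import numpy as np

def prox_abs(x, mu):
    # Soft-thresholding: the closed-form solution of
    #   argmin_z |z| + (1/(2*mu)) * (z - x)**2.
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def moreau_env_abs(x, mu):
    # Moreau envelope F_mu(x) = min_z |z| + (1/(2*mu)) * (z - x)**2,
    # evaluated at the minimizer returned by prox_abs.
    z = prox_abs(x, mu)
    return np.abs(z) + (z - x) ** 2 / (2.0 * mu)

mu = 0.5
for x in [-2.0, 0.2, 1.0]:
    # For |x| > mu the envelope equals |x| - mu/2 (sharp growth);
    # for |x| <= mu it equals x**2 / (2*mu) (quadratic growth).
    print(x, moreau_env_abs(x, mu))
```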

Lemma 1

Let $F$ be a closed convex function for which WSM holds. Then the Moreau envelope $F_\mu$ satisfies the relation: $$F_\mu(x) - F^* \ge H_{\sigma_F^2 \mu}\big(\sigma_F \, \mathrm{dist}_{X^*}(x)\big) \qquad \forall x \in \mathrm{dom}\, F,$$ where $$H_\tau(s) = \begin{cases} s - \frac{\tau}{2}, & s > \tau \\ \frac{1}{2\tau}\, s^2, & s \le \tau \end{cases}$$ is the Huber function.

Proof

The proof can be found in [10]. 

Outside a certain tube around the optimal set, $\mathcal{N}(\sigma_F \mu) = \{x \in \mathbb{R}^n : \mathrm{dist}_{X^*}(x) \le \sigma_F \mu\}$, the Moreau envelope $F_\mu$ grows sharply. Inside $\mathcal{N}(\sigma_F \mu)$ it grows quadratically, which, unlike for the objective function $F$, allows the gradient to become small near the optimal set. This separation of growth regimes suggests that first-order algorithms minimizing $F_\mu$ would reach the region $\mathcal{N}(\sigma_F \mu)$ very fast, taking large steps in the first phase and subsequently slowing down in the vicinity of the optimal set.
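As a sanity check of Lemma 1 (a worked example we add for illustration): for $F(x) = |x|$ on $\mathbb{R}$ we have $X^* = \{0\}$, $F^* = 0$ and WSM holds with $\sigma_F = 1$, so the lemma predicts $F_\mu(x) \ge H_\mu(|x|)$. Here the envelope is computable in closed form and the bound holds with equality:

```latex
F_\mu(x) \;=\; \min_{z} |z| + \tfrac{1}{2\mu}(z - x)^2
       \;=\; \begin{cases} \tfrac{x^2}{2\mu}, & |x| \le \mu \\[2pt] |x| - \tfrac{\mu}{2}, & |x| > \mu \end{cases}
       \;=\; H_{\mu}\big(\operatorname{dist}_{X^*}(x)\big),
```

and the tube $\mathcal{N}(\sigma_F \mu) = [-\mu, \mu]$ is exactly the region of quadratic growth.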


Inexact Proximal Point algorithm

The basic exact PPA iteration is shortly described as $x_{t+1} = \mathrm{prox}_{\mu F}(x_t)$. However, since $\mathrm{prox}_{\mu F}(\cdot)$ is not always explicit or easily computable, it is more realistic to rely on an approximation with respect to a certain criterion. Given $x \in \mathrm{dom}\, F$, we consider $\tilde{z}$ a $\delta$-approximation of $\mathrm{prox}_{\mu F}(x)$ if it satisfies $\|\tilde{z} - \mathrm{prox}_{\mu F}(x)\| \le \delta$. Previous works such as [1], [3], [4], [11] analyze similar approximation measures for inexact first-order methods.
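A minimal sketch of the resulting loop is given below (our illustration; the inner routine `inexact_prox`, which returns a $\delta$-approximation in the sense above, is a hypothetical placeholder for any concrete solver):

```python
def ippa(x0, mu, delta, inexact_prox, num_iters=100):
    # Inexact Proximal Point Algorithm: each iterate is only required to be
    # a delta-approximation of the exact proximal step, i.e.
    #   ||x_{t+1} - prox_{mu F}(x_t)|| <= delta.
    x = x0
    for _ in range(num_iters):
        x = inexact_prox(x, mu, delta)  # persistent perturbation of size <= delta
    return x
```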

Several remarkable references [12], [13] analyzed the perturbed and …

Projected subgradient routine

Let $F(x) \coloneqq f(x) + \iota_Q(x)$, where $f:\mathbb{R}^n \to \mathbb{R}$ is closed convex with bounded subgradients and $\iota_Q$ is the indicator function of a closed convex set $Q$. Although the influence of the growth modulus $\sigma_F$ on the behavior of IPPA is obvious, all complexity estimates derived earlier assume the existence of an oracle computing an approximate proximal mapping: $$x_{t+1} \approx \arg\min_{z \in Q} f(z) + \frac{1}{2\mu}\|z - x_t\|^2.$$ In most situations this computational burden is considerable and a fast routine is needed to compute $\{x_t\}_{t \ge 0}$. For instance, in [8] a …
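A sketch of such an inner routine follows (our illustration, not necessarily the paper's exact scheme; `subgrad_f` and `proj_Q` are hypothetical callables returning a subgradient of $f$ and the projection $\pi_Q$, and the stepsize is the classical $1/(\text{modulus}\cdot(k+1))$ rule for strongly convex problems):

```python
import numpy as np

def prox_subproblem_psg(x_t, mu, subgrad_f, proj_Q, num_inner=200):
    # Projected Subgradient Method applied to the subproblem
    #   min_{z in Q} f(z) + (1/(2*mu)) * ||z - x_t||**2,
    # which is strongly convex with modulus 1/mu.
    z = x_t.copy()
    for k in range(num_inner):
        g = subgrad_f(z) + (z - x_t) / mu  # subgradient of the regularized objective
        step = mu / (k + 1)                # 1/(modulus * (k+1)) stepsize
        z = proj_Q(z - step * g)
    return z
```

Such a routine can then play the role of `inexact_prox` in the IPPA loop sketched earlier.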

Numerical simulations

Graph Support Vector Machine (SVM) extends the usual SVM by adding a graph-guided lasso regularization to the standard SVM hinge-loss objective. The regularization enforces the underlying graph dependencies expressed by the subspace spanned by the columns of a weighted graph adjacency matrix $B$: $$\min_{x \in \mathbb{R}^n} \frac{1}{m} \sum_{i=1}^m \max\{0, 1 - y_i a_i^T x\} + \tau \|Bx\|_1,$$ where $a_i \in \mathbb{R}^n$, $y_i \in \{\pm 1\}$. When $B = I_n$ we recover the sparse $\ell_1$-SVM formulation. In particular, the Lipschitz constant of the objective function is given by $L_F = \frac{1}{m} \sum_{i=1}^m \|a_i\|_2 + \tau \, \ldots$
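For completeness, a sketch of this objective and one valid subgradient (our illustration; the array names `A` (with rows $a_i^T$), `y` and `B` are our own):

```python
import numpy as np

def graph_svm_objective(x, A, y, B, tau):
    # (1/m) * sum_i max(0, 1 - y_i * a_i^T x) + tau * ||B x||_1
    margins = 1.0 - y * (A @ x)
    return np.mean(np.maximum(margins, 0.0)) + tau * np.abs(B @ x).sum()

def graph_svm_subgradient(x, A, y, B, tau):
    # Hinge part contributes -y_i * a_i wherever the margin is violated;
    # sign(B x) picks a subgradient of the l1 term.
    active = (1.0 - y * (A @ x)) > 0.0
    g_hinge = -(A * (y * active)[:, None]).sum(axis=0) / len(y)
    return g_hinge + tau * (B.T @ np.sign(B @ x))
```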

References (15)

  • Rockafellar, R.T.

    Monotone operators and the proximal point algorithm

    SIAM J. Control Optim.

    (1976)
  • Bertsekas, D.P.

    Parallel and Distributed Computation: Numerical Methods

    (1989)
  • Solodov, M.V. et al.

    A unified framework for some inexact proximal point algorithms

    Numer. Funct. Anal. Optim.

    (2001)
  • Güler, O.

    New proximal point algorithms for convex minimization

    SIAM J. Optim.

    (1992)
  • Burke, J.V. et al.

    Weak sharp minima in mathematical programming

    SIAM J. Control Optim.

    (1993)
  • Ferris, M.C.

    Finite termination of the proximal point algorithm

    Math. Program.

    (1991)
  • Antipin, A.

    On finite convergence of processes to a sharp minimum and to a smooth minimum with a sharp derivative

    Differential Equations

    (1994)

The authors of this work were supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS/CCCDI - UEFISCDI, project number PN-III-P2-2.1-SOL-2021-0036, within PNCDI III. Andrei Pătraşcu was also supported by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P1-1.1-PD-2019-1123, within PNCDI III. Paul Irofti was also supported by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P1-1.1-PD-2019-0825, within PNCDI III.
