An active set Newton-CG method for $\ell_1$ optimization

https://doi.org/10.1016/j.acha.2019.08.005

Abstract

In this paper, we investigate the active set identification technique of ISTA and establish some of its useful properties. An active set Newton-CG method is then proposed for $\ell_1$ optimization. Under appropriate conditions, we show that the proposed method, combined with a nonmonotone line search, is globally convergent. Numerical comparisons with several state-of-the-art methods demonstrate the efficiency of the proposed method.

Introduction

Sparse recovery has received a lot of attention in recent decades, because many applications, such as signal or image processing and data mining/classification, can be formulated as sparse recovery problems. Sparse recovery refers to recovering a sparse vector from a set of linear measurements, which is one of the most fundamental issues in compressed sensing (CS) [14]. Mathematically, it can be expressed as

$$\min \|x\|_0 \quad \text{s.t.} \quad Ax = b, \tag{1.1}$$

where $\|x\|_0$ counts the number of nonzero entries of $x$, $A \in \mathbb{R}^{m \times n}$ (usually $m \ll n$) is called a sensing matrix, and $b \in \mathbb{R}^m$ is a measurement vector. Problem (1.1) is NP-hard and difficult to solve in practice. To find the solution of (1.1), alternative models have been used, which include the basis pursuit (BP) problem

$$\min \|x\|_1 \quad \text{s.t.} \quad Ax = b, \tag{1.2}$$

the closely related $\ell_1$-regularized least squares problem

$$\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|Ax - b\|^2 + \mu \|x\|_1, \tag{1.3}$$

the lasso problem

$$\min \|Ax - b\|^2 \quad \text{s.t.} \quad \|x\|_1 \le t, \tag{1.4}$$

and the BP denoising problem

$$\min \|x\|_1 \quad \text{s.t.} \quad \|Ax - b\| \le \sigma, \tag{1.5}$$

where $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ and $\|\cdot\|$ denotes the Euclidean norm of vectors. The theory of penalty functions shows that the solution of (1.3) approaches the solution of (1.2) as $\mu$ goes to zero. The lasso problem (1.4) and the BP denoising problem (1.5) are equivalent to (1.3) for appropriate choices of the parameters $t$ and $\sigma$. It is therefore important to study problem (1.3). In this paper, we consider the more general $\ell_1$-regularized optimization problem

$$\min \phi(x) := f(x) + \mu \|x\|_1, \tag{1.6}$$

where $f$ is continuously differentiable and $\mu > 0$.
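To make the objectives above concrete, the following minimal sketch evaluates the $\ell_1$-regularized least squares objective (1.3), which is the special case of (1.6) with $f(x) = \tfrac{1}{2}\|Ax-b\|^2$, together with the $\ell_0$ count from (1.1). The function names and the small test data are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def matvec(A: Sequence[Sequence[float]], x: Sequence[float]) -> list:
    """Return the matrix-vector product A x for a dense row-major matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def l0_count(x: Sequence[float]) -> int:
    """Number of nonzero entries of x, i.e. ||x||_0 in (1.1)."""
    return sum(1 for v in x if v != 0.0)


def l1_norm(x: Sequence[float]) -> float:
    """Sum of absolute values of the entries of x, i.e. ||x||_1."""
    return sum(abs(v) for v in x)


def l1_least_squares_objective(A, x, b, mu: float) -> float:
    """Evaluate 0.5 * ||A x - b||^2 + mu * ||x||_1, the objective in (1.3)."""
    residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(r * r for r in residual) + mu * l1_norm(x)


# Hypothetical data: 2 measurements, 3 unknowns (m < n).
A = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
b = [1.0, 2.0]
x = [1.0, 2.0, 0.0]  # satisfies A x = b exactly, so only the l1 term remains
value = l1_least_squares_objective(A, x, b, mu=0.1)
```

Because this `x` reproduces the measurements exactly, the objective reduces to `mu * l1_norm(x)`, which shows how the penalty term alone drives the choice among vectors that fit the data.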

Recently, many different approaches have been proposed for solving (1.6). A large class of first-order algorithms for solving problem (1.6) is based on the iterative shrinkage thresholding algorithm (ISTA) [13], [17] or its variants. To accelerate convergence, a two-step ISTA (TwIST) algorithm was developed in [5], and a sequential subspace optimization technique was added to ISTA in [16]. Beck and Teboulle [4] constructed a fast iterative shrinkage-thresholding algorithm called FISTA that retains the simplicity of ISTA but possesses a better global rate of convergence. Furthermore, to improve the practical performance of the above methods, Wright et al. [35] introduced sparse reconstruction by separable approximation (SpaRSA) for solving (1.6). Hager et al. [23] analyzed the convergence rate of SpaRSA and proposed an improved version of SpaRSA based on a cyclic version of the BB iteration and an adaptive choice of the reference function value in the line search. Hale et al. [22] proposed a fixed point continuation algorithm (FPC_BB) that embeds ISTA [13], [17] in a continuation strategy. Wen et al. [33], [34] proposed the FPC active set (FPC_AS) algorithm, which combines shrinkage, subspace optimization and a continuation technique. Cheng and Dai [10] proposed gradient-based methods with an active set strategy to solve (1.6). Liang et al. [25] proposed a general forward-backward splitting method that identifies the active manifold in a finite number of iterations and converges locally at a linear rate.

Various types of algorithms have also been designed to solve equivalent constrained reformulations of problem (1.6) or (1.3), for instance interior point methods [28], the projected gradient method [18] and the alternating direction method of multipliers SALSA [1], [6]. Other algorithms for $\ell_1$ minimization include coordinate-wise descent methods [32], Bregman iterative regularization based methods [37], a reduced-space algorithm [11], second-order methods [8], [9], [27], [36], [29], quasi-Newton methods [24], [26], gradient methods [30] for minimizing the more general function $J(x)+H(x)$, where $J$ is nonsmooth, $H$ is smooth, and both are convex, and a smoothed penalty algorithm (SPA) [2]. We refer to papers [7], [14], [19], [8], [31], [26], [11], [25] for more advances in this area.

As pointed out by the authors in [33], [34], ISTA is very efficient at obtaining a superset of the support, but it is not efficient at recovering the signal values. This motivated them to develop the FPC_AS algorithm, which is divided into two stages that are performed repeatedly. At the first stage, “nonmonotone line search (NMLS)”, a first-order iterative “shrinkage” method is used to estimate the support of the solution. At the second stage, “subspace optimization”, a smaller smooth subproblem is solved to recover the magnitudes of the components of $x$. Theoretically, the authors in [33], [34] showed that there exists an accumulation point of the sequence $\{x^k\}$ generated by FPC_AS that is a stationary point of problem (1.6). Our approach is partially motivated by our belief that a second-order method should be faster than the first-order iterative shrinkage method. To accelerate FPC_AS, we propose an active set Newton-CG method to solve problem (1.6). We first investigate the active set identification technique of ISTA and establish some of its useful properties. Based on this identification technique, we propose the algorithm to solve (1.6). Specifically, the active variables and free variables are defined by the identification technique at each iteration. The active variables are updated along the same direction as in the first stage of the FPC_AS method [33], [34], while the free variables are updated by a second-order method applied to a smooth subproblem. Hence the method is distinct from the FPC_AS method [33], [34]. In addition, the nonmonotone line search [20] of the proposed method is different from that of the FPC_AS method. Under appropriate conditions, we show that every accumulation point of the sequence $\{x^k\}$ generated by the proposed algorithm is a stationary point of problem (1.6). Numerical experiments with logistic regression problems and compressive sensing problems demonstrate that the proposed approach is competitive with several known methods.
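The idea of applying a second-order step to the free variables can be illustrated with a small sketch. It assumes the least squares case $f(x)=\tfrac{1}{2}\|Ax-b\|^2$, freezes the signs of the nonzero components so that the $\ell_1$ term becomes linear on the free set, and solves the reduced Newton system by conjugate gradients. This is not the algorithm of the paper, whose details are not given in this excerpt: there is no line search and no safeguard against sign changes, and all names (`cg_solve`, `free_newton_direction`, `free`) are hypothetical.

```python
def cg_solve(hess_vec, rhs, tol=1e-10, max_iter=100):
    """Conjugate gradient for H d = rhs, with H symmetric positive definite.

    hess_vec(p) must return the product H p as a list.
    """
    d = [0.0] * len(rhs)
    r = list(rhs)  # residual rhs - H d, with d = 0 initially
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Hp = hess_vec(p)
        alpha = rs / sum(pi * hi for pi, hi in zip(p, Hp))
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d


def free_newton_direction(A, b, x, mu, free):
    """Newton direction on the free indices for 0.5*||Ax-b||^2 + mu*||x||_1.

    Assumes x[j] != 0 for every j in `free` and that the columns of A
    indexed by `free` are linearly independent (so the reduced Hessian is
    positive definite). Signs of x on `free` are treated as fixed.
    """
    m = len(A)
    residual = [sum(A[i][j] * x[j] for j in range(len(x))) - b[i] for i in range(m)]
    rhs = []
    for j in free:
        g_j = sum(A[i][j] * residual[i] for i in range(m))  # j-th gradient entry
        sign = 1.0 if x[j] > 0 else -1.0
        rhs.append(-(g_j + mu * sign))

    def hess_vec(p):
        # Reduced Hessian product (A_F^T A_F) p without forming the matrix.
        Ap = [sum(A[i][j] * pj for j, pj in zip(free, p)) for i in range(m)]
        return [sum(A[i][j] * Ap[i] for i in range(m)) for j in free]

    return cg_solve(hess_vec, rhs)


# Hypothetical data: the third variable is treated as active (fixed at zero).
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
b = [3.0, 4.0]
x = [1.0, 1.0, 0.0]
d = free_newton_direction(A, b, x, mu=0.5, free=[0, 1])
```

For a quadratic $f$, one full Newton step on the free variables already minimizes the sign-frozen reduced problem, which is why a second-order update can recover the magnitudes much faster than repeated shrinkage steps.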

The remainder of the paper is organized as follows. Some notation and properties related to (1.6) are given in Section 2. In Section 3, we propose the algorithm. In Section 4, we establish the global convergence of the algorithm. Some numerical results are reported in Section 5, and conclusions are drawn in the last section.

Notation and properties

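Let $\bar{x}$ be a stationary point of problem (1.6). We define the active set $\gamma(\bar{x})$ to be the set of indices corresponding to the zero components of $\bar{x}$ and the inactive set $\tau(\bar{x})$ to be the support of $\bar{x}$; i.e.,

$$\gamma(\bar{x}) = \{i : \bar{x}_i = 0\} \quad \text{and} \quad \tau(\bar{x}) = \{i : \bar{x}_i \neq 0\}.$$

Furthermore, the active set and the support of $\bar{x}$ can each be subdivided into two sets:

$$\gamma_+(\bar{x}) = \{i \in \gamma(\bar{x}) : |g_i(\bar{x})| < \mu\}, \qquad \gamma_0(\bar{x}) = \{i \in \gamma(\bar{x}) : |g_i(\bar{x})| \ge \mu\}$$

and

$$\tau_+(\bar{x}) = \{i : \bar{x}_i > 0\}, \qquad \tau_-(\bar{x}) = \{i : \bar{x}_i < 0\},$$

where $g_i(x)$ is the $i$th component of the gradient vector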
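The index sets defined in this section can be computed directly from a point and its gradient. The sketch below is an illustration only: the function name, the tolerance `tol`, the 0-based indices and the sample vectors are assumptions, and the zero components that do not satisfy the strict inequality $|g_i| < \mu$ are all placed in `gamma_zero`.

```python
def partition_indices(x_bar, g_bar, mu, tol=1e-12):
    """Split indices into zero components (gamma) and support (tau).

    x_bar : candidate stationary point
    g_bar : gradient of the smooth part f at x_bar
    mu    : l1 regularization parameter
    tol   : hypothetical tolerance for treating a value as zero
    """
    sets = {"gamma_plus": [], "gamma_zero": [], "tau_plus": [], "tau_minus": []}
    for i, (xi, gi) in enumerate(zip(x_bar, g_bar)):
        if abs(xi) <= tol:
            # Zero component: strict inequality |g_i| < mu goes to gamma_plus.
            if abs(gi) < mu - tol:
                sets["gamma_plus"].append(i)
            else:
                sets["gamma_zero"].append(i)
        elif xi > 0:
            sets["tau_plus"].append(i)
        else:
            sets["tau_minus"].append(i)
    return sets


# Hypothetical point with gradient entries consistent with stationarity:
# g_i = -mu where x_i > 0, g_i = +mu where x_i < 0, |g_i| <= mu where x_i = 0.
parts = partition_indices([0.0, 0.0, 1.5, -2.0], [0.05, 0.1, -0.1, 0.1], mu=0.1)
```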

Active set estimate of ISTA

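In this section, we investigate the active set identification technique of ISTA and give some of its useful properties. Consider the generic iteration of ISTA [13], [17]:

$$x^{k+1} = \arg\min_{x \in \mathbb{R}^n} \Big\{ f(x^k) + g(x^k)^T (x - x^k) + \frac{1}{2\epsilon_1}\|x - x^k\|^2 + \mu \|x\|_1 \Big\},$$

where $\epsilon_1$ is a given positive constant. From the optimality condition of the above problem, we get

$$x_i^{k+1} = \begin{cases} 0, & \text{if } \epsilon_1 (g_i(x^k) - \mu) \le x_i^k \le \epsilon_1 (g_i(x^k) + \mu); \\ \operatorname{sgn}\big(x_i^k - \epsilon_1 g_i(x^k)\big)\big(|x_i^k - \epsilon_1 g_i(x^k)| - \epsilon_1 \mu\big), & \text{otherwise.} \end{cases}$$

Then, we get that the indices of the zero variables at $x^{k+1}$ belong to the following set AI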
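The closed-form ISTA update is the componentwise soft-thresholding of $x^k - \epsilon_1 g(x^k)$ with threshold $\epsilon_1 \mu$. The following minimal sketch implements one such step and, separately, the interval test that predicts which components become zero. The names `soft_threshold`, `ista_step`, `predicted_zero_indices`, `eps1` and the sample numbers are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become exactly zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_step(x, g, eps1, mu):
    """One ISTA update: soft-threshold x - eps1 * g with threshold eps1 * mu."""
    return [soft_threshold(xi - eps1 * gi, eps1 * mu) for xi, gi in zip(x, g)]


def predicted_zero_indices(x, g, eps1, mu):
    """Indices i with eps1*(g_i - mu) <= x_i <= eps1*(g_i + mu)."""
    return [
        i
        for i, (xi, gi) in enumerate(zip(x, g))
        if eps1 * (gi - mu) <= xi <= eps1 * (gi + mu)
    ]


# Hypothetical iterate and gradient.
x_k = [0.08, 1.0, -0.5]
g_k = [0.2, 1.0, -2.0]
x_next = ista_step(x_k, g_k, eps1=0.5, mu=0.1)
zeros = predicted_zero_indices(x_k, g_k, eps1=0.5, mu=0.1)
```

The interval test uses only $x^k$ and $g(x^k)$, so the zero pattern of the next iterate can be estimated without forming $x^{k+1}$; in this example it agrees with the components of `x_next` that are exactly zero.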

The new algorithm

In this section, based on the active set identification technique in Section 3, we develop a fast Newton-CG method for $\ell_1$ optimization. We make the following assumptions on the objective function.

Assumption 4.1

  • (i)

    The level set $\Omega := \{x \in \mathbb{R}^n : \phi(x) \le \phi(x^0)\}$ is bounded.

  • (ii)

    In some neighbourhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous, i.e., there exists a constant $L>0$ such that $\|g(x)-g(y)\| \le L\|x-y\|$ for all $x, y \in N$.
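As an illustration of the Lipschitz condition in (ii), consider the least squares case $f(x)=\tfrac{1}{2}\|Ax-b\|^2$ from (1.3), where $g(x)=A^T(Ax-b)$ and the largest eigenvalue of $A^TA$ is a valid constant $L$. The sketch below estimates that eigenvalue by power iteration and checks the inequality at two points. The special case, the function names and the data are assumptions made for illustration; the paper's assumption covers general $f$.

```python
import math


def matvec(A, x):
    """Return A x for a dense row-major matrix A."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def gradient(A, b, x):
    """Gradient of 0.5 * ||A x - b||^2, namely A^T (A x - b)."""
    return rmatvec(A, [ax - bi for ax, bi in zip(matvec(A, x), b)])


def lipschitz_constant(A, iters=200):
    """Estimate the largest eigenvalue of A^T A by power iteration.

    Uses a fixed all-ones start vector, which is assumed not to be
    orthogonal to the dominant eigenvector.
    """
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        lam = math.sqrt(sum(wi * wi for wi in w))  # ||A^T A v|| with ||v|| = 1
        if lam == 0.0:
            return 0.0
        v = [wi / lam for wi in w]
    return lam


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(ui * ui for ui in u))


# Hypothetical data.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, 1.0]
L = lipschitz_constant(A)
x, y = [1.0, -2.0], [0.5, 3.0]
lhs = norm([gx - gy for gx, gy in zip(gradient(A, b, x), gradient(A, b, y))])
rhs = L * norm([xi - yi for xi, yi in zip(x, y)])
```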

Numerical experiments

In this section, we present some numerical experiments to test the performance of the proposed algorithm and compare it with the following five state-of-the-art $\ell_1$-minimization algorithms.

  • FPC_AS [33]. FPC_AS is divided into two stages that are performed repeatedly. At the first stage, a first-order method based on “shrinkage” is performed to obtain a working index set. At the second stage, it utilizes a second-order method to solve a smooth subproblem defined by the working index set. The two

Conclusion

In this paper, we investigated the active set identification technique used by ISTA and established some of its useful properties. Based on the active set identification technique, we proposed a Newton-CG method. Under appropriate conditions, we showed that the method, combined with a nonmonotone line search technique, is globally convergent. The numerical results presented in Section 5 demonstrate the effectiveness of the algorithm for solving $\ell_1$-regularized nonconvex problems and some standard $\ell_2$-$\ell_1$

Acknowledgements

The authors thank the two anonymous referees very much for their valuable comments and suggestions, which helped us to improve the quality of this manuscript greatly.

References (39)

  • M. Elad et al., Subspace optimization methods for linear least squares with non-quadratic regularization, Appl. Comput. Harmon. Anal. (2007)
  • M. Afonso et al., Fast image recovery using variable splitting and constrained optimization, IEEE Trans. Image Process. (2010)
  • S. Aybat et al., A first-order smoothed penalty method for compressed sensing, SIAM J. Optim. (2011)
  • J. Barzilai et al., Two point step size gradient methods, IMA J. Numer. Anal. (1988)
  • A. Beck et al., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. (2009)
  • J.M. Bioucas-Dias et al., A new twist: two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Trans. Image Process. (2007)
  • D. Boley, Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs, SIAM J. Optim. (2013)
  • A.M. Bruckstein et al., From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. (2009)
  • R.H. Byrd et al., A family of second-order methods for convex $\ell_1$-regularized optimization, Math. Program., Ser. A (2016)
  • R.H. Byrd et al., An inexact successive quadratic approximation method for $\ell_1$ regularized optimization, Math. Program., Ser. B (2016)
  • W.Y. Cheng et al., Gradient-based method with active set strategy for $\ell_1$ optimization, Math. Comp. (2018)
  • T.Y. Chen et al., A reduced-space algorithm for minimizing $\ell_1$-regularized convex functions, SIAM J. Optim. (2017)
  • A.R. Conn et al., Trust-region methods
  • I. Daubechies et al., An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. (2004)
  • D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory (2006)
  • E.D. Dolan et al., Benchmarking optimization software with performance profiles, Math. Program. (2002)
  • M.A.T. Figueiredo et al., An EM algorithm for wavelet-based image restoration, IEEE Trans. Image Process. (2003)
  • M.A.T. Figueiredo et al., Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE J. Sel. Top. Signal Process. (2007)
  • K. Fountoulakis et al., A second-order method for strongly convex $\ell_1$-regularization problems, Math. Program., Ser. A (2016)

Supported by the Chinese NSF Grant (nos. 11971106, 11371154, 11331012 and 81173633), the Key Project of Chinese National Programs for Fundamental Research and Development (no. 2015CB856002), the China National Funds for Distinguished Young Scientists (no. 11125107), by the Ministry of Education, Humanities and Social Sciences project (no. 17JYJAZH011) and by the Natural Science Foundation of Guangdong Province (2018A030313229).