An active set Newton-CG method for ℓ1 optimization

doi:10.1016/j.acha.2019.08.005

Applied and Computational Harmonic Analysis

Volume 50, January 2021, Pages 303-325

https://doi.org/10.1016/j.acha.2019.08.005

Abstract

In this paper, we investigate the active set identification technique of ISTA and establish some of its useful properties. An active set Newton-CG method is then proposed for $\ell_1$ optimization. Under appropriate conditions, we show that the proposed method is globally convergent with a nonmonotone line search. Numerical comparisons with several state-of-the-art methods demonstrate the efficiency of the proposed method.

Introduction

Sparse recovery has received a great deal of attention in recent decades, because many applications, such as signal and image processing and data mining/classification, can be formulated as sparse recovery problems. Sparse recovery refers to recovering a sparse vector from a set of linear measurements, which is one of the most fundamental issues in compressed sensing (CS) [14]. Mathematically, it can be expressed as
$$\min \|x\|_0 \quad \text{s.t.} \quad Ax = b, \tag{1.1}$$
where $\|x\|_0$ counts the number of nonzero entries of $x$, $A \in R^{m \times n}$ (usually $m \ll n$) is called a sensing matrix, and $b \in R^{m}$ is a measurement vector. Problem (1.1) is NP-hard and difficult to solve in practice. To find a solution of (1.1), alternative models have been used, including the basis pursuit (BP) problem
$$\min \|x\|_1 \quad \text{s.t.} \quad Ax = b, \tag{1.2}$$
the closely related $\ell_1$-regularized least squares problem
$$\min_{x \in R^{n}} \frac{1}{2} \|Ax - b\|^2 + \mu \|x\|_1, \tag{1.3}$$
the lasso problem
$$\min \|Ax - b\|^2 \quad \text{s.t.} \quad \|x\|_1 \leq t, \tag{1.4}$$
and the BP denoising problem
$$\min \|x\|_1 \quad \text{s.t.} \quad \|Ax - b\| \leq \sigma, \tag{1.5}$$
where $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ and $\|\cdot\|$ denotes the Euclidean norm of vectors. Penalty function theory shows that the solution of (1.3) approaches a solution of (1.2) as $\mu$ goes to zero. The lasso problem (1.4) and the BP denoising problem (1.5) are equivalent to (1.3) for appropriate choices of the parameters $t$ and $\sigma$. Problem (1.3) is therefore of central importance. In this paper, we consider the more general $\ell_1$-regularized optimization problem
$$\min \phi(x) := f(x) + \mu \|x\|_1, \tag{1.6}$$
where $f$ is continuously differentiable and $\mu > 0$.
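To make the notation concrete, the following Python sketch evaluates the objective of the least squares instance (1.3) of (1.6), taking $f(x) = \frac{1}{2}\|Ax - b\|^2$, together with the gradient $g(x) = A^T(Ax - b)$ of the smooth part. The helper names (`matvec`, `rmatvec`, `phi`, `grad_f`) are hypothetical, and plain lists are used only to keep the example dependency-free; it is an illustration, not code from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # dense, row-major: m rows of length n


def matvec(A: Matrix, x: Vector) -> Vector:
    """Return A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A: Matrix, r: Vector) -> Vector:
    """Return A^T r."""
    n = len(A[0])
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]


def phi(A: Matrix, b: Vector, x: Vector, mu: float) -> float:
    """phi(x) = f(x) + mu*||x||_1 with f(x) = 0.5*||Ax - b||^2, i.e. problem (1.3)."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    f = 0.5 * sum(ri * ri for ri in r)
    return f + mu * sum(abs(xi) for xi in x)


def grad_f(A: Matrix, b: Vector, x: Vector) -> Vector:
    """g(x) = A^T (Ax - b), the gradient of the smooth part f only."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return rmatvec(A, r)
```

For example, with A = [[1, 0, 1], [0, 1, 1]], b = [1, 2], x = [0, 1, 0] and μ = 0.5, `phi` returns 1.5 and `grad_f` returns [-1, -1, -2]. Note that the $\ell_1$ term has no gradient at zero components, which is why (1.6) calls for shrinkage-type or active set methods rather than plain gradient descent.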

Many different approaches have recently been proposed for solving (1.6). A large class of first-order algorithms for (1.6) is based on the iterative shrinkage-thresholding algorithm (ISTA) [13], [17] or its variants. To accelerate convergence, a two-step ISTA (TwIST) was developed in [5], and sequential subspace optimization techniques were added to ISTA [16]. Beck and Teboulle [4] constructed a fast iterative shrinkage-thresholding algorithm (FISTA) that retains the simplicity of ISTA but has a better global rate of convergence. Furthermore, to improve the practical performance of the above methods, Wright et al. [35] introduced sparse reconstruction by separable approximation (SpaRSA) for solving (1.6). Hager et al. [23] analyzed the convergence rate of SpaRSA and proposed an improved version of SpaRSA based on a cyclic version of the Barzilai-Borwein (BB) iteration and an adaptive choice of the reference function value in the line search. Hale et al. [22] proposed a fixed-point continuation algorithm (FPC_BB) that embeds ISTA [13], [17] in a continuation strategy. Wen et al. [33], [34] proposed the FPC active set (FPC_AS) algorithm, which combines shrinkage, subspace optimization and continuation techniques. Cheng and Dai [10] proposed gradient-based methods with an active set strategy for (1.6). Liang et al. [25] proposed a general forward-backward splitting method that identifies the active manifold in a finite number of iterations and converges locally linearly.

Various algorithms have also been designed to solve equivalent constrained reformulations of problem (1.6) or (1.3), for instance, interior point methods [28], the gradient projection method [18] and alternating direction methods of multipliers such as SALSA [1], [6]. Other algorithms for $\ell_1$ minimization include coordinate-wise descent methods [32], Bregman iterative regularization based methods [37], a reduced-space algorithm [11], second-order methods [8], [9], [27], [36], [29], quasi-Newton methods [24], [26], gradient methods [30] for minimizing the more general function $J(x) + H(x)$, where $J$ is nonsmooth, $H$ is smooth and both are convex, and a smoothed penalty algorithm (SPA) [2]. We refer to [7], [14], [19], [8], [31], [26], [11], [25] for further advances in this area.

As pointed out in [33], [34], ISTA is very efficient at identifying a superset of the support, but it is not efficient at recovering the signal values. This motivated the development of FPC_AS, which alternates between two stages. In the first stage, "nonmonotone line search (NMLS)", a first-order iterative shrinkage method is used to estimate the support of the solution. In the second stage, "subspace optimization", a smaller smooth subproblem is solved to recover the magnitudes of $x$. Theoretically, the authors of [33], [34] showed that the sequence $\{x^k\}$ generated by FPC_AS has an accumulation point $x^*$ that is a stationary point of problem (1.6). Our approach is partly motivated by the expectation that a second-order method should be faster than a first-order iterative shrinkage method. To accelerate FPC_AS, we propose an active set Newton-CG method for problem (1.6). We first investigate the active set identification technique of ISTA and establish some of its properties, and based on this technique we then propose the algorithm. Specifically, the active and free variables are determined by the identification technique at each iteration. The active variables are updated along the same direction as in the first stage of FPC_AS [33], [34], while a second-order method is applied to a smooth subproblem to update the free variables. Hence the method differs from FPC_AS [33], [34]. In addition, the nonmonotone line search [20] of the proposed method differs from that of FPC_AS. Under appropriate conditions, we show that every accumulation point $x^*$ of the sequence $\{x^k\}$ generated by the proposed algorithm is a stationary point of problem (1.6).
Numerical experiments with logistic regression problems and compressive sensing problems demonstrate that the proposed approach is competitive with several known methods.
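The free-variable update can be illustrated in the simplest case $f(x) = \frac{1}{2}\|Ax - b\|^2$. Assuming the free index set $F$ is given and every free component is nonzero, the sign of $x_F$ is fixed, so $\mu\|x_F\|_1$ is linear and the subproblem in $x_F$ is a smooth quadratic with Hessian $A_F^T A_F$. The Python sketch below computes a Newton direction on $F$ by conjugate gradients, using only Hessian-vector products. It is a hypothetical illustration of the general Newton-CG idea under these assumptions, not the subproblem, stopping rule or line search used in the paper; `conjugate_gradient` and `newton_cg_free_step` are names introduced here.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(hess_vec: Callable[[Vector], Vector], rhs: Vector,
                       tol: float = 1e-10, max_iter: int = 100) -> Vector:
    """Solve H d = rhs for symmetric positive definite H, given only v -> H v."""
    d = [0.0] * len(rhs)
    r = list(rhs)  # residual rhs - H d for the starting point d = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Hp = hess_vec(p)
        alpha = rs / dot(p, Hp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d


def newton_cg_free_step(A: List[List[float]], b: Vector, x: Vector, mu: float,
                        free: List[int]) -> Vector:
    """Newton direction on the free indices for f(x) = 0.5*||Ax - b||^2 (hypothetical).

    Assumes x[j] != 0 for every j in `free`, so sgn(x_F) is fixed and
    mu*||x_F||_1 is linear; the reduced Hessian A_F^T A_F is assumed
    positive definite (A_F of full column rank).
    """
    m, n = len(A), len(x)
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # Ax - b
    # Reduced gradient: g_F + mu*sgn(x_F), where g = A^T (Ax - b).
    grad_F = [sum(A[i][j] * r[i] for i in range(m)) + mu * (1.0 if x[j] > 0 else -1.0)
              for j in free]

    def hess_vec(v: Vector) -> Vector:
        # (A_F^T A_F) v evaluated as A_F^T (A_F v); the matrix is never formed.
        Afv = [sum(A[i][j] * vj for j, vj in zip(free, v)) for i in range(m)]
        return [sum(A[i][j] * Afv[i] for i in range(m)) for j in free]

    return conjugate_gradient(hess_vec, [-gj for gj in grad_F])
```

With A = [[1, 0], [0, 2]], b = [3, 4], x = [1, 1], μ = 0.5 and F = {0, 1}, the direction is approximately [1.5, 0.875], and $x + d = (2.5, 1.875)$ satisfies $A^T(Ax - b) + \mu\,\mathrm{sgn}(x) = 0$. Working with Hessian-vector products is what makes CG attractive here: for large $n$ only products with $A$ and $A^T$ are needed.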

The remainder of the paper is organized as follows. Notation and properties related to (1.6) are given in Section 2. In Section 3, we investigate the active set estimate of ISTA. In Section 4, we propose the algorithm and establish its global convergence. Numerical results are reported in Section 5, and conclusions are drawn in the last section.

Section snippets

Notation and properties

Let $\bar{x}$ be a stationary point of problem (1.6). We define the active set $\gamma(\bar{x})$ as the set of indices of the zero components of $\bar{x}$, and the inactive set $\tau(\bar{x})$ as the support of $\bar{x}$; i.e., $\gamma(\bar{x}) = \{i : \bar{x}_i = 0\}$ and $\tau(\bar{x}) = \{i : \bar{x}_i \neq 0\}$. Furthermore, the active set and the support of $\bar{x}$ can each be subdivided into two sets: $\gamma_+(\bar{x}) = \{i \in \gamma(\bar{x}) : |g_i(\bar{x})| < \mu\}$, $\gamma_0(\bar{x}) = \{i \in \gamma(\bar{x}) : |g_i(\bar{x})| \geq \mu\}$, and $\tau_+(\bar{x}) = \{i : \bar{x}_i > 0\}$, $\tau_-(\bar{x}) = \{i : \bar{x}_i < 0\}$, where $g_i(x)$ is the $i$th component of the gradient vector
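As an illustration of these definitions, the hypothetical Python helper below splits the indices of a point $\bar{x}$ into $\gamma_+$, $\gamma_0$, $\tau_+$ and $\tau_-$, given the gradient values $g_i(\bar{x})$ and $\mu$. It tests $\bar{x}_i = 0$ exactly, as in the definition (a practical code would presumably use a tolerance), and it does not check that $\bar{x}$ is stationary.

```python
from typing import Dict, List, Set


def partition_indices(x_bar: List[float], g: List[float], mu: float) -> Dict[str, Set[int]]:
    """Split indices into gamma_+, gamma_0 (zero components) and tau_+, tau_- (support)."""
    parts: Dict[str, Set[int]] = {"gamma_plus": set(), "gamma_zero": set(),
                                  "tau_plus": set(), "tau_minus": set()}
    for i, (xi, gi) in enumerate(zip(x_bar, g)):
        if xi == 0.0:  # i belongs to the active set gamma(x_bar)
            parts["gamma_plus" if abs(gi) < mu else "gamma_zero"].add(i)
        else:          # i belongs to the support tau(x_bar)
            parts["tau_plus" if xi > 0 else "tau_minus"].add(i)
    return parts
```

For $\bar{x} = (0, 0, 2, -1)$, $g(\bar{x}) = (0.3, 1, -1, 1)$ and $\mu = 1$ (values chosen to satisfy the first-order conditions of (1.6): $g_i = -\mu\,\mathrm{sgn}(\bar{x}_i)$ on the support and $|g_i| \le \mu$ elsewhere), the result is $\gamma_+ = \{0\}$, $\gamma_0 = \{1\}$, $\tau_+ = \{2\}$ and $\tau_- = \{3\}$.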

Active set estimate of ISTA

In this section, we investigate the active set identification technique of ISTA and establish some of its properties. Consider the generic iteration of ISTA [13], [17]:
$$x^{k+1} = \arg\min_{x \in R^{n}} \Big\{ f(x^k) + g(x^k)^T (x - x^k) + \frac{1}{2\epsilon_1} \|x - x^k\|^2 + \mu \|x\|_1 \Big\},$$
where $\epsilon_1$ is a given positive constant. From the optimality condition of this problem, we get
$$x_i^{k+1} = \begin{cases} 0, & \text{if } \epsilon_1 (g_i(x^k) - \mu) \leq x_i^k \leq \epsilon_1 (g_i(x^k) + \mu), \\ \operatorname{sgn}\big(x_i^k - \epsilon_1 g_i(x^k)\big) \big(|x_i^k - \epsilon_1 g_i(x^k)| - \epsilon_1 \mu\big), & \text{otherwise}. \end{cases}$$
Then the indices of the zero variables at $x^{k+1}$ belong to the following set $A_{I}$
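The closed-form step above is componentwise soft-thresholding of $x^k - \epsilon_1 g(x^k)$ at level $\epsilon_1 \mu$. The following Python sketch implements it exactly as the two cases are written, together with the set of indices that the step sends to zero. The function names are hypothetical, and the second function is not the paper's set $A_I$ (whose definition is cut off in this excerpt); it only collects the indices covered by the first case of the formula.

```python
import math
from typing import List, Set


def ista_step(x: List[float], g: List[float], mu: float, eps1: float) -> List[float]:
    """Closed-form ISTA update: soft-thresholding of x - eps1*g at level eps1*mu."""
    x_new = []
    for xi, gi in zip(x, g):
        if eps1 * (gi - mu) <= xi <= eps1 * (gi + mu):
            x_new.append(0.0)  # first case: |x_i - eps1*g_i| <= eps1*mu
        else:
            z = xi - eps1 * gi
            # second case: sgn(z) * (|z| - eps1*mu), where |z| > eps1*mu here
            x_new.append(math.copysign(abs(z) - eps1 * mu, z))
    return x_new


def zero_after_step(x: List[float], g: List[float], mu: float, eps1: float) -> Set[int]:
    """Indices falling in the first case of the formula (hypothetical helper)."""
    return {i for i, (xi, gi) in enumerate(zip(x, g))
            if eps1 * (gi - mu) <= xi <= eps1 * (gi + mu)}
```

For $x^k = (1, 0.05, -2)$, $g(x^k) = (0.5, 0, -1)$, $\mu = 0.2$ and $\epsilon_1 = 1$, the step gives approximately $(0.3, 0, -0.8)$ and the zero set is $\{1\}$. The zero set depends only on $x^k$ and $g(x^k)$, which is what makes it usable as an active set estimate before the step is taken.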

The new algorithm

In this section, based on the active set identification technique in Section 3, we develop a fast Newton-CG method for solving the $\ell_1$ optimization problem (1.6). We make the following assumptions on the objective function.

Assumption 4.1

(i)
The level set $\Omega := \{x \in R^{n} : \phi(x) \leq \phi(x^0)\}$ is bounded.
(ii)
In some neighbourhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; i.e., there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \leq L \|x - y\|$ for all $x, y \in N$.
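For the least squares case $f(x) = \frac{1}{2}\|Ax - b\|^2$, part (ii) holds on all of $R^n$ with $L = \lambda_{\max}(A^T A) = \|A\|_2^2$, since $g(x) - g(y) = A^T A (x - y)$. The Python sketch below estimates this constant by power iteration; the function name and default iteration count are assumptions for illustration. A common choice in the ISTA literature is a step $\epsilon_1 \le 1/L$, though this excerpt does not say how the paper sets $\epsilon_1$.

```python
import random
from typing import List


def lipschitz_least_squares(A: List[List[float]], iters: int = 200, seed: int = 0) -> float:
    """Estimate L = lambda_max(A^T A) = ||A||_2^2 by power iteration.

    For f(x) = 0.5*||Ax - b||^2, g(x) = A^T (Ax - b), hence
    ||g(x) - g(y)|| = ||A^T A (x - y)|| <= L ||x - y||.
    """
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # w = A^T A v
        norm_w = sum(wj * wj for wj in w) ** 0.5
        if norm_w == 0.0:
            return 0.0  # A^T A v = 0, e.g. when A is the zero matrix
        # Rayleigh quotient v^T (A^T A) v / v^T v estimates the top eigenvalue.
        lam = sum(vj * wj for vj, wj in zip(v, w)) / sum(vj * vj for vj in v)
        v = [wj / norm_w for wj in w]
    return lam
```

For A = [[3, 0], [0, 1]], $A^T A = \mathrm{diag}(9, 1)$ and the estimate converges to 9. Power iteration is used here because it needs only products with $A$ and $A^T$, the same operations the solver itself relies on.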

Numerical experiments

In this section, we present some numerical experiments to test the performance of the proposed algorithm and compare it with the following five state-of-the-art $ℓ_{1}$ -minimization algorithms.

FPC_AS [33]. FPC_AS is divided into two stages that are performed repeatedly. At the first stage, a first-order method based on “shrinkage” is performed to obtain a working index set. At the second stage, it utilizes a second-order method to solve a smooth subproblem defined by the working index set. The two

Conclusion

In this paper, we investigated the active set identification technique used by ISTA and established some of its properties. Based on this technique, we proposed an active set Newton-CG method. Under appropriate conditions, we showed that the method with a nonmonotone line search is globally convergent. The numerical results presented in Section 5 demonstrate the effectiveness of the algorithm for solving $\ell_1$-regularized nonconvex problems and some standard $\ell_2$-$\ell_1$

Acknowledgements

The authors thank the two anonymous referees for their valuable comments and suggestions, which greatly helped us improve the quality of this manuscript.

References (39)

M. Elad et al., Subspace optimization methods for linear least squares with non-quadratic regularization, Appl. Comput. Harmon. Anal. (2007).

M. Afonso et al., Fast image recovery using variable splitting and constrained optimization, IEEE Trans. Image Process. (2010).

S. Aybat et al., A first-order smoothed penalty method for compressed sensing, SIAM J. Optim. (2011).

J. Barzilai et al., Two point step size gradient methods, IMA J. Numer. Anal. (1988).

A. Beck et al., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. (2009).

J.M. Bioucas-Dias et al., A new twist: two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Trans. Image Process. (2007).

D. Boley, Local linear convergence of the alternating direction method of multipliers on quadratic or linear program, SIAM J. Optim. (2013).

A.M. Bruckstein et al., From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. (2009).

R.H. Byrd et al., A family of second-order methods for convex ℓ1-regularized optimization, Math. Program., Ser. A (2016).

R.H. Byrd et al., An inexact successive quadratic approximation method for ℓ1 regularized optimization, Math. Program., Ser. B (2016).

W.Y. Cheng et al., Gradient-based method with active set strategy for ℓ1 optimization, Math. Comp. (2018).

T.Y. Chen et al., A reduced-space algorithm for minimizing ℓ1-regularized convex functions, SIAM J. Optim. (2017).

A.R. Conn et al., Trust-region methods.

I. Daubechies et al., An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. (2004).

D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory (2006).

E.D. Dolan et al., Benchmarking optimization software with performance profiles, Math. Program. (2002).

M.A.T. Figueiredo et al., An EM algorithm for wavelet-based image restoration, IEEE Trans. Image Process. (2003).

M.A.T. Figueiredo et al., Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE J. Sel. Top. Signal Process. (2007).

K. Fountoulakis et al., A second-order method for strongly convex ℓ1-regularization problems, Math. Program., Ser. A (2016).


☆ Supported by the Chinese NSF Grant (nos. 11971106, 11371154, 11331012 and 81173633), the Key Project of Chinese National Programs for Fundamental Research and Development (no. 2015CB856002), the China National Funds for Distinguished Young Scientists (no. 11125107), by the Ministry of Education, Humanities and Social Sciences project (no. 17JYJAZH011) and by the Natural Science Foundation of Guangdong Province (2018A030313229).
