Tractable ADMM schemes for computing KKT points and local minimizers for $\ell _0$-minimization problems

Xie, Yue; Shanbhag, Uday V.

doi:10.1007/s10589-020-00227-6

Published: 01 October 2020

Volume 78, pages 43–85, (2021)
Computational Optimization and Applications

Abstract

We consider an $\ell _0$-minimization problem where $f(x) + \gamma \Vert x\Vert _0$ is minimized over a polyhedral set and the $\ell _0$-norm regularizer implicitly emphasizes the sparsity of the solution. Such a setting captures a range of problems in image processing and statistical learning. Given the nonconvex and discontinuous nature of this norm, convex regularizers as substitutes are often employed and studied, but less is known about directly solving the $\ell _0$-minimization problem. Inspired by Feng et al. (Pac J Optim 14:273–305, 2018), we consider resolving an equivalent formulation of the $\ell _0$-minimization problem as a mathematical program with complementarity constraints (MPCC) and make the following contributions towards the characterization and computation of its KKT points: (i) First, we show that feasible points of this formulation satisfy the relatively weak Guignard constraint qualification. Furthermore, if f is convex, an equivalence is derived between first-order KKT points and local minimizers of the MPCC formulation. (ii) Next, we apply two alternating direction method of multiplier (ADMM) algorithms, named (ADMM$_{\mathrm{cf}}^{\mu , \alpha , \rho }$) and (ADMM$_{\mathrm{cf}}$), to exploit the special structure of the MPCC formulation. Both schemes feature tractable subproblems. Specifically, in spite of the overall nonconvexity, it is shown that the first update can be effectively reduced to a closed-form expression by recognizing a hidden convexity property while the second necessitates solving a tractable convex program. In (ADMM$_{\mathrm{cf}}^{\mu , \alpha , \rho }$), subsequential convergence to a perturbed KKT point under mild assumptions is proved. Preliminary numerical experiments suggest that the proposed tractable ADMM schemes are more scalable than their standard counterpart while (ADMM$_{\mathrm{cf}}$) compares well with its competitors in solving the $\ell _0$-minimization problem.
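
To make the complementarity idea concrete, the following minimal Python sketch assumes one common reformulation from the literature on complementarity formulations of $\ell _0$ problems, namely $\Vert x\Vert _0 = \min \{ \sum _i (1 - \xi _i) : x_i \xi _i = 0, \ 0 \le \xi \le 1 \}$. The exact MPCC studied in the paper is not reproduced in this excerpt, so the formulation, the function names `l0_norm` and `complementarity_certificate`, and the tolerance parameter are illustrative assumptions only.

```python
import numpy as np

def l0_norm(x, tol=0.0):
    """Count the entries of x whose magnitude exceeds tol."""
    return int(np.sum(np.abs(x) > tol))

def complementarity_certificate(x, tol=0.0):
    """Return xi in [0, 1]^n with xi_i = 1 exactly where x_i is (numerically) zero.

    Under the assumed reformulation, this xi satisfies x_i * xi_i = 0
    and attains sum(1 - xi) = ||x||_0.
    """
    return (np.abs(x) <= tol).astype(float)

x = np.array([0.0, 1.5, 0.0, -2.0, 0.3])
xi = complementarity_certificate(x)

# Complementarity holds componentwise, and the objective recovers the l0 count.
assert np.all(x * xi == 0.0)
assert np.sum(1.0 - xi) == l0_norm(x)
```

The binary vector is only a certificate for a given $x$; the discontinuity of the $\ell _0$ term is moved into the complementarity constraint rather than removed.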

Notes

By saying that an optimization problem is tractable we mean that it either has a closed-form solution or lies in the range of convex programs that are polynomially solvable. We refer the readers to [4] for detailed discussion.
All experiments were conducted in MATLAB, and the code is available at https://github.com/yue-xie/l0-minimization.

References

[1] Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
[2] Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4, 1–106 (2012)
[3] Beck, A., Eldar, Y.C.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)
[4] Ben-Tal, A., Nemirovski, A.: Computational tractability of convex programs. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2. SIAM, Philadelphia (2001)
[5] Ben-Tal, A., Teboulle, M.: Hidden convexity in some nonconvex quadratically constrained quadratic programming. Math. Program. 72, 51–63 (1996)
[6] Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44, 813–852 (2016)
[7] Bertsimas, D., Shioda, R.: Algorithm for cardinality-constrained quadratic optimization. Comput. Optim. Appl. 43, 1–22 (2009)
[8] Birgin, E.G., Floudas, C.A., Martínez, J.M.: Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program. 125, 139–162 (2010)
[9] Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)
[10] Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)
[11] Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)
[12] Boţ, R., Csetnek, E., Nguyen, D.: A proximal minimization algorithm for structured nonconvex and nonsmooth problems. SIAM J. Optim. 29, 1300–1328 (2019)
[13] Burdakov, O.P., Kanzow, C., Schwartz, A.: Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM J. Optim. 26, 397–425 (2016)
[14] Burke, J.: Fundamentals of optimization, Chapter 5, Lagrange multipliers. Course Notes, AMath/Math 515, University of Washington
[15] Burke, J.: Numerical optimization. Course Notes, AMath/Math 516, University of Washington, Spring Term (2012)
[16] Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25, 21–30 (2008)
[17] Dong, H., Ahn, M., Pang, J.-S.: Structural properties of affine sparsity constraints. Math. Program. 176, 95–135 (2019)
[18] Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
[19] Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. I. Springer, Berlin (2007)
[20] Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
[21] Fang, E.X., Liu, H., Wang, M.: Blessing of massive scale: spatial graphical model estimation with a total cardinality constraint approach. Math. Program. 176, 175–205 (2019)
[22] Feng, M., Mitchell, J.E., Pang, J.-S., Shen, X., Wächter, A.: Complementarity formulations of $\ell _0$-norm optimization problems. Pac. J. Optim. 14, 273–305 (2018)
[23] Fung, G., Mangasarian, O.: Equivalence of minimal $\ell _0$ and $\ell _p$ norm solutions of linear equalities, inequalities and linear programs for sufficiently small p. J. Optim. Theory Appl. 151, 1–10 (2011)
[24] Ge, D., Jiang, X., Ye, Y.: A note on the complexity of ${L}_p$ minimization. Math. Program. 129, 285–299 (2011)
[25] Gonçalves, M.L., Melo, J.G., Monteiro, R.D.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems (2017). arXiv:1702.01850
[26] Hajinezhad, D., Hong, M.: Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization. Math. Program. 176, 207–245 (2019)
[27] Hong, M., Luo, Z., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26, 337–364 (2016)
[28] Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72, 115–157 (2019)
[29] Liu, H., Yao, T., Li, R.: Global solutions to folded concave penalized nonconvex learning. Ann. Stat. 44, 629 (2016)
[30] Liu, Q., Shen, X., Gu, Y.: Linearized ADMM for nonconvex nonsmooth optimization with convergence analysis. IEEE Access 7, 76131–76144 (2019)
[31] Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996)
[32] Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
[33] Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)
[34] van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84, 497–540 (1996)
[35] Wang, F., Cao, W., Xu, Z.: Convergence of multi-block Bregman ADMM for nonconvex composite problems. Sci. China Inf. Sci. 61, 122101 (2018)
[36] Wang, J., Zhao, L.: Nonconvex generalizations of ADMM for nonlinear equality constrained problems. CoRR (2017). arXiv:1705.03412
[37] Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2018)
[38] Xu, Z., De, S., Figueiredo, M.A.T., Studer, C., Goldstein, T.: An empirical study of ADMM for nonconvex problems. CoRR (2016). arXiv:1612.03349
[39] Yang, L., Pong, T.K., Chen, X.: Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction. SIAM J. Imaging Sci. 10, 74–110 (2017)
[40] Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
[41] Zhang, C.-H., Zhang, T.: A general theory of concave regularization for high-dimensional sparse estimation problems. Stat. Sci. 27, 576–593 (2012)
[42] Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11, 1081–1107 (2010)

Acknowledgements

The authors would like to acknowledge an early discussion with Dr. Ankur Kulkarni of IIT, Mumbai, as well as the inspiration provided by Dr. J. S. Pang during his visit to Penn State University, and a suggestion by Dr. Mingyi Hong at INFORMS 2018, Denver.

Author information

Authors and Affiliations

Wisconsin Institute for Discovery, Madison, USA
Yue Xie
Pennsylvania State University, State College, USA
Uday V. Shanbhag

Corresponding author

Correspondence to Yue Xie.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 KŁ property and global convergence

In this subsection we present the missing proof of global convergence of the sequence generated by (ADMM$_{\mathrm{cf}}^{\mu , \alpha , \rho }$) under the assumption of the KŁ property. At the end, we discuss cases in which the KŁ property does hold for the Lyapunov function. First we introduce several concepts necessary for the discussion. More details on the mathematical background can be found in [1, 11, 34].

Definition 6

(Kurdyka–Łojasiewicz (KŁ) property [1]) A proper lower semi-continuous function $\mathcal{L}: {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}\cup \{+\infty \}$ has the KŁ property at ${{\bar{x}}} \in {\text{ dom }}(\partial \mathcal{L})$ if there exist $\eta \in (0,+\infty )$, a neighborhood U of ${{\bar{x}}}$, and a continuous concave function $\phi : [0,\eta ) \rightarrow {\mathbb {R}}_+$ such that the following hold: (i) $\phi (0)= 0$, and $\phi$ is continuously differentiable on $(0,\eta )$. For all $s \in (0,\eta )$, $\phi '(s) > 0$; (ii) For all x in $U \cap \{ x \in {\mathbb {R}}^{N}: \mathcal{L}({{\bar{x}}})< \mathcal{L}(x) < \mathcal{L}({{\bar{x}}}) + \eta \}$, the Kurdyka–Łojasiewicz (KŁ) inequality holds: $\phi '(\mathcal{L}(x) - \mathcal{L}({{\bar{x}}})) \mathrm{dist} (0,\partial \mathcal{L}(x)) \ge 1.$
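
As a quick sanity check of the definition (an illustrative example, not taken from the paper), consider $\mathcal{L}(x) = x^2$ at ${{\bar{x}}} = 0$ with the desingularizing function $\phi (s) = c s^{1-\theta }$. The Python sketch below evaluates the left-hand side of the KŁ inequality numerically; the choice $c = 2$, $\theta = 1/2$ and the helper name `kl_product` are assumptions made for the example.

```python
import numpy as np

def kl_product(x, c=2.0, theta=0.5):
    """Evaluate phi'(L(x) - L(0)) * dist(0, dL(x)) for L(x) = x**2.

    phi(s) = c * s**(1 - theta), so phi'(s) = c * (1 - theta) * s**(-theta).
    L is smooth, so dist(0, dL(x)) is simply |L'(x)| = |2x|.
    """
    gap = x ** 2                                  # L(x) - L(x_bar) with x_bar = 0
    dphi = c * (1.0 - theta) * gap ** (-theta)    # derivative of phi at the gap
    grad_norm = np.abs(2.0 * x)
    return dphi * grad_norm

xs = np.array([1e-6, 1e-3, 0.1, 0.5])
vals = kl_product(xs)

# With c = 2 and theta = 1/2 the product is identically 2, so the KL inequality holds.
assert np.all(vals >= 1.0)
```

Here $\phi '(x^2) = 1/|x|$ exactly cancels the vanishing gradient $2|x|$, which is the role the desingularizing function plays in the definition.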

Definition 7

(Semialgebraic function) A semialgebraic set $S \subseteq {\mathbb {R}}^n$ can be written as a finite union of sets of the following form:

$$\begin{aligned} S \triangleq \{ x \in {\mathbb {R}}^n: p_i(x) = 0, q_i(x) < 0, i = 1,\ldots ,m \}, \end{aligned}$$

where $p_i$ and $q_i$ are real polynomial functions. A function $F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ +\infty \}$ is a semialgebraic function if and only if its graph $\{ (x;y) \in {\mathbb {R}}^n \times {\mathbb {R}}: y = F(x) \}$ is a semialgebraic subset in ${\mathbb {R}}^{n+1}$.
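
As a small worked example (added for illustration, not part of the original text), the scalar cardinality function $|t|_0$, which equals 0 at $t = 0$ and 1 otherwise, is semialgebraic. Its graph in ${\mathbb {R}}^2$ is the union of three sets of the above form, where the first set uses only polynomial equalities (a trivially satisfied strict inequality such as $q \equiv -1 < 0$ can be added to match the template):

$$\begin{aligned} \{ (t;y): t = 0, \ y = 0 \} \cup \{ (t;y): y - 1 = 0, \ t< 0 \} \cup \{ (t;y): y - 1 = 0, \ -t < 0 \}. \end{aligned}$$

Since $\Vert x \Vert _0 = \sum _{i=1}^n |x_i|_0$, this is consistent with the $\ell _0$ term being a finite sum of semialgebraic functions.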

Remark 7

A semialgebraic function has the following properties: (i) If it is proper lower semi-continuous, then it satisfies the KŁ property with $\phi (s) = cs^{1-\theta }$ for some $\theta \in [0,1) \cap {\mathbb {Q}}$ and $c > 0$. (ii) Finite sums and products of semialgebraic functions are semialgebraic. See [1, Section 4.3] for more details.

Definition 8

(o-minimal structure [34]) An o-minimal structure on the real field $({\mathbb {R}}, +, \cdot )$ is a sequence ${\mathcal {G}}= ({\mathcal {G}}_n)_{n \in {\mathbb {N}}}$ such that:

(i) ${\mathcal {G}}_n$ is a boolean algebra of subsets of ${\mathbb {R}}^n$, i.e., ${\mathbb {R}}^n \in {\mathcal {G}}_n$ and if $A, B \in {\mathcal {G}}_n$, then $A \cap B$, $A \cup B$, ${\mathbb {R}}^n \setminus A$ are in ${\mathcal {G}}_n$.
(ii) If $A \in {\mathcal {G}}_n$, then $A \times {\mathbb {R}}$ and ${\mathbb {R}}\times A$ are in ${\mathcal {G}}_{n+1}$.
(iii) If $A \in {\mathcal {G}}_{n+1}$, then $\{ (x_1, \ldots , x_n) \in {\mathbb {R}}^n \mid (x_1, \ldots , x_n, x_{n+1}) \in A\}$ is in ${\mathcal {G}}_n$.
(iv) For i, j such that $1 \le i < j \le n$, $\{(x_1, \ldots , x_n) \in {\mathbb {R}}^n \mid x_i = x_j \}$ is in ${\mathcal {G}}_n$.
(v) The graphs of addition and multiplication are in ${\mathcal {G}}_3$.
(vi) ${\mathcal {G}}_1$ consists exactly of finite unions of intervals and singletons.

Remark 8

Given ${\mathcal {G}}$, if the graph of function $f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ + \infty \}$ belongs to ${\mathcal {G}}_{n+1}$, then f is called definable. Note that summation of two definable functions is definable, and composition of definable functions is definable.

Theorem 4

(Theorem 14 [1]) Any proper lower semicontinuous function $f : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ +\infty \}$ which is definable in an o-minimal structure ${\mathcal {G}}$ has the Kurdyka–Łojasiewicz property at each point of $\mathrm {dom}\partial f$.

Next we prove the statement we make in Remark 5 (iii).

Lemma 10

Suppose that the assumptions in Theorem 2 hold. Let $\{(w_k;y_k;\lambda _k)\}$ be the sequence generated by (ADMM$_\mathrm{cf}^{\mu ,\alpha ,\rho }$) and let $(w^*,y^*,\lambda ^*)$ denote a limit point of this sequence. Let

$$\begin{aligned}&{\mathcal {H}}_{\tau }(w,y,\lambda ) \\&\triangleq \tilde{{\mathcal {L}}}_{\rho ,\alpha }(w,y,\lambda ) + \mathrm{1l}_{Z_1}(w) + \mathrm{1l}_{Z_2}(y) + \frac{(1-\rho \alpha )\alpha }{2} \Vert \lambda \Vert ^2 + \frac{\rho \Vert w - y - \alpha \lambda \Vert ^2}{2(1-\rho \alpha )/\tau }. \end{aligned}$$

Suppose that ${\mathcal {H}}_{\tau }$ satisfies the KŁ property at $(w^*,y^*,\lambda ^*)$. Then $\{(w_k;y_k;\lambda _k)\}$ converges to $(w^*;y^*;\lambda ^*)$ globally.

Proof

Denote ${\mathcal {H}}^k \triangleq {\mathcal {H}}_{\tau }(w_k, y_k, \lambda _k)$. Then it can be verified that $P_{\tau }^k = {\mathcal {H}}^k$, $\forall k \ge 1$ ($P_{\tau }^k$ defined in (34)). Then by Lemma 8, for any $k \ge 1$,

$$\begin{aligned} {\mathcal {H}}^k - {\mathcal {H}}^{k+1}&\ge c_1(\nu ) \Vert w_{k+1} - w_k \Vert ^2 + c_2 \Vert y_{k+1} - y_k \Vert ^2 + c_3(\nu ) \Vert \lambda _{k+1} - \lambda _k \Vert ^2. \end{aligned}$$

(60)

By Theorem 2, we know that there exists a subsequence $\{ (w_{n_k}; y_{n_k}; \lambda _{n_k}) \}$ that converges to $(w^*; y^*; \lambda ^*)$ ($(w_{n_k}; y_{n_k}; \lambda _{n_k}) \in Z_1 \times Z_2 \times {\mathbb {R}}^n$). Therefore ${\mathcal {H}}^{n_k} \rightarrow {\mathcal {H}}^* \triangleq {\mathcal {H}}_{\tau }(w^*,y^*,\lambda ^*)$ as $k \rightarrow \infty$. By Assumption 2 and (60), we know that ${\mathcal {H}}^k \ge {\mathcal {H}}^{k+1}$, $\forall k \ge 1$. Therefore, by the monotonicity of ${\mathcal {H}}^k$, we have that ${\mathcal {H}}^k \downarrow {\mathcal {H}}^*$.

Denote $z_k \triangleq (w_k; y_k; \lambda _k)$ and $z^* \triangleq (w^*; y^*; \lambda ^*)$. By the KŁ property, there exist a neighbourhood ${\mathcal {U}}\supseteq B(z^*, r) \triangleq \{ z \in {\mathbb {R}}^{3n} \mid \Vert z - z^* \Vert < r \}$, $\eta > 0$ and a concave continuous function $\phi : [0,\eta ) \rightarrow {\mathbb {R}}_+$ such that $\phi (0) = 0$, $\phi$ is continuously differentiable on $(0,\eta )$ and $\phi '(s) > 0$ on $(0,\eta )$. Moreover, for any $z \in {\mathcal {U}}\cap \{ {\mathcal {H}}^*< {\mathcal {H}}_{\tau }(z) < {\mathcal {H}}^* + \eta \}$,

$$\begin{aligned} \phi '({\mathcal {H}}_{\tau }(z) - {\mathcal {H}}^*) \mathrm{dist} ( 0, \partial {\mathcal {H}}_{\tau }(z) ) \ge 1. \end{aligned}$$

(61)

By subsequential convergence to $z^*$, $\Vert z_k - z_{k+1} \Vert \rightarrow 0$ (Lemma 8(iii)), monotonicity of $\phi$ and the fact that ${\mathcal {H}}^k \downarrow {\mathcal {H}}^*$, there exists $K_0$ large enough such that (let $\varDelta z_{k+1} \triangleq z_{k+1} - z_k$)

$$\begin{aligned} \begin{aligned} \Vert z_{K_0} - z^* \Vert + \Vert \varDelta z_{K_0+1} \Vert< r/4, \ {\mathcal {H}}^{K_0+1} - {\mathcal {H}}^*< \eta , \\ \phi ({\mathcal {H}}^{K_0+1} - {\mathcal {H}}^*) < \frac{r C_{\mathrm{min}}}{ 2\sqrt{3} C_{\mathrm{max}} }, \end{aligned} \end{aligned}$$

(62)

where $C_{\min } \triangleq \min \{ c_1(\nu ), c_2, c_3(\nu ) \}$, $C_{\max } \triangleq \max \{ C(\rho , \alpha , \tau ), \rho , \mu /2 \}$, $C(\rho , \alpha , \tau ) \triangleq 2(1-\rho \alpha + \tau ) + | (1-\rho \alpha )^2/\rho - \tau \alpha |$. WLOG let $K_0 = 0$. Then $z_0, z_1 \in B(z^*, r)$ and $\Vert \varDelta z_1 \Vert < r$. Suppose that for any $k = 1, \ldots , K$, $K \ge 1$, $z_k \in B(z^*, r)$, and $\sum _{k=1}^K \Vert \varDelta z_k \Vert < r$. We want to show that the same is true when $k = K+1$.

Note that for any $k \ge 1$,

$$\begin{aligned}&\partial {\mathcal {H}}_{\tau }(w_k,y_k,\lambda _k) = \partial ( \mathrm{1l}_{Z_1}(w_k) + \mathrm{1l}_{Z_2}(y_k) ) \nonumber \\&+ \begin{pmatrix} \nabla h(w_k) + (1-\rho \alpha ) \lambda _k + \rho ( w_k - y_k ) + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \nonumber \\ \nabla p(y_k) - (1-\rho \alpha ) \lambda _k - \rho (w_k - y_k) - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k) \nonumber \\ (1-\rho \alpha ) ( w_k - y_k - 2\alpha \lambda _k) + (1-\rho \alpha )\alpha \lambda _k + \frac{\tau \rho }{1-\rho \alpha }(w_k - y_k - \alpha \lambda _k)(-\alpha ) \end{pmatrix} \nonumber \\&= \begin{pmatrix} \partial \mathrm{1l}_{Z_1}(w_k) + \nabla h(w_k) + (1-\rho \alpha ) \lambda _k + \rho ( w_k - y_k ) + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ \partial \mathrm{1l}_{Z_2}(y_k) + \nabla p(y_k) - (1-\rho \alpha ) \lambda _k - \rho (w_k - y_k) - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k)\\ (1-\rho \alpha - \frac{\tau \rho \alpha }{1-\rho \alpha } ) (w_k - y_k - \alpha \lambda _k) \end{pmatrix} \end{aligned}$$

(63)

where the first equation holds because of differentiability of the smooth part of ${\mathcal {H}}_{\tau }$ and property (ii) after Definition 4. The second equation is implied by the subdifferential calculus for separable functions [32, Proposition 10.5, p. 426].

By the optimality conditions of Update-1 and Update-2 of (ADMM$_\mathrm{cf}^{\mu ,\alpha ,\rho }$), for any $k \ge 1$, there exist $u_k \in \partial \mathrm{1l}_{Z_1}(w_k)$, $v_k \in \partial \mathrm{1l}_{Z_2}(y_k)$ such that

$$\begin{aligned} \begin{aligned} -u_k&= \nabla h(w_k) + (1-\rho \alpha )\lambda _{k-1} + \rho (w_k - y_{k-1}) + \frac{\mu }{2}(w_k - w_{k-1}) \\ -v_k&= \nabla p(y_k) - (1-\rho \alpha ) \lambda _{k-1} - \rho (w_k - y_k) \end{aligned} \end{aligned}$$

(64)

Denote $\varDelta w_k \triangleq w_k - w_{k-1}$, $\varDelta y_k \triangleq y_k - y_{k-1}$, $\varDelta \lambda _k \triangleq \lambda _k - \lambda _{k-1}$. Then for any $k \ge 1$,

$$\begin{aligned}&\mathrm{dist}( 0, \partial {\mathcal {H}}_{\tau } ( z_k ) ) \nonumber \\&\overset{ (63) }{\le } \left\| \begin{pmatrix} u_k + \nabla h(w_k) + (1-\rho \alpha ) \lambda _k + \rho ( w_k - y_k ) + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ v_k + \nabla p(y_k) - (1-\rho \alpha ) \lambda _k - \rho (w_k - y_k) - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k) \\ (1-\rho \alpha - \frac{\tau \rho \alpha }{1-\rho \alpha } ) (w_k - y_k - \alpha \lambda _k) \end{pmatrix} \right\| \nonumber \\&\overset{ (64) }{ = } \left\| \begin{pmatrix} (1-\rho \alpha ) \varDelta \lambda _k - \rho \varDelta y_k - \frac{\mu }{2} \varDelta w_k + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ - (1-\rho \alpha ) \varDelta \lambda _k - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k) \nonumber \\ (1-\rho \alpha - \frac{\tau \rho \alpha }{1-\rho \alpha } ) (w_k - y_k - \alpha \lambda _k) \end{pmatrix} \right\| \nonumber \\&= \left\| \begin{pmatrix} (1-\rho \alpha + \tau ) \varDelta \lambda _k - \rho \varDelta y_k - \frac{\mu }{2} \varDelta w_k \\ - (1-\rho \alpha + \tau ) \varDelta \lambda _k \\ ( (1-\rho \alpha )^2/\rho - \tau \alpha ) \varDelta \lambda _k \end{pmatrix} \right\| \nonumber \\&\le \left\| (1-\rho \alpha + \tau ) \varDelta \lambda _k - \rho \varDelta y_k - \frac{\mu }{2} \varDelta w_k \right\| + \Vert (1-\rho \alpha + \tau ) \varDelta \lambda _k \Vert \nonumber \\&\quad + \Vert ( (1-\rho \alpha )^2/\rho - \tau \alpha ) \varDelta \lambda _k \Vert \nonumber \\&\le \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert . \end{aligned}$$

(65)

For any $k = 1,\ldots ,K$, suppose that ${\mathcal {H}}^k > {\mathcal {H}}^*$. Otherwise there exists ${{\bar{k}}}$ such that ${\mathcal {H}}^{{{\bar{k}}}} = {\mathcal {H}}^*$. Together with (60) and $c_1(\nu ), c_2, c_3(\nu ) > 0$, this implies that $z_{k+1} = z_k = z^*$, $\forall k \ge {{\bar{k}}}$, i.e., $z_k$ converges to $z^*$ already. Then by ${\mathcal {H}}^k \le {\mathcal {H}}^1 < {\mathcal {H}}^* + \eta$ from (62) and the hypothesis $z_k \in B(z^*,r)$, (61) holds at $z = z_k$.

Also, by concavity of $\phi$ and the fact that ${\mathcal {H}}^*< {\mathcal {H}}^k \le {\mathcal {H}}^1 < {\mathcal {H}}^* + \eta$, we have

$$\begin{aligned} 0 \le \phi '({\mathcal {H}}^k - {\mathcal {H}}^*)( {\mathcal {H}}^k - {\mathcal {H}}^{k+1} ) \le \phi ( {\mathcal {H}}^k - {\mathcal {H}}^* ) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*). \end{aligned}$$

(66)

Therefore, by (65), (66) and KŁ inequality, we have the following:

$$\begin{aligned}&(\phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*)) \left( \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert \right) \nonumber \\&\ge {\mathcal {H}}^k - {\mathcal {H}}^{k+1} \overset{ (60) }{ \ge } c_1(\nu ) \Vert \varDelta w_{k+1} \Vert ^2 + c_2 \Vert \varDelta y_{k+1} \Vert ^2 + c_3(\nu ) \Vert \varDelta \lambda _{k+1} \Vert ^2 \nonumber \\&\implies \sqrt{ c_1(\nu ) \Vert \varDelta w_{k+1} \Vert ^2 + \frac{\rho }{2} \Vert \varDelta y_{k+1} \Vert ^2 + c_3(\nu ) \Vert \varDelta \lambda _{k+1} \Vert ^2 } \nonumber \\&\le \sqrt{ \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) } \cdot \sqrt{ \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert } \nonumber \\&\overset{ \forall M > 0 }{\implies } \sqrt{ C_{\min } } \Vert \varDelta z_{k+1} \Vert \le \frac{M}{2} ( \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) ) \nonumber \\&\quad + \frac{1}{2M} \left( \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert \right) \nonumber \\&\le \frac{M}{2} ( \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) ) + \frac{C_{\max }}{2M} \left( \Vert \varDelta w_k \Vert + \Vert \varDelta y_k \Vert + \Vert \varDelta \lambda _k \Vert \right) \nonumber \\&\le \frac{M}{2} ( \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) ) + \frac{\sqrt{3} C_{\max }}{2M} \Vert \varDelta z_k \Vert \end{aligned}$$

(67)

The last inequality holds because $( \Vert \varDelta w_k \Vert + \Vert \varDelta y_k \Vert + \Vert \varDelta \lambda _k \Vert )^2 \le 3 ( \Vert \varDelta w_k \Vert ^2 + \Vert \varDelta y_k \Vert ^2 + \Vert \varDelta \lambda _k \Vert ^2 ) = 3 \Vert \varDelta z_k \Vert ^2$. Sum up (67) from $k = 1$ to K and we have:

$$\begin{aligned}&\sqrt{C_{\min }} \sum _{k=1}^K \Vert \varDelta z_{k+1} \Vert \nonumber \\&\le \frac{M}{2}( \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{K+1} - {\mathcal {H}}^*) ) + \frac{\sqrt{3} C_{\max }}{2M} \sum _{k=1}^K \Vert \varDelta z_k \Vert \nonumber \\&\le \frac{M}{2} \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*) + \frac{\sqrt{3} C_{\max }}{2M} \sum _{k=1}^K \Vert \varDelta z_k \Vert \nonumber \\ \implies&\sum _{k=0}^K \Vert \varDelta z_{k+1} \Vert \le \frac{M}{2 \sqrt{C_{\min }} } \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*) + \frac{\sqrt{3} C_{\max }}{2M \sqrt{C_{\min }} } \sum _{k=1}^K \Vert \varDelta z_k \Vert + \Vert \varDelta z_1 \Vert \end{aligned}$$

(68)

Setting $M = \frac{\sqrt{3} C_{\max }}{ \sqrt{C_{\min }} }$ in (68) and using (62) together with the hypothesis $\sum _{k=1}^K \Vert \varDelta z_k \Vert < r$, we have that

$$\begin{aligned} \sum _{k=1}^{K+1} \Vert \varDelta z_k \Vert< \frac{r}{4} + \frac{r}{2} + \frac{r}{4} = r, \ \Vert z_{K+1} - z^* \Vert \le \sum _{k=0}^K \Vert \varDelta z_{k+1} \Vert + \Vert z_0 - z^* \Vert < r. \end{aligned}$$
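
For completeness, the three terms in this bound can be traced as follows (a worked expansion added for illustration, using only (62), (68) and the induction hypothesis, with $K_0 = 0$). Substituting $M = \sqrt{3} C_{\max } / \sqrt{C_{\min }}$ into the right-hand side of (68) gives

$$\begin{aligned} \frac{M}{2 \sqrt{C_{\min }}} \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*)&= \frac{\sqrt{3} C_{\max }}{2 C_{\min }} \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*)< \frac{\sqrt{3} C_{\max }}{2 C_{\min }} \cdot \frac{r C_{\min }}{2 \sqrt{3} C_{\max }} = \frac{r}{4}, \\ \frac{\sqrt{3} C_{\max }}{2M \sqrt{C_{\min }}} \sum _{k=1}^K \Vert \varDelta z_k \Vert&= \frac{1}{2} \sum _{k=1}^K \Vert \varDelta z_k \Vert < \frac{r}{2}, \end{aligned}$$

while $\Vert z_0 - z^* \Vert + \Vert \varDelta z_1 \Vert < r/4$ by (62). Since $\sum _{k=0}^K \Vert \varDelta z_{k+1} \Vert = \sum _{k=1}^{K+1} \Vert \varDelta z_k \Vert$, adding the three contributions yields both inequalities; the second one also uses the triangle inequality $\Vert z_{K+1} - z^* \Vert \le \Vert z_0 - z^* \Vert + \sum _{k=0}^K \Vert \varDelta z_{k+1} \Vert$.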

Therefore, the hypothesis is verified at $k = K+1$. By induction, $z_k \in B(z^*, r)$ and $\sum _{i=1}^k \Vert \varDelta z_i \Vert < r$ for all $k \ge 1$. Therefore the sequence $\{ z_k \}$ has finite length, so it is Cauchy and converges; since a subsequence converges to $z^*$, the limit is $z^*$. $\square$
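
The finite-length conclusion can be observed numerically on a toy problem. The sketch below is not the paper's ADMM scheme: it runs plain gradient descent on the semialgebraic function $f(x, y) = x^2 + y^4$ (an assumed example, with an assumed step size of 0.1) and accumulates the path length $\sum _k \Vert \varDelta z_k \Vert$, which stays bounded even though the $y$-coordinate converges only sublinearly.

```python
import numpy as np

def f(z):
    """Toy semialgebraic objective f(x, y) = x**2 + y**4."""
    return z[0] ** 2 + z[1] ** 4

def grad(z):
    """Gradient of the toy objective."""
    return np.array([2.0 * z[0], 4.0 * z[1] ** 3])

z = np.array([1.0, 1.0])
step = 0.1            # assumed step size, small enough for monotone decrease here
path_length = 0.0
values = [f(z)]

for _ in range(5000):
    z_next = z - step * grad(z)
    path_length += np.linalg.norm(z_next - z)   # accumulate ||z_{k+1} - z_k||
    z = z_next
    values.append(f(z))

# The objective never increases and the accumulated path length stays bounded.
assert np.all(np.diff(values) <= 0.0)
assert path_length < 2.0
```

Each coordinate shrinks monotonically towards zero from its starting value of 1, so the path length is at most 2 regardless of the number of iterations; this mirrors the role of the bound $\sum _{i=1}^k \Vert \varDelta z_i \Vert < r$ in the proof.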

Remark 9

We introduce two general cases when ${\mathcal {H}}_\tau$ satisfies the KŁ property:

(i) p(y) is a polynomial function. In this case, p(y) is semialgebraic (Definition 7). Therefore, ${\mathcal {H}}_\tau$ is a sum of semialgebraic functions and hence is itself semialgebraic. Then the result follows from the fact that a semialgebraic function satisfies the KŁ property at every point in its domain [1]. Note that if we reformulate ($\ell _0\hbox {-LSR}$) in Sect. 5.1 as the structured program (33), then $p(y) \equiv 0$, which belongs to this case.
(ii) ${\mathcal {H}}_{\tau }$ is in ${\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})$. ${\mathcal {G}}({\mathbb {R}}_\mathrm{an, exp})$ is a type of o-minimal structure that contains the graphs of many function classes including semialgebraic functions, restricted analytic functions (an analytic function $f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}$ restricted to $[-1,1]^n$), $\exp : {\mathbb {R}}\rightarrow {\mathbb {R}}$ and $\log : (0,+\infty ) \rightarrow {\mathbb {R}}$ [34]. In particular, when g(x) in (1) is a logistic loss function, i.e.,
$$\begin{aligned} g(x) = \frac{1}{N} \sum _{i=1}^N \log ( 1 + \exp ( - l_i x^T s_i ) ), \end{aligned}$$
p(y) is definable w.r.t. ${\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})$ since the composition and summation of definable functions are definable. Therefore, ${\mathcal {H}}_{\tau }$ is also definable since the other summands of ${\mathcal {H}}_{\tau }$ are semialgebraic functions.
(iii) Other types of functions, such as uniformly convex functions, convex functions that satisfy a growth condition, and convex subanalytic functions, may also satisfy the KŁ property; this is beyond the scope of this paper. We refer the interested reader to [1, 10] for more details.
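
The logistic loss in case (ii) can be evaluated as in the following minimal Python sketch. The array layout (rows of `S` are the samples $s_i$), the label convention $l_i \in \{-1, +1\}$, and the function name `logistic_loss` are assumptions for illustration; the authors' own experiments use MATLAB code that is not reproduced here.

```python
import numpy as np

def logistic_loss(x, S, l):
    """g(x) = (1/N) * sum_i log(1 + exp(-l_i * x^T s_i)).

    x: (n,) parameter vector
    S: (N, n) matrix whose i-th row is the sample s_i (assumed layout)
    l: (N,) labels, assumed to take values in {-1, +1}
    """
    margins = l * (S @ x)
    # logaddexp(0, -m) computes log(1 + exp(-m)) without overflow for large |m|.
    return float(np.mean(np.logaddexp(0.0, -margins)))

S = np.array([[1.0, 2.0], [3.0, -1.0]])
l = np.array([1.0, -1.0])

# At x = 0 every term equals log(2).
assert abs(logistic_loss(np.zeros(2), S, l) - np.log(2.0)) < 1e-12
```

The loss is built from `exp`, `log`, sums and compositions with linear maps, which is the structure the definability argument in case (ii) relies on.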

1.2 Miscellaneous

Lemma 11

(Theorem 10 [14]) In ${\mathbb {R}}^{n_1}$, let $C = \{ x \in X \mid F(x) \in D \}$, for closed convex sets $X \subset {\mathbb {R}}^{n_1}, D \subset {\mathbb {R}}^{n_2}$, and a ${\mathcal {C}}^1$ mapping $F: {\mathbb {R}}^{n_1} \rightarrow {\mathbb {R}}^{n_2}$, written componentwise as $F(x) = (f_1(x); \ldots ; f_{n_2}(x))$. Suppose the following constraint qualification is satisfied at a point ${{\bar{x}}} \in C$:

$$\begin{aligned} \sum _{j=1}^{n_2} y_j \nabla f_j({{\bar{x}}}) + z = 0, y = (y_1; \ldots ; y_{n_2}) \in {\mathcal {N}}_D(F({{\bar{x}}})), z \in {\mathcal {N}}_X({{\bar{x}}}) \\ \implies y = \mathbf{0}, z = 0. \end{aligned}$$

Then the normal cone ${\mathcal {N}}_C({{\bar{x}}})$ consists of all vectors v of the form

$$\begin{aligned} v = y_1 \nabla f_1({{\bar{x}}}) + \ldots + y_{n_2} \nabla f_{n_2}({{\bar{x}}}) + z {\text{ with }} y = (y_1;\ldots ;y_{n_2}) \in {\mathcal {N}}_D(F({{\bar{x}}})),\\ z \in {\mathcal {N}}_X({{\bar{x}}}). \end{aligned}$$

Note: When $X = {\mathbb {R}}^{n_1}$, the normal cone ${\mathcal {N}}_X({{\bar{x}}}) = \{0\}$, so the z terms here drop out. When D is a singleton, ${\mathcal {N}}_D(F({{\bar{x}}})) = {\mathbb {R}}^{n_2}$.
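
As a worked instance of Lemma 11 (an example added for illustration), take $n_1 = 2$, $n_2 = 1$, $X = {\mathbb {R}}^2$, $D = (-\infty , 0]$ and $F(x) = x_1^2 + x_2^2 - 1$, so that C is the closed unit disk. At ${{\bar{x}}} = (1; 0)$ we have $F({{\bar{x}}}) = 0$, ${\mathcal {N}}_D(0) = [0, +\infty )$, ${\mathcal {N}}_X({{\bar{x}}}) = \{0\}$ and $\nabla F({{\bar{x}}}) = (2; 0) \ne 0$, so the constraint qualification holds: $y \nabla F({{\bar{x}}}) = 0$ with $y \ge 0$ forces $y = 0$. The lemma then gives

$$\begin{aligned} {\mathcal {N}}_C({{\bar{x}}}) = \{ y \nabla F({{\bar{x}}}) : y \ge 0 \} = \{ (t; 0) : t \ge 0 \}, \end{aligned}$$

which is the outward ray through ${{\bar{x}}}$, as expected for a disk.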

About this article

Cite this article

Xie, Y., Shanbhag, U.V. Tractable ADMM schemes for computing KKT points and local minimizers for $\ell _0$-minimization problems. Comput Optim Appl 78, 43–85 (2021). https://doi.org/10.1007/s10589-020-00227-6

Received: 18 December 2019
Accepted: 12 September 2020
Published: 01 October 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s10589-020-00227-6
