
Kurdyka–Łojasiewicz Property of Zero-Norm Composite Functions

Published in: Journal of Optimization Theory and Applications

A Correction to this article was published on 06 May 2021


Abstract

This paper focuses on a class of zero-norm composite optimization problems. For this class of nonconvex, nonsmooth problems, we establish the Kurdyka–Łojasiewicz property of exponent 1/2 for the objective function under a suitable assumption, and we provide examples showing that this assumption is not very restrictive; in particular, it covers the zero-norm regularized or constrained piecewise linear–quadratic function, the zero-norm regularized or constrained logistic regression function, and the zero-norm regularized or constrained quadratic function over a sphere.


References

1. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

3. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

4. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)

5. Pan, S.H., Liu, Y.L.: Metric subregularity of subdifferential and KL property of exponent 1/2. arXiv:1812.00558v3 (2019)

6. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)

7. Wang, X.F., Ye, J.J., Yuan, X.M., Zeng, S.Z., Zhang, J.: Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis. arXiv:1810.10051 (2018)

8. Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferential. J. Convex Anal. 15, 365–380 (2008)

9. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)

10. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 1, 43–54 (1992)

11. Wen, B., Chen, X.J., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)

12. Zhou, Z.R., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165, 689–728 (2017)

13. Cui, Y., Sun, D.F., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1300-6

14. D'Acunto, D., Kurdyka, K.: Explicit bounds for the Łojasiewicz exponent in the gradient inequality for polynomials. Ann. Polon. Math. 87, 51–61 (2005)

15. Li, G.Y., Mordukhovich, B.S., Phạm, T.S.: New fractional error bounds for polynomial systems with application to Hölderian stability in optimization and spectral theory of tensors. Math. Program. 153(2), 333–362 (2015). (Ser. A)

16. Li, G.Y., Mordukhovich, B.S., Nghia, T.T.A., Phạm, T.S.: Error bounds for parametric polynomial systems with applications to higher-order stability analysis and convergence rates. Math. Program. 168(1–2), 313–346 (2018)

17. Li, G.Y., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)

18. Yu, P.R., Li, G.Y., Pong, T.K.: Deducing Kurdyka–Łojasiewicz exponent via inf-projection. arXiv:1902.03635 (2019)

19. Liu, H.K., So, A.M.-C., Wu, W.J.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1285-1

20. Zhang, Q., Chen, C.H., Liu, H.K., So, A.M.-C., Zhou, Z.R.: On the linear convergence of the ADMM for regularized non-convex low-rank matrix recovery. https://www1.se.cuhk.edu.hk/~manchoso/admm_MF.pdf

21. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)

22. Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)

23. Yuan, X.T., Zhang, T.: Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14, 899–925 (2013)

24. Asteris, M., Papailiopoulos, D., Dimakis, A.: Nonnegative sparse PCA with provable guarantees. In: International Conference on Machine Learning (2014)

25. Brodie, J., Daubechies, I., De Mol, C., Giannone, D., Loris, I.: Sparse and stable Markowitz portfolios. Proc. Natl. Acad. Sci. 106, 12267–12272 (2009)

26. Zhang, J.Y., Liu, H.Y., Wen, Z.W., Zhang, S.Z.: A sparse completely positive relaxation of the modularity maximization for community detection. SIAM J. Sci. Comput. 40, A3091–A3120 (2017)

27. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)

28. Mordukhovich, B.S.: Variational Analysis and Applications. Springer, Cham (2018)

29. Le, Y.H.: Generalized subdifferentials of the rank function. Optim. Lett. 7, 731–743 (2013)

30. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

31. Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.F.: Restricted normal cones and sparsity optimization with affine constraints. Found. Comput. Math. 14, 63–83 (2014)

32. Feng, X., Wu, C.L.: Every critical point of an \(l_0\) regularized minimization model is a local minimizer. arXiv:1912.04498 (2019)

33. Robinson, S.M.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–214 (1981)

34. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing (2016)

35. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49, 263–265 (1952)

36. Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, New York (1991)

37. Sun, J.: On Monotropic Piecewise Quadratic Programming. Ph.D. Thesis, Department of Mathematics, University of Washington, Seattle (1986)

38. Ioffe, A.D., Outrata, J.V.: On metric and calmness qualification conditions in subdifferential calculus. Set-Valued Anal. 16, 199–227 (2008)

39. Wen, B., Chen, X.J., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)


Acknowledgements

The authors would like to express their sincere thanks to the two anonymous reviewers for their helpful comments, which greatly improved the original manuscript. The research of S. H. Pan and S. J. Bi is supported by the National Natural Science Foundation of China under Projects No. 11971177 and No. 11701186, and by the Guangdong Basic and Applied Basic Research Foundation (2020A1515010408).


Corresponding author

Correspondence to Shujun Bi.


Communicated by Boris S. Mordukhovich.


Appendices

1.1 A: KL Property Relative to a Manifold

Let \({\mathcal {M}}\subset {\mathbb {R}}^p\) be a \({\mathcal {C}}^2\)-smooth manifold and \(f\!:{\mathcal {M}}\rightarrow {\mathbb {R}}\) be a \({\mathcal {C}}^2\)-smooth function. The set of critical points of the problem \( \min _{x\in {\mathcal {M}}}f(x) \) is \({\mathcal {X}}:=\big \{x\in {\mathcal {M}}:\ \nabla _{\!{\mathcal {M}}}f(x)=0\big \}\), where \(\nabla _{\!{\mathcal {M}}}f(z)\) is the projection of \(\nabla \!f(z)\) onto the tangent space \({\mathcal {T}}_{{\mathcal {M}}}(z)\) of \({\mathcal {M}}\) at z. We say that f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\) if f has the KL property of exponent 1/2 at each \({\overline{x}}\in {\mathcal {X}}\), i.e., there exist \(\delta >0\) and \(\gamma >0\) such that

$$\begin{aligned} \Vert \nabla _{\!{\mathcal {M}}}f(z)\Vert \ge \gamma \sqrt{|f(z)-f({\overline{x}})|} \quad \ \forall z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}. \end{aligned}$$
(13)

This part states the relation between the KL property of exponent 1/2 of f relative to \({\mathcal {M}}\) and the KL property of exponent 1/2 of its extension \({\widetilde{f}}(x):=f(x)+\delta _{{\mathcal {M}}}(x)\) for \(x\in {\mathbb {R}}^p\).

Lemma A.1

Let \({\mathcal {M}}\!\subset \!{\mathbb {R}}^p\) be a \({\mathcal {C}}^2\)-smooth manifold and \(f\!:{\mathcal {M}}\rightarrow {\mathbb {R}}\) be a \({\mathcal {C}}^2\)-smooth function. If f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\), then \({\widetilde{f}}\) is a KL function of exponent 1/2. Conversely, if \({\widetilde{f}}\) is a KL function of exponent 1/2 and each critical point is a local minimizer, then f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\).

Proof

Notice that \(\partial \!{\widetilde{f}}(x)=\nabla \!f(x) +{\mathcal {N}}_{{\mathcal {M}}}(x)\) for any \(x\in {\mathcal {M}}\). Clearly, \({\mathcal {X}}=\mathrm{crit}{\widetilde{f}}\). Fix an arbitrary \({\overline{x}}\in {\mathcal {X}}\). Since f has the KL property of exponent 1/2 relative to \({\mathcal {M}}\) at \({\overline{x}}\), there exist \(\delta >0\) and \(\gamma >0\) such that (13) holds for all \(z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}\). Fix an arbitrary \(\eta >0\) and an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]\). Clearly, \(x\in {\mathcal {M}}\). Moreover,

$$\begin{aligned}&\mathrm{dist}(0,\partial \!{\widetilde{f}}(x)) =\Vert \nabla \!f(x)-\varPi _{{\mathcal {N}}_{{\mathcal {M}}}(x)}(\nabla \!f(x))\Vert \nonumber \\&\quad =\Vert \varPi _{{\mathcal {T}}_{{\mathcal {M}}}(x)}(\nabla \!f(x))\Vert =\Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert . \end{aligned}$$
(14)

Along with (13) and \({\widetilde{f}}=f\) on \({\mathcal {M}}\), \(\mathrm{dist}(0,\partial \!{\widetilde{f}}(x))\ge \gamma \sqrt{{\widetilde{f}}(x)-{\widetilde{f}}({\overline{x}})}\). So, the first part of the result follows.

Next we focus on the second part. Fix an arbitrary \({\overline{x}}\in {\mathcal {X}}\). By the given assumption, \({\overline{x}}\) is a local minimizer of \( \min _{x\in {\mathcal {M}}}f(x). \) Hence, there exists \(\varepsilon '>0\) such that

$$\begin{aligned} f(z)\ge f({\overline{x}})\quad \ \forall z\in {\mathbb {B}}({\overline{x}},\varepsilon ')\cap {\mathcal {M}}. \end{aligned}$$

By the KL property of exponent 1/2 of \({\widetilde{f}}\) at \({\overline{x}}\), there exist \(\varepsilon ,c>0\) and \(\eta >0\) such that

$$\begin{aligned} \mathrm{dist}(0,\partial \!{\widetilde{f}}(x))\ge c\sqrt{{\widetilde{f}}(x)-{\widetilde{f}}({\overline{x}})} \quad \forall x\in {\mathbb {B}}({\overline{x}},\varepsilon ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]. \end{aligned}$$
(15)

Since f is \({\mathcal {C}}^2\)-smooth around \({\overline{x}}\), there exists \(\varepsilon ''>0\) such that \( f(z)<f({\overline{x}})+\eta \) for all \(z\in {\mathbb {B}}({\overline{x}},\varepsilon '')\cap {\mathcal {M}}\). Take \(\delta =\min (\varepsilon ,\varepsilon ',\varepsilon '')\). Fix an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}\). Clearly, \( f({\overline{x}})\le f(x)\le f({\overline{x}})+\eta . \) If \(f(x)>f({\overline{x}})\), then \(x\in {\mathbb {B}}({\overline{x}},\varepsilon ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]\), and from (15) and (14), \( \Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert \ge c\sqrt{|f(x)-f({\overline{x}})|}. \) If \(f(x)=f({\overline{x}})\), this inequality holds automatically. \(\square \)
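As a numerical illustration of inequality (13) (a sanity check, not part of the paper; the matrix H, the critical point, and the sampling radius below are arbitrary choices), consider \(f(z)=z^{{\mathbb {T}}}Hz\) on the unit sphere in \({\mathbb {R}}^3\), whose Riemannian gradient at z is the tangential projection \(2(Hz-\langle z,Hz\rangle z)\):

```python
import numpy as np

rng = np.random.default_rng(0)

# f(z) = z^T H z restricted to the unit sphere; H has a spectral gap at its
# largest eigenvalue, so xbar = e_1 (a unit eigenvector) is a critical point.
H = np.diag([3.0, 2.0, 1.0])
xbar = np.array([1.0, 0.0, 0.0])
f = lambda z: z @ H @ z

ratios = []
for _ in range(1000):
    z = xbar + 0.1 * rng.standard_normal(3)
    z /= np.linalg.norm(z)                    # project back onto the sphere
    # Riemannian gradient: projection of grad f(z) = 2Hz onto the tangent space
    grad = 2.0 * (H @ z - f(z) * z)
    gap = abs(f(z) - f(xbar))
    if gap > 1e-12:
        ratios.append(np.linalg.norm(grad) / np.sqrt(gap))

gamma_hat = min(ratios)   # empirical lower bound for the KL constant gamma
```

The empirical ratio \(\Vert \nabla _{\!{\mathcal {M}}}f(z)\Vert /\sqrt{|f(z)-f({\overline{x}})|}\) stays bounded away from zero near \({\overline{x}}\), consistent with the KL exponent 1/2.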

1.2 B: KL Property of the Quadratic Function over a Sphere

For any integer \(m\ge 1\) and any given \(m\times m\) real symmetric matrix H, define \(g(z):=z^{{\mathbb {T}}}Hz+\delta _{{\mathcal {S}}}(z)\) for \(z\in {\mathbb {R}}^m\). Lemma A.1 in Appendix A and [19, Theorem 1] imply that g is a KL function of exponent 1/2. This part gives a different proof, which needs the following lemmas.

Lemma B.1

The critical point set of g takes the form of \(\mathrm{crit}g=\big \{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\big \}.\) So, by letting H have the eigenvalue decomposition \(P\varLambda P^{{\mathbb {T}}}\) with \(\varLambda =\mathrm{diag}(\lambda _1,\ldots ,\lambda _m)\) for \(\lambda _1\ge \cdots \ge \lambda _m\) and \(P\in {\mathbb {O}}^m\), \(\mathrm{crit}g =PW\) with \( W=\big \{y\in {\mathcal {S}}:\ \varLambda y=\langle y,\varLambda y\rangle y\big \}.\)

Proof

By [27, Exercise 8.8] and Lemma 3.1, it immediately follows that for any \(z\in {\mathbb {R}}^m\),

$$\begin{aligned} \partial g(z)=2Hz+\partial \delta _{{\mathcal {S}}}(z)=2Hz+[\![z]\!]. \end{aligned}$$
(16)

Choose an arbitrary \({\overline{z}}\in \mathrm{crit}g\). From (16), there exists \({\overline{t}}\in {\mathbb {R}}\) such that \(0=2H{\overline{z}}+{\overline{t}}{\overline{z}}\). Along with \(\Vert {\overline{z}}\Vert =1\), we have \({\overline{t}}=-2\langle {\overline{z}},H{\overline{z}}\rangle \), and hence \({\overline{z}}\in \!\big \{z\in {\mathcal {S}}:\ Hz=\!\langle z,Hz\rangle z\big \}\). Consequently, \( \mathrm{crit}g\subseteq \big \{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\big \}. \) The converse inclusion is immediate to check by Lemma 3.1. Thus, the first part follows. The second part is immediate. \(\square \)
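The characterization \(\mathrm{crit}g=\{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\}\) can be checked numerically: unit eigenvectors satisfy it, while a unit mixture of two eigenvectors with distinct eigenvalues does not (its residual equals half the eigenvalue gap). The random instance below is an arbitrary choice, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

B = rng.standard_normal((4, 4))
H = (B + B.T) / 2                       # an arbitrary symmetric matrix
eigvals, eigvecs = np.linalg.eigh(H)    # ascending eigenvalues, orthonormal columns

def crit_residual(z):
    # z in S is critical for g iff Hz = <z, Hz> z  (Lemma B.1)
    return np.linalg.norm(H @ z - (z @ H @ z) * z)

# every unit eigenvector satisfies the criticality condition
res_eig = max(crit_residual(eigvecs[:, j]) for j in range(4))

# a unit mixture of two eigenvectors with distinct eigenvalues does not:
# its residual equals half the eigenvalue gap |lam_max - lam_min| / 2
z_mix = (eigvecs[:, 0] + eigvecs[:, -1]) / np.sqrt(2.0)
res_mix = crit_residual(z_mix)
```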

Lemma B.2

Let \(D=\mathrm{diag}(d_1,d_2,\ldots ,d_p)\) with \(d_1\ge d_2\ge \cdots \ge d_p\). Define the function \(\psi (x):=x^{{\mathbb {T}}}Dx+\delta _{{\mathcal {S}}}(x)\) for \(x\in {\mathbb {R}}^p\). Then, \(\psi \) is a KL function of exponent 1/2.

Proof

By Lemma B.1, it is immediate to obtain the following characterization for \(\mathrm{crit}\psi \):

$$\begin{aligned} \mathrm{crit}\,\psi =\big \{x\in {\mathcal {S}}:\ Dx=\langle x,Dx\rangle x\big \}. \end{aligned}$$
(17)

Clearly, for each \(x\in \mathrm{crit}\,\psi \), \(d_i=\langle x,Dx\rangle \) with \(i\in \mathrm{supp}(x)\). For any \(z\in \mathrm{dom}\,\partial \psi \), we have

$$\begin{aligned} \begin{aligned} \text{ dist }(0,\partial \psi (z))^2&= \min _{u\in \partial \psi (z)} \Vert u\Vert ^2= \min _{w\in {\mathbb {R}}} \Vert 2Dz + wz\Vert ^2\\&=\min _{w\in {\mathbb {R}}}\Big \{4\langle z,D^{{\mathbb {T}}}Dz\rangle +w^2+4w\langle z,Dz\rangle \Big \}\\&= 4\langle z,D^{{\mathbb {T}}}Dz\rangle -4(\langle z,Dz\rangle )^2 =4\Vert Dz-\langle z,Dz\rangle z\Vert ^2. \end{aligned} \end{aligned}$$
(18)
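As a quick sanity check of the closed form in (18) (not part of the proof; the diagonal D, the point z, and the grid below are arbitrary choices), one can compare it against a brute-force minimization over w:

```python
import numpy as np

rng = np.random.default_rng(2)

D = np.diag([4.0, 2.0, 2.0, -1.0])
z = rng.standard_normal(4)
z /= np.linalg.norm(z)                  # a point on the unit sphere

# brute force: dist(0, dpsi(z))^2 = min_w ||2Dz + w z||^2 over a fine grid
ws = np.linspace(-20.0, 20.0, 40001)
vals = np.sum((2 * (D @ z)[None, :] + ws[:, None] * z[None, :]) ** 2, axis=1)
brute = vals.min()

# closed form from (18); the minimizer is w* = -2<z, Dz> since ||z|| = 1
closed = 4 * np.linalg.norm(D @ z - (z @ D @ z) * z) ** 2
```

The grid minimum matches the closed form up to the grid resolution, reflecting that the quadratic in w is minimized at \(w^*=-2\langle z,Dz\rangle \).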

Now fix an arbitrary \({\overline{x}}\in \mathrm{crit}\,\psi \). From (17), it immediately follows that \( -D{\overline{x}}+\langle {\overline{x}},D{\overline{x}}\rangle {\overline{x}}=0. \) We next proceed by considering two cases.

Case 1: \(d_1=\cdots =d_p=\gamma \) for some \(\gamma \in {\mathbb {R}}\). Then \(\psi (x)=\gamma \) for every \(x\in {\mathcal {S}}\), so for any \(\eta>0\) and \(\delta >0\) the set \({\mathbb {B}}({\overline{x}},\delta )\cap [\psi ({\overline{x}})<\psi <\psi ({\overline{x}})+\eta ]\) is empty, and the KL inequality of exponent 1/2 holds at \({\overline{x}}\) vacuously. (Indeed, by \(D{\overline{x}}=\langle {\overline{x}},D{\overline{x}}\rangle {\overline{x}}\) and Eq. (18), \(\mathrm{dist}(0,\partial \psi (x))=2\Vert Dx-\langle x,Dx\rangle x\Vert =0\) for every \(x\in {\mathcal {S}}\).)

Case 2: there exist \(i\ne j\in \{1,2,\ldots ,p\}\) such that \(d_i\ne d_j\). Write \(J=\mathrm{supp}({\overline{x}})\) and \({\overline{J}}=\{1,\ldots ,p\}\backslash J\). By (17), we know that \(d_i=\langle {\overline{x}},D{\overline{x}}\rangle \) for all \(i\in J\). This means that there must exist an index \(\kappa \in {\overline{J}}\) such that \(d_{\kappa }\ne \langle {\overline{x}},D{\overline{x}}\rangle \). Write \( {\overline{J}}_1:=\big \{i\in {\overline{J}}:\ d_i\ne \langle {\overline{x}},D{\overline{x}}\rangle \big \}. \) By the continuity of the function \(\langle \cdot ,D\cdot \rangle \), there exists \(\delta >0\) such that for all \(z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {S}}\),

$$\begin{aligned} \frac{1}{2}|d_j-\langle {\overline{x}},D{\overline{x}} \rangle |\le |d_j-\langle z,Dz\rangle |\le \frac{3}{2}|d_j-\langle {\overline{x}},D{\overline{x}}\rangle |\quad \forall j\in {\overline{J}}_1. \end{aligned}$$
(19)

Choose an arbitrary \(\eta >0\). Fix an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta ) \cap [\psi ({\overline{x}})<\psi (x)<\psi ({\overline{x}})+\eta ]\). Clearly, \(x \in {\mathcal {S}}\). From Eq. (18), it follows that

$$\begin{aligned} \frac{1}{4}\text{ dist }(0,\partial \psi (x))^2&=\sum _{j\in {\overline{J}}}\big (d_j-\langle x,Dx\rangle \big )^2x_j^2 +\sum _{j\in J}\big (d_j-\langle x,Dx\rangle \big )^2x_j^2\nonumber \\&=\sum _{j\in {\overline{J}}}\big (d_j-\langle x,Dx\rangle \big )^2x_j^2 +\sum _{j\in J}\big (\langle {\overline{x}},D{\overline{x}}\rangle -\langle x,Dx\rangle \big )^2x_j^2\nonumber \\&\ge \sum _{j\in {\overline{J}}_1}\big (d_j-\langle x,Dx\rangle \big )^2x_j^2 \ge \frac{1}{4}\sum _{j\in {\overline{J}}_1}\big (d_j-\langle {\overline{x}},D{\overline{x}}\rangle \big )^2x_j^2 \end{aligned}$$
(20)

where the second equality is due to (17), the first inequality is by the definition of \({\overline{J}}_1\), and the last inequality is due to (19). On the other hand, by the definition of \(\psi \),

$$\begin{aligned} \psi (x)-\psi ({\overline{x}})&=\langle x,Dx\rangle -\langle {\overline{x}},D{\overline{x}}\rangle =\sum _{j\in {\overline{J}}}d_jx_j^2+\sum _{j\in J}d_jx_j^2-\langle {\overline{x}},D{\overline{x}}\rangle \Vert x\Vert ^2\nonumber \\&=\sum _{j\in {\overline{J}}}\big (d_j-\langle {\overline{x}},D{\overline{x}}\rangle \big )x_j^2 +\sum _{j\in J}\big (d_j-\langle {\overline{x}},D{\overline{x}}\rangle \big )x_j^2\nonumber \\&=\sum _{j\in {\overline{J}}}\big (d_j-\langle {\overline{x}},D{\overline{x}}\rangle \big )x_j^2 =\sum _{j\in {\overline{J}}_1}\big (d_j-\langle {\overline{x}},D{\overline{x}}\rangle \big )x_j^2\nonumber \\&\le \sum _{j\in {\overline{J}}_1}|\langle {\overline{x}},D{\overline{x}}\rangle -d_j| x_j^2 \le \max _{j\in {\overline{J}}_1}|d_j-\langle {\overline{x}},D{\overline{x}}\rangle |\Vert x_{{\overline{J}}_1}\Vert ^2 \end{aligned}$$
(21)

where the fourth equality is due to (17) and the fifth one is by the definition of \({\overline{J}}_1\). Since \(\psi (x)-\psi ({\overline{x}})>0\), combining (20) and (21) yields

$$\begin{aligned} \text{ dist }(0, \partial \psi (x))&\ge \sqrt{\sum _{j\in {\overline{J}}_1} \big (d_j-\langle {\overline{x}},D{\overline{x}}\rangle \big )^2x_j^2} \ge \min _{j\in {\overline{J}}_1}|d_j-\langle {\overline{x}}, D{\overline{x}}\rangle |\Vert x_{{\overline{J}}_1}\Vert \\&\ge \frac{\min _{j\in {\overline{J}}_1}|d_j- \langle {\overline{x}},D{\overline{x}}\rangle |}{\sqrt{\max _{j\in {\overline{J}}_1}|d_j- \langle {\overline{x}},D{\overline{x}}\rangle |}} \sqrt{\psi (x)-\psi ({\overline{x}})}. \end{aligned}$$

By the arbitrariness of x, Cases 1 and 2 show that \(\psi \) has the KL property of exponent 1/2 at \({\overline{x}}\). From the arbitrariness of \({\overline{x}}\) in \(\mathrm{crit}\,\psi \), \(\psi \) is a KL function of exponent 1/2. \(\square \)

Now we prove that g is a KL function of exponent 1/2. Fix an arbitrary \({\overline{z}}\in \mathrm{crit}g\). Let H have the eigenvalue decomposition as in Lemma B.1. Then, \({\overline{y}}=P^{{\mathbb {T}}}{\overline{z}}\in \mathrm{crit}\psi \) where \(\psi \) is defined in Lemma B.2 with \(D=\varLambda \). By Lemma B.2, there exist \(\eta>0,\delta >0\) and \(c>0\) such that

$$\begin{aligned} \mathrm{dist}(0,\partial \psi (y))\ge c\sqrt{\psi (y)-\psi ({\overline{y}})} \quad \ \forall y\in {\mathbb {B}}({\overline{y}},\delta ) \cap [\psi ({\overline{y}})<\psi <\psi ({\overline{y}})+\eta ]. \end{aligned}$$

Fix an arbitrary \(z\in {\mathbb {B}}({\overline{z}},\delta )\cap [g({\overline{z}})<g<g({\overline{z}})+\eta ]\). Clearly, \(z\in {\mathcal {S}}\). Write \(y=P^{{\mathbb {T}}}z\). Then, \(y\in {\mathcal {S}}\), \(g(z)=\psi (y)\) and \(\Vert y-{\overline{y}}\Vert =\Vert z-{\overline{z}}\Vert \). Since \(g({\overline{z}})=\psi ({\overline{y}})\), \( y\in {\mathbb {B}}({\overline{y}},\delta )\cap [\psi ({\overline{y}})<\psi (y) < \psi ({\overline{y}})+\eta ]. \) In addition, from (16) and the eigenvalue decomposition of H, \(\partial g(z)=P\partial \psi (y)\). Thus,

$$\begin{aligned} \mathrm{dist}(0,\partial g(z))=\mathrm{dist}(0,P\partial \psi (y)) =\mathrm{dist}(0,\partial \psi (y))\ge c\sqrt{\psi (y)-\psi ({\overline{y}})}. \end{aligned}$$

Together with \(\psi (y)-\psi ({\overline{y}})=g(z)-g({\overline{z}})\), it follows that g has the KL property of exponent 1/2 at \({\overline{z}}\). By the arbitrariness of \({\overline{z}}\) in \(\mathrm{crit}g\), g is a KL function of exponent 1/2.
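The change of variables \(y=P^{{\mathbb {T}}}z\) underlying this reduction can be verified numerically: both the objective values and the subdifferential distances are invariant under the orthogonal transformation. The random symmetric H below is an arbitrary test instance, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

B = rng.standard_normal((5, 5))
H = (B + B.T) / 2
lam, P = np.linalg.eigh(H)              # H = P diag(lam) P^T with P orthogonal
Lam = np.diag(lam)

z = rng.standard_normal(5)
z /= np.linalg.norm(z)                  # z on the unit sphere
y = P.T @ z                             # y = P^T z is again a unit vector

g_z = z @ H @ z                         # g(z): the indicator term vanishes on S
psi_y = y @ Lam @ y                     # psi(y)

# the subdifferential distances agree as well, cf. (16) and (18)
dist_g = 2 * np.linalg.norm(H @ z - g_z * z)
dist_psi = 2 * np.linalg.norm(Lam @ y - psi_y * y)
```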

1.3 C: Supplementary Lemma and Proofs

The following lemma extends the result of [34, Section 2.3] for differentiable strongly convex functions to the setting of closed proper strongly convex functions. In particular, it implies that the composite g is a KL function of exponent 1/2 without requiring the surjectivity of \({\mathcal {A}}\).

Lemma C.1

Consider \(g(x)\!:=\vartheta ({\mathcal {A}}x)\) for \(x\in {\mathbb {X}}\) where \({\mathcal {A}}\!:{\mathbb {X}}\rightarrow {\mathbb {Z}}\) is a linear mapping, and \(\vartheta \!:{\mathbb {Z}}\rightarrow ]-\infty ,\infty ]\) is a proper closed strongly convex function with modulus \(\mu \). Here, \({\mathbb {X}}\) and \({\mathbb {Z}}\) are two finite dimensional vector spaces equipped with the inner product \(\langle \cdot ,\cdot \rangle \) and its induced norm \(\Vert \cdot \Vert \). If \(\mathrm{ri}(\mathrm{dom}\vartheta )\cap \mathrm{range}{\mathcal {A}}\ne \emptyset \), then there exists a constant \({\overline{c}}>0\) such that

$$\begin{aligned} \mathrm{dist}(0, \partial g(x))\ge \frac{\sqrt{2\mu }}{{\overline{c}}} \sqrt{g(x)-g^*}\quad \ \forall x\in {\mathbb {X}} \end{aligned}$$
(22)

where \(g^*\) denotes the minimum value of the function g.

Proof

Pick an arbitrary \(x^*\in \mathrm{crit}g\) (if \(\mathrm{crit}g=\emptyset \), the conclusion holds automatically). We first prove that \(\mathrm{crit}g =\{x\in {\mathbb {X}}\!:\,{\mathcal {A}}x={\mathcal {A}}x^*\}\). To this end, pick any \(x'\in {\mathbb {X}}\) with \({\mathcal {A}}x'={\mathcal {A}}x^*\). Since \(\mathrm{ri}(\mathrm{dom}\vartheta )\cap \mathrm{range}{\mathcal {A}}\ne \emptyset \), by [30, Theorem 23.9], we have \( \partial g(x')={\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x') = {\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x^*) = \partial g(x^*). \) Since \(0\in \partial g(x^*)\), we obtain \(0\in \partial g(x')\), which implies that \(x'\in \mathrm{crit}g\). This means that \(\{x\in {\mathbb {X}}\!:\,{\mathcal {A}}x={\mathcal {A}}x^*\}\subseteq \mathrm{crit}g\). Suppose that there exists \({\overline{x}}\in \mathrm{crit} g\) such that \({\mathcal {A}}{\overline{x}}\ne {\mathcal {A}}x^*\). Then, by the strong convexity of \(\vartheta \), we have

$$\begin{aligned} g(({\overline{x}} + x^*)/2) = \vartheta (({\mathcal {A}}{\overline{x}}+{\mathcal {A}}x^*)/2) <(g({\overline{x}})+ g(x^*))/2. \end{aligned}$$

Since g is convex, its critical points \({\overline{x}}\) and \(x^*\) are both global minimizers of g, so the left-hand side cannot be strictly smaller than \((g({\overline{x}})+g(x^*))/2\), a contradiction. Thus, the equality \(\mathrm{crit} g= \{x \in {\mathbb {X}}\!:\,{\mathcal {A}}x = {\mathcal {A}}x^*\}\) holds. By the Hoffman inequality [35], there exists a constant \({\overline{c}}>0\) such that for any \(z \in {\mathbb {X}}\),

$$\begin{aligned} \Vert \varPi _{\mathrm{crit} g} (z) -z\Vert \le {\overline{c}} \Vert {\mathcal {A}}(\varPi _{\mathrm{crit} g} (z) - z)\Vert , \end{aligned}$$
(23)

where \(\varPi _{\mathrm{crit} g}\) is the projection mapping onto \(\mathrm{crit} g\). Fix an arbitrary \(x\in {\mathbb {X}}\). If \(x\notin \mathrm{dom}\partial g\), the inequality (22) holds trivially. So, it suffices to consider the case \(x\in \mathrm{dom}\partial g\). By [30, Theorem 23.9], \(\partial g(x)={\mathcal {A}}^* \partial \vartheta ({\mathcal {A}}x)\). Obviously, \(\partial \vartheta ({\mathcal {A}}x) \ne \emptyset \). Pick any \(\xi \in \partial \vartheta ({\mathcal {A}}x)\). By the strong convexity of \(\vartheta \) and [36, Theorem 6.1.2], it follows that

$$\begin{aligned} g(z) \ge g(x)+\langle \xi ,{\mathcal {A}}(z-x)\rangle + \frac{\mu }{2}\Vert {\mathcal {A}}(z-x)\Vert ^2 \quad \ \forall z\in {\mathbb {X}}. \end{aligned}$$

By taking \(z = \varPi _{\mathrm{crit} g}(x)\), from the last inequality we obtain that

$$\begin{aligned} \begin{aligned} g(\varPi _{\mathrm{crit} g}(x))&\ge g(x) + \langle \xi , {\mathcal {A}}(\varPi _{\mathrm{crit} g}(x)-x) \rangle + \frac{\mu }{2} \Vert {\mathcal {A}}(\varPi _{\mathrm{crit} g}(x)-x)\Vert ^2 \\&\ge g(x) + \langle \xi , {\mathcal {A}}(\varPi _{\mathrm{crit} g}(x)-x)\rangle + \frac{\mu }{2{\overline{c}}^2} \Vert \varPi _{\mathrm{crit} g}(x)-x\Vert ^2 \\&\ge g(x) + \min _{y\in {\mathbb {X}}}\left[ \langle \xi , {\mathcal {A}}(y-x)\rangle + \frac{\mu }{2{\overline{c}}^2} \Vert y-x\Vert ^2 \right] \\&\ge g(x) - 0.5 ({\overline{c}}^2/\mu ) \Vert {\mathcal {A}}^*\xi \Vert ^2, \end{aligned} \end{aligned}$$

where the second inequality follows from (23). Note that \(g(\varPi _{\mathrm{crit} g}(x))=g(x^*)\). The last inequality implies that \(\Vert {\mathcal {A}}^*\xi \Vert ^2\ge (2\mu /{\overline{c}}^2)[g(x)-g(x^*)]\). Together with \(\partial g(x)={\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x)\),

$$\begin{aligned} \mathrm{dist}(0, \partial g(x))^2\ge \min _{\xi \in \partial \vartheta ({\mathcal {A}}x)}\Vert {\mathcal {A}}^*\xi \Vert ^2 \ge (2\mu /{\overline{c}}^2)[g(x)-g(x^*)]. \end{aligned}$$

This implies that the desired inequality (22) holds. \(\square \)
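A minimal numerical sketch of (22), under assumptions of our own choosing: we take the simplest strongly convex \(\vartheta (u)=\frac{1}{2}\Vert u-b\Vert ^2\) (modulus \(\mu =1\)), a deliberately rank-deficient (hence non-surjective) matrix A, and \({\overline{c}}\) equal to the reciprocal of the smallest nonzero singular value of A, which is a valid Hoffman constant for \(\{x:\ {\mathcal {A}}x={\mathcal {A}}x^*\}\). Since g is smooth here, \(\mathrm{dist}(0,\partial g(x))=\Vert \nabla g(x)\Vert \):

```python
import numpy as np

rng = np.random.default_rng(4)

# theta(u) = 0.5 ||u - b||^2 is strongly convex with modulus mu = 1;
# A has rank 2 < min(6, 4), so it is deliberately not surjective.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)
mu = 1.0

g = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

x_star = np.linalg.lstsq(A, b, rcond=None)[0]    # a minimizer of g
g_star = g(x_star)

# Hoffman constant for {x : Ax = Ax*}: reciprocal of the smallest
# nonzero singular value of A
sig = np.linalg.svd(A, compute_uv=False)
cbar = 1.0 / sig[sig > 1e-10].min()

holds = True
for _ in range(200):
    x = x_star + rng.standard_normal(4)
    lhs = np.linalg.norm(A.T @ (A @ x - b))      # dist(0, grad g(x))
    rhs = np.sqrt(2 * mu) / cbar * np.sqrt(g(x) - g_star)
    holds = holds and (lhs >= rhs - 1e-8)
```

For this least-squares instance the bound is in fact tight in order: \(g(x)-g^*=\frac{1}{2}\Vert A(x-x^*)\Vert ^2\) and \(\Vert A^{{\mathbb {T}}}r\Vert \ge \sigma _{\min }^{+}\Vert r\Vert \) for \(r\in \mathrm{range}(A)\).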

The proof of Proposition 3.2: First, we assume that \(\psi \) is a proper closed piecewise linear–quadratic convex function. From Lemmas 3.2 and 3.3, we observe that the multifunction \(\partial h\) is piecewise polyhedral, i.e., its graph is the union of finitely many polyhedral sets. So, \(\partial h\) is locally upper Lipschitzian at each point \(x\in {\mathbb {R}}^p\) by [33, Proposition 1], which implies that \(\partial h\) is metrically subregular at each point of its graph. In addition, Sun [37] showed that a proper closed convex function \(\psi \) is piecewise linear–quadratic if and only if \(\partial \psi \) is piecewise polyhedral. Thus, by combining [33, Proposition 1] and [38, Section 3.2], we obtain the conclusion.

Now assume \(\psi =\delta _C\). Fix an arbitrary \({\overline{x}}\in C\). Write \(J=\mathrm{supp}({\overline{x}})\) and \({\overline{J}}=\{1,\ldots ,p\}\backslash J\). Define the subspace \(L\!:=\{x\in {\mathbb {R}}^p:\ x_i=0\ \mathrm{for}\ i\in {\overline{J}}\}\). By Lemma 3.2, \(\partial h({\overline{x}})={\mathcal {N}}_{L}({\overline{x}})\). Take an arbitrary \(v\in \widehat{\partial }(\delta _C+h)({\overline{x}})\). From Definition 3.1, it follows that

$$\begin{aligned} 0&\le \liminf _{x'\rightarrow {\overline{x}}, x'\ne {\overline{x}}}\frac{h(x')+\delta _C(x') -h({\overline{x}})-\delta _C({\overline{x}})-\langle v,x'-{\overline{x}}\rangle }{\Vert x'-{\overline{x}}\Vert }\\&\le \liminf _{x'\in C,\mathrm{supp}(x')=J,x'\rightarrow {\overline{x}}, x'\ne {\overline{x}}} \frac{h(x')-h({\overline{x}})-\langle v,x'-{\overline{x}}\rangle }{\Vert x'-{\overline{x}}\Vert }\\&=\liminf _{x'\in C,\mathrm{supp}(x')=J,x'\rightarrow {\overline{x}}, x'\ne {\overline{x}}} \frac{-\langle v,x'-{\overline{x}}\rangle }{\Vert x'-{\overline{x}}\Vert }\\&=\liminf _{x'\in C\cap L,x'\rightarrow {\overline{x}}, x'\ne {\overline{x}}} \frac{\delta _{C\cap L}(x')-\delta _{C\cap L}({\overline{x}}) -\langle v,x'-{\overline{x}}\rangle }{\Vert x'-{\overline{x}}\Vert } \end{aligned}$$

which implies that \(v\in \widehat{\partial }\delta _{C\cap L}({\overline{x}})\). Consequently, \(\widehat{\partial }(\delta _C+h)({\overline{x}}) \subseteq \widehat{\partial }\delta _{C\cap L}({\overline{x}})\). Together with [27, Corollary 10.9], Lemma 3.2, \(\partial h({\overline{x}})={\mathcal {N}}_{L}({\overline{x}})\) and the convexity of C, we have

$$\begin{aligned} \begin{aligned} \partial \delta _C({\overline{x}})+\partial h({\overline{x}})&= \widehat{\partial }\delta _C({\overline{x}})+\widehat{\partial }h({\overline{x}}) \subseteq \widehat{\partial }(\delta _C+h)({\overline{x}})\subseteq \partial (\delta _C+\delta _L)({\overline{x}})\\&=\partial \delta _C({\overline{x}})+\partial \delta _L({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial h({\overline{x}}). \end{aligned} \end{aligned}$$
(24)

By the arbitrariness of \({\overline{x}}\), this implies that \(\partial \delta _C(x)+\partial h(x)=\widehat{\partial }(\delta _C+h)(x)\) for any \(x\in C\). Next we argue that \(\partial (\delta _C+h)({\overline{x}})\subseteq \partial \delta _C({\overline{x}})+\partial h({\overline{x}})\). Take an arbitrary \(v\in \partial (\delta _C+h)({\overline{x}})\). There exist \(x^k\xrightarrow [\delta _{C}+h]{}{\overline{x}}\) and \(v^k\in \widehat{\partial }(\delta _{C}\!+\!h)(x^k)\) with \(v^k\rightarrow v\) as \(k\rightarrow \infty \). From the previous arguments, \(v^k\in \partial \delta _{C}(x^k)+\partial h(x^k)\) for each k. Since \(x^k\rightarrow {\overline{x}}\), we have \(x^k\ne 0\) and \(\mathrm{supp}(x^k)\supseteq J\) for all sufficiently large k. Since \(\delta _{C}(x^k)+h(x^k)\rightarrow \delta _{C}({\overline{x}})+h({\overline{x}})\), we must have \(x^k\in C\) and \(h(x^k)\rightarrow h({\overline{x}})\) for all sufficiently large k. The latter, along with \(\mathrm{supp}(x^k)\supseteq J\), implies that \(\mathrm{supp}(x^k)=J\) for all sufficiently large k. So, \(\partial h(x^k)=\partial \delta _{L}(x^k)\) for large enough k. Combining this with (24) and \(v^k\in \partial \delta _{C}(x^k)+\partial h(x^k)\), we have \(v^k\in \partial \delta _{C\cap L}(x^k)\). Then, \(v\in \partial (\delta _C+\delta _{L})({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial \delta _{L}({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial h({\overline{x}})\). The stated inclusion holds. The previous arguments imply that \( \widehat{\partial }(\delta _C+h)({\overline{x}}) =\partial (\delta _C+h)({\overline{x}}) ={\mathcal {N}}_{C}({\overline{x}})+\partial h({\overline{x}}) =\partial \delta _{C\cap L}({\overline{x}}). \) Suppose that \(\partial \delta _{C\cap L}({\overline{x}})\ne \emptyset \) (if not, the last equation implies the result). Then, we have

$$\begin{aligned} \partial (\delta _{C}+h)({\overline{x}}) =\partial \delta _{C\cap L}({\overline{x}}) =\partial ^{\infty }\delta _{C\cap L}({\overline{x}}) =[\partial \delta _{C\cap L}({\overline{x}})]^{\infty } =[\widehat{\partial }(\delta _{C}\!+h)({\overline{x}})]^{\infty } \end{aligned}$$

where the second equality is by [27, Exercise 8.14] and the third one is due to [27, Proposition 8.12]. Thus, the first part of the desired results follows. Using the same arguments as above, we can obtain the second part. The proof is completed. \(\square \)


Cite this article

Wu, Y., Pan, S. & Bi, S. Kurdyka–Łojasiewicz Property of Zero-Norm Composite Functions. J Optim Theory Appl 188, 94–112 (2021). https://doi.org/10.1007/s10957-020-01779-7
