Abstract
This paper focuses on a class of zero-norm composite optimization problems. For this class of nonconvex and nonsmooth problems, we establish the Kurdyka–Łojasiewicz property of exponent 1/2 for the objective function under a suitable assumption, and we provide examples illustrating that this assumption is not very restrictive; in particular, it covers the zero-norm regularized or constrained piecewise linear–quadratic function, the zero-norm regularized or constrained logistic regression function, and the zero-norm regularized or constrained quadratic function over a sphere.
Change history
06 May 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10957-021-01855-6
References
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Pan, S.H., Liu, Y.L.: Metric subregularity of subdifferential and KL property of exponent 1/2. arXiv:1812.00558v3 (2019)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)
Wang, X.F., Ye, J.J., Yuan, X.M., Zeng, S.Z., Zhang, J.: Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis. arXiv:1810.10051 (2018)
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferential. J. Convex Anal. 15, 365–380 (2008)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 1, 43–54 (1992)
Wen, B., Chen, X.J., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)
Zhou, Z.R., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165, 689–728 (2017)
Cui, Y., Sun, D.F., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1300-6
D’Acunto, D., Kurdyka, K.: Explicit bounds for the Łojasiewicz exponent in the gradient inequality for polynomials. Ann. Polon. Math. 87, 51–61 (2005)
Li, G.Y., Mordukhovich, B.S., Phạm, T.S.: New fractional error bounds for polynomial systems with application to Hölderian stability in optimization and spectral theory of tensors. Math. Program. 153(2), 333–362 (2015). (Ser. A)
Li, G.Y., Mordukhovich, B.S., Nghia, T.T.A., Phạm, T.S.: Error bounds for parametric polynomial systems with applications to higher-order stability analysis and convergence rates. Math. Program. 168(1–2), 313–346 (2018)
Li, G.Y., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
Yu, P.R., Li, G.Y., Pong, T.K.: Deducing Kurdyka–Łojasiewicz exponent via inf-projection. arXiv:1902.03635 (2019)
Liu, H.K., So, A.M.-C., Wu, W.J.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1285-1
Zhang, Q., Chen, C.H., Liu, H.K., So, A.M.-C., Zhou, Z.R.: On the linear convergence of the ADMM for regularized non-convex low-rank matrix recovery. https://www1.se.cuhk.edu.hk/~manchoso/admm_MF.pdf
Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)
Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)
Yuan, X.T., Zhang, T.: Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14, 899–925 (2013)
Asteris, M., Papailiopoulos, D., Dimakis, A.: Nonnegative sparse PCA with provable guarantees. In: International Conference on Machine Learning (2014)
Brodie, J., Daubechies, I., De Mol, C., Giannone, D., Loris, I.: Sparse and stable Markowitz portfolios. Proc. Natl. Acad. Sci. 106, 12267–12272 (2009)
Zhang, J.Y., Liu, H.Y., Wen, Z.W., Zhang, S.Z.: A sparse completely positive relaxation of the modularity maximization for community detection. SIAM J. Sci. Comput. 40, A3091–A3120 (2017)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)
Mordukhovich, B.S.: Variational Analysis and Applications. Springer, Cham (2018)
Le, Y.H.: Generalized subdifferentials of the rank function. Optim. Lett. 7, 731–743 (2013)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.F.: Restricted normal cones and sparsity optimization with affine constraints. Found. Comput. Math. 14, 63–83 (2014)
Feng, X., Wu, C.L.: Every critical point of an \(l_0\) regularized minimization model is a local minimizer. arXiv:1912.04498 (2019)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–214 (1981)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer International Publishing (2016)
Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49, 263–265 (1952)
Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, New York (1991)
Sun, J.: On Monotropic Piecewise Quadratic Programming. Ph.D. Thesis, Department of Mathematics, University of Washington, Seattle (1986)
Ioffe, A.D., Outrata, J.V.: On metric and calmness qualification conditions in subdifferential calculus. Set-valued Anal. 16, 199–227 (2008)
Acknowledgements
The authors would like to express their sincere thanks to the two anonymous reviewers for their helpful comments, which greatly improved the original manuscript. The research of S. H. Pan and S. J. Bi is supported by the National Natural Science Foundation of China under Projects No. 11971177 and No. 11701186, and by the Guangdong Basic and Applied Basic Research Foundation (2020A1515010408).
Communicated by Boris S. Mordukhovich.
Appendices
1.1 A: KL Property Relative to a Manifold
Let \({\mathcal {M}}\subset {\mathbb {R}}^p\) be a \({\mathcal {C}}^2\)-smooth manifold and \(f\!:{\mathcal {M}}\rightarrow {\mathbb {R}}\) be a \({\mathcal {C}}^2\)-smooth function. The set of critical points of the problem \( \min _{x\in {\mathcal {M}}}f(x) \) is \({\mathcal {X}}:=\big \{x\in {\mathcal {M}}:\ \nabla _{\!{\mathcal {M}}}f(x)=0\big \}\), where \(\nabla _{\!{\mathcal {M}}}f(z)\) is the projection of \(\nabla \!f(z)\) onto the tangent space \({\mathcal {T}}_{{\mathcal {M}}}(z)\) of \({\mathcal {M}}\) at z. We say that f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\) if f has the KL property of exponent 1/2 at each \({\overline{x}}\in {\mathcal {X}}\), i.e., there exist \(\delta >0\) and \(\gamma >0\) such that
\( \Vert \nabla _{\!{\mathcal {M}}}f(z)\Vert \ge \gamma \sqrt{|f(z)-f({\overline{x}})|}\quad \mathrm{for\ all}\ z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}. \) (13)
This part states the relation between the KL property of exponent 1/2 of f relative to \({\mathcal {M}}\) and the KL property of exponent 1/2 of its extension \({\widetilde{f}}(x):=f(x)+\delta _{{\mathcal {M}}}(x)\) for \(x\in {\mathbb {R}}^p\).
Lemma A.1
Let \({\mathcal {M}}\!\subset \!{\mathbb {R}}^p\) be a \({\mathcal {C}}^2\)-smooth manifold and \(f\!:{\mathcal {M}}\rightarrow {\mathbb {R}}\) be a \({\mathcal {C}}^2\)-smooth function. If f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\), then \({\widetilde{f}}\) is a KL function of exponent 1/2. Conversely, if \({\widetilde{f}}\) is a KL function of exponent 1/2 and each critical point is a local minimizer, then f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\).
Proof
Notice that \(\partial \!{\widetilde{f}}(x)=\nabla \!f(x) +{\mathcal {N}}_{{\mathcal {M}}}(x)\) for any \(x\in {\mathcal {M}}\), where \({\mathcal {N}}_{{\mathcal {M}}}(x)\) denotes the normal space to \({\mathcal {M}}\) at x. Clearly, \({\mathcal {X}}=\mathrm{crit}{\widetilde{f}}\). Fix an arbitrary \({\overline{x}}\in {\mathcal {X}}\). Since f has the KL property of exponent 1/2 relative to \({\mathcal {M}}\) at \({\overline{x}}\), there exist \(\delta >0\) and \(\gamma >0\) such that (13) holds for all \(z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}\). Fix an arbitrary \(\eta >0\) and an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]\). Clearly, \(x\in {\mathcal {M}}\). Moreover,
\( \mathrm{dist}(0,\partial \!{\widetilde{f}}(x)) =\mathrm{dist}(-\nabla \!f(x),{\mathcal {N}}_{{\mathcal {M}}}(x)) =\Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert . \) (14)
Along with (13), \(\mathrm{dist}(0,\partial \!{\widetilde{f}}(x))\ge \gamma \sqrt{f(x)-f({\overline{x}})}\). So, the first part follows.
Next we focus on the second part. Fix an arbitrary \({\overline{x}}\in {\mathcal {X}}\). By the given assumption, clearly, \({\overline{x}}\) is a local optimal solution of \( \min _{x\in {\mathcal {M}}}f(x). \) Hence, there exists \(\varepsilon '>0\) such that
\( f(z)\ge f({\overline{x}})\quad \mathrm{for\ all}\ z\in {\mathbb {B}}({\overline{x}},\varepsilon ')\cap {\mathcal {M}}. \)
By the KL property of exponent 1/2 of \({\widetilde{f}}\) at \({\overline{x}}\), there exist \(\varepsilon ,c>0\) and \(\eta >0\) such that
\( \mathrm{dist}(0,\partial \!{\widetilde{f}}(x))\ge c\sqrt{{\widetilde{f}}(x)-{\widetilde{f}}({\overline{x}})}\quad \mathrm{for\ all}\ x\in {\mathbb {B}}({\overline{x}},\varepsilon ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}}<{\widetilde{f}}({\overline{x}})+\eta ]. \) (15)
Since f is \({\mathcal {C}}^2\)-smooth around \({\overline{x}}\), there exists \(\varepsilon ''>0\) such that for all \(z\in {\mathbb {B}}({\overline{x}},\varepsilon '')\cap {\mathcal {M}}\), \( f(z)<f({\overline{x}})+\eta . \) Take \(\delta =\min (\varepsilon ,\varepsilon ',\varepsilon '')\). Fix an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}\). Clearly, \( f({\overline{x}})\le f(x)\le f({\overline{x}})+\eta . \) If \(f(x)>f({\overline{x}})\), then \(x\in {\mathbb {B}}({\overline{x}},\varepsilon ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]\), and from (15) and (14), \( \Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert \ge c\sqrt{|f(x)-f({\overline{x}})|}. \) If \(f(x)=f({\overline{x}})\), this inequality holds automatically. \(\square \)
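As a small numerical sanity check (not part of the paper's argument), consider the unit sphere \({\mathcal {M}}={\mathcal {S}}^{m-1}\) and \(f(x)=x^{{\mathbb {T}}}Hx\). The normal space at x is \({\mathbb {R}}x\), so the identity behind (14), \(\mathrm{dist}(0,\nabla \!f(x)+{\mathcal {N}}_{{\mathcal {M}}}(x))=\Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert \), can be verified directly. A minimal sketch assuming NumPy; the matrix, seed, and dimension are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
B = rng.standard_normal((m, m))
H = (B + B.T) / 2                    # arbitrary symmetric matrix
x = rng.standard_normal(m)
x /= np.linalg.norm(x)               # a point on the unit sphere

grad = 2 * H @ x                     # Euclidean gradient of f(x) = x^T H x
riem_grad = grad - (x @ grad) * x    # projection onto the tangent space {v : <v, x> = 0}

# dist(0, grad + R*x): minimize ||grad + t*x|| over t; the minimizer is t = -<x, grad>
t_star = -(x @ grad)
dist = np.linalg.norm(grad + t_star * x)

# the distance to the shifted normal space equals the Riemannian gradient norm
assert np.isclose(dist, np.linalg.norm(riem_grad))
```

The projection step is exactly why (14) holds with equality on a smooth manifold: removing the normal component of the Euclidean gradient is the nearest-point map onto \(\nabla \!f(x)+{\mathcal {N}}_{{\mathcal {M}}}(x)\).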
1.2 B: KL Property of the Quadratic Function over a Sphere
For any integer \(m\ge 1\) and any given \(m\times m\) real symmetric matrix H, define \(g(z):=z^{{\mathbb {T}}}Hz+\delta _{{\mathcal {S}}}(z)\) for \(z\in {\mathbb {R}}^m\). Lemma A.1 and [19, Theorem 1] imply that g is a KL function of exponent 1/2. This part gives a different proof, which requires the following lemmas.
Lemma B.1
The critical point set of g takes the form of \(\mathrm{crit}g=\big \{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\big \}.\) So, by letting H have the eigenvalue decomposition \(P\varLambda P^{{\mathbb {T}}}\) with \(\varLambda =\mathrm{diag}(\lambda _1,\ldots ,\lambda _m)\) for \(\lambda _1\ge \cdots \ge \lambda _m\) and \(P\in {\mathbb {O}}^m\), \(\mathrm{crit}g =PW\) with \( W=\big \{y\in {\mathcal {S}}:\ \varLambda y=\langle y,\varLambda y\rangle y\big \}.\)
Proof
By [27, Exercise 8.8] and Lemma 3.1, it immediately follows that for any \(z\in {\mathbb {R}}^m\),
\( \partial g(z)=2Hz+{\mathcal {N}}_{{\mathcal {S}}}(z)\quad \mathrm{with}\ {\mathcal {N}}_{{\mathcal {S}}}(z)=\{tz:\ t\in {\mathbb {R}}\}\ \mathrm{for}\ z\in {\mathcal {S}}. \) (16)
Choose an arbitrary \({\overline{z}}\in \mathrm{crit}g\). From (16), there exists \({\overline{t}}\in {\mathbb {R}}\) such that \(0=2H{\overline{z}}+{\overline{t}}{\overline{z}}\). Along with \(\Vert {\overline{z}}\Vert =1\), we have \({\overline{t}}=-2\langle {\overline{z}},H{\overline{z}}\rangle \), and hence \({\overline{z}}\in \!\big \{z\in {\mathcal {S}}:\ Hz=\!\langle z,Hz\rangle z\big \}\). Consequently, \( \mathrm{crit}g\subseteq \big \{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\big \}. \) The converse inclusion is immediate to check by Lemma 3.1. Thus, the first part follows. The second part is immediate. \(\square \)
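Lemma B.1 says the critical points of g are exactly the unit eigenvectors of H. This is easy to check numerically; a minimal sketch assuming NumPy, with an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
B = rng.standard_normal((m, m))
H = (B + B.T) / 2                    # arbitrary symmetric matrix

# Lemma B.1: crit g = { z in S : Hz = <z,Hz> z }, i.e. the unit eigenvectors of H
eigvals, P = np.linalg.eigh(H)       # eigenvalue decomposition H = P diag(eigvals) P^T
for i in range(m):
    v = P[:, i]                      # unit eigenvector
    assert np.allclose(H @ v, (v @ H @ v) * v)   # Hv = <v, Hv> v, so v is critical

# a generic unit vector that is not an eigenvector is not critical
z = np.ones(m) / np.sqrt(m)
residual = H @ z - (z @ H @ z) * z
assert np.linalg.norm(residual) > 1e-8
```

Here \(\langle v,Hv\rangle \) recovers the eigenvalue, matching the multiplier \({\overline{t}}=-2\langle {\overline{z}},H{\overline{z}}\rangle \) in the proof.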
Lemma B.2
Let \(D=\mathrm{diag}(d_1,d_2,\ldots ,d_p)\) with \(d_1\ge d_2\ge \cdots \ge d_p\). Define the function \(\psi (x):=x^{{\mathbb {T}}}Dx+\delta _{{\mathcal {S}}}(x)\) for \(x\in {\mathbb {R}}^p\). Then, \(\psi \) is a KL function of exponent 1/2.
Proof
By Lemma B.1, it is immediate to obtain the following characterization of \(\mathrm{crit}\,\psi \):
\( \mathrm{crit}\,\psi =\big \{x\in {\mathcal {S}}:\ Dx=\langle x,Dx\rangle x\big \}. \) (17)
Clearly, for each \(x\in \mathrm{crit}\,\psi \), \(d_i=\langle x,Dx\rangle \) for every \(i\in \mathrm{supp}(x)\). For any \(z\in \mathrm{dom}\,\partial \psi \), we have
\( \mathrm{dist}(0,\partial \psi (z))=2\Vert Dz-\langle z,Dz\rangle z\Vert . \) (18)
Now fix an arbitrary \({\overline{x}}\in \mathrm{crit}\,\psi \). From (17), it immediately follows that \( -D{\overline{x}}+\langle {\overline{x}},D{\overline{x}}\rangle {\overline{x}}=0. \) We next proceed by considering the following two cases.
Case 1: \(d_1=\cdots =d_p=\gamma \) for some \(\gamma \in {\mathbb {R}}\). Choose an arbitrary \(\eta >0\) and an arbitrary \(\delta >0\). Fix an arbitrary \(x \in {\mathbb {B}}({\overline{x}},\delta )\cap [\psi ({\overline{x}})<\psi (x) < \psi ({\overline{x}})+\eta ]\). Clearly, \(x\in {\mathcal {S}}\) and \(\langle x,Dx\rangle =\gamma \). Combining \(\langle {\overline{x}},D{\overline{x}}\rangle {\overline{x}}=D{\overline{x}}\) and Eq. (18) yields that
\( \mathrm{dist}(0,\partial \psi (x))=2\Vert Dx-\langle x,Dx\rangle x\Vert =2\Vert \gamma x-\gamma x\Vert =0. \)
In addition, \(\psi (x)=\psi ({\overline{x}})=\gamma \). This means that \( \text{ dist }(0, \partial \psi (x))=\sqrt{\psi (x)-\psi ({\overline{x}})}. \)
Case 2: there exist \(i\ne j\in \{1,2,\ldots ,p\}\) such that \(d_i\ne d_j\). Write \(J=\mathrm{supp}({\overline{x}})\) and \({\overline{J}}=\{1,\ldots ,p\}\backslash J\). By (17), we know that \(d_i=\langle {\overline{x}},D{\overline{x}}\rangle \) for all \(i\in J\). This means that there must exist an index \(\kappa \in {\overline{J}}\) such that \(d_{\kappa }\ne \langle {\overline{x}},D{\overline{x}}\rangle \). Write \( {\overline{J}}_1:=\big \{i\in {\overline{J}}:\ d_i\ne \langle {\overline{x}},D{\overline{x}}\rangle \big \}. \) By the continuity of the function \(\langle \cdot ,D\cdot \rangle \), there exists \(\delta >0\) such that for all \(z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {S}}\),
\( \min _{i\in {\overline{J}}_1}|d_i-\langle z,Dz\rangle | \ge \frac{1}{2}\min _{i\in {\overline{J}}_1}|d_i-\langle {\overline{x}},D{\overline{x}}\rangle |>0. \) (19)
Choose an arbitrary \(\eta >0\). Fix an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta ) \cap [\psi ({\overline{x}})<\psi (x)<\psi ({\overline{x}})+\eta ]\). Clearly, \(x \in {\mathcal {S}}\). From Eq. (18), it follows that
where the third equality is due to (17), the first inequality is by the definition of \({\overline{J}}_1\), and the last inequality is due to (19). On the other hand, by the definition of \(\psi \),
where the fourth equality is due to (17), the fifth one is by the definition of \({\overline{J}}_1\), and the inequality is since \(\psi (x)-\psi ({\overline{x}})>0\). From the above inequalities (20) and (21), there exists a constant \(c>0\), independent of x, such that \( \mathrm{dist}(0,\partial \psi (x))\ge c\sqrt{\psi (x)-\psi ({\overline{x}})}. \)
By the arbitrariness of x, Cases 1 and 2 show that \(\psi \) has the KL property of exponent 1/2 at \({\overline{x}}\). From the arbitrariness of \({\overline{x}}\) in \(\mathrm{crit}\,\psi \), \(\psi \) is a KL function of exponent 1/2. \(\square \)
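The exponent-1/2 inequality of Lemma B.2 can be observed numerically in a small instance. The following sketch (not from the paper; NumPy, the diagonal \(D=\mathrm{diag}(3,1)\), the critical point \({\overline{x}}=(0,1)\), and the constant 2 are illustrative choices) checks \(\mathrm{dist}(0,\partial \psi (x))\ge c\sqrt{\psi (x)-\psi ({\overline{x}})}\) at points on the circle approaching \({\overline{x}}\):

```python
import numpy as np

D = np.diag([3.0, 1.0])
xbar = np.array([0.0, 1.0])          # critical point: D xbar = <xbar, D xbar> xbar

def psi(x):                           # psi on the unit circle (the indicator term is 0 there)
    return x @ D @ x

def dist_subdiff(x):                  # dist(0, d psi(x)) = 2 || Dx - <x,Dx> x ||, cf. (18)
    return 2 * np.linalg.norm(D @ x - (x @ D @ x) * x)

# points x(t) = (sin t, cos t) near xbar with psi(x) > psi(xbar)
for t in [0.3, 0.1, 0.03, 0.01]:
    x = np.array([np.sin(t), np.cos(t)])
    gap = psi(x) - psi(xbar)          # equals 2 sin^2 t > 0
    assert gap > 0
    # KL inequality of exponent 1/2 with c = 2 (the exact ratio here is 2*sqrt(2)*|cos t|)
    assert dist_subdiff(x) >= 2 * np.sqrt(gap)
```

On this instance \(\mathrm{dist}(0,\partial \psi (x))=4|\sin t\cos t|\) while \(\sqrt{\psi (x)-\psi ({\overline{x}})}=\sqrt{2}\,|\sin t|\), so the ratio stays bounded below near \({\overline{x}}\), which is exactly the exponent-1/2 behavior.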
Now we prove that g is a KL function of exponent 1/2. Fix an arbitrary \({\overline{z}}\in \mathrm{crit}\,g\). Let H have the eigenvalue decomposition as in Lemma B.1. Then, \({\overline{y}}=P^{{\mathbb {T}}}{\overline{z}}\in \mathrm{crit}\,\psi \), where \(\psi \) is defined in Lemma B.2 with \(D=\varLambda \). By Lemma B.2, there exist \(\eta>0,\delta >0\) and \(c>0\) such that
\( \mathrm{dist}(0,\partial \psi (y))\ge c\sqrt{\psi (y)-\psi ({\overline{y}})}\quad \mathrm{for\ all}\ y\in {\mathbb {B}}({\overline{y}},\delta )\cap [\psi ({\overline{y}})<\psi <\psi ({\overline{y}})+\eta ]. \)
Fix an arbitrary \(z\in {\mathbb {B}}({\overline{z}},\delta )\cap [g({\overline{z}})<g<g({\overline{z}})+\eta ]\). Clearly, \(z\in {\mathcal {S}}\). Write \(y=P^{{\mathbb {T}}}z\). Then, \(y\in {\mathcal {S}}\) and \(g(z)=\psi (y)\). Since \(g({\overline{z}})=\psi ({\overline{y}})\), \( y\in {\mathbb {B}}({\overline{y}},\delta )\cap [\psi ({\overline{y}})<\psi (y) < \psi ({\overline{y}})+\eta ]. \) In addition, from (16) and the eigenvalue decomposition of H, \(\partial g(z)=P\partial \psi (y)\). Thus,
\( \mathrm{dist}(0,\partial g(z))=\mathrm{dist}(0,P\partial \psi (y))=\mathrm{dist}(0,\partial \psi (y))\ge c\sqrt{\psi (y)-\psi ({\overline{y}})}. \)
Together with \(\psi (y)-\psi ({\overline{y}})=g(z)-g({\overline{z}})\), it follows that g has the KL property of exponent 1/2 at \({\overline{z}}\). By the arbitrariness of \({\overline{z}}\) in \(\mathrm{crit}\,g\), g is a KL function of exponent 1/2.
1.3 C: Supplementary Lemma and Proofs
The following lemma extends the result of [34, Section 2.3] for differentiable strongly convex functions to the setting of closed proper strongly convex functions. In particular, it implies that the composite function g is a KL function of exponent 1/2 without requiring the surjectivity of \({\mathcal {A}}\).
Lemma C.1
Consider \(g(x)\!:=\vartheta ({\mathcal {A}}x)\) for \(x\in {\mathbb {X}}\) where \({\mathcal {A}}\!:{\mathbb {X}}\rightarrow {\mathbb {Z}}\) is a linear mapping, and \(\vartheta \!:{\mathbb {Z}}\rightarrow ]-\infty ,\infty ]\) is a proper closed strongly convex function with modulus \(\mu \). Here, \({\mathbb {X}}\) and \({\mathbb {Z}}\) are two finite dimensional vector spaces equipped with the inner product \(\langle \cdot ,\cdot \rangle \) and its induced norm \(\Vert \cdot \Vert \). If \(\mathrm{ri}(\mathrm{dom}\,\vartheta )\cap \mathrm{range}\,{\mathcal {A}}\ne \emptyset \), then there exists a constant \({\overline{c}}>0\) such that
\( \mathrm{dist}(0,\partial g(x))\ge \frac{\sqrt{2\mu }}{{\overline{c}}}\sqrt{g(x)-g^*}\quad \mathrm{for\ all}\ x\in {\mathbb {X}}, \) (22)
where \(g^*\) denotes the minimum value of the function g.
Proof
Pick an arbitrary \(x^*\in \mathrm{crit}\,g\) (if \(\mathrm{crit}\,g=\emptyset \), the conclusion holds automatically). We first prove that \(\mathrm{crit}\,g =\{x\in {\mathbb {X}}\!:\,{\mathcal {A}}x={\mathcal {A}}x^*\}\). To this end, pick any \(x'\in {\mathbb {X}}\) with \({\mathcal {A}}x'={\mathcal {A}}x^*\). Since \(\mathrm{ri}(\mathrm{dom}\,\vartheta )\cap \mathrm{range}\,{\mathcal {A}}\ne \emptyset \), by [30, Theorem 23.9], we have \( \partial g(x')={\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x') = {\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x^*) = \partial g(x^*). \) Since \(0\in \partial g(x^*)\), we obtain \(0\in \partial g(x')\), which implies that \(x'\in \mathrm{crit}\,g\). This means that \(\{x\in {\mathbb {X}}\!:\,{\mathcal {A}}x={\mathcal {A}}x^*\}\subseteq \mathrm{crit}\,g\). Suppose that there exists \({\overline{x}}\in \mathrm{crit}\,g\) such that \({\mathcal {A}}{\overline{x}}\ne {\mathcal {A}}x^*\). Then, since the critical points of the convex function g are exactly its global minimizers, by the strong convexity of \(\vartheta \), we have
\( g\big (\tfrac{1}{2}({\overline{x}}+x^*)\big )\le \tfrac{1}{2}g({\overline{x}})+\tfrac{1}{2}g(x^*)-\tfrac{\mu }{8}\Vert {\mathcal {A}}{\overline{x}}-{\mathcal {A}}x^*\Vert ^2<g^*. \)
This contradicts the fact that \({\overline{x}}, x^* \in \mathrm{crit}\,g\). Thus, the equality \(\mathrm{crit}\,g= \{x \in {\mathbb {X}}\!:\,{\mathcal {A}}x = {\mathcal {A}}x^*\}\) holds. By Hoffman's error bound [35], there exists a constant \({\overline{c}}>0\) such that for any \(z \in {\mathbb {X}}\),
\( \Vert z-\varPi _{\mathrm{crit}\,g}(z)\Vert \le {\overline{c}}\,\Vert {\mathcal {A}}z-{\mathcal {A}}x^*\Vert , \) (23)
where \(\varPi _{\mathrm{crit}\,g}\) is the projection mapping onto \(\mathrm{crit}\,g\). Fix an arbitrary \(x\in {\mathbb {X}}\). If \(x\notin \mathrm{dom}\,\partial g\), the inequality (22) holds trivially. So, it suffices to consider the case \(x\in \mathrm{dom}\,\partial g\). By [30, Theorem 23.9], \(\partial g(x)={\mathcal {A}}^* \partial \vartheta ({\mathcal {A}}x)\). Obviously, \(\partial \vartheta ({\mathcal {A}}x) \ne \emptyset \). Pick any \(\xi \in \partial \vartheta ({\mathcal {A}}x)\). By the strong convexity of \(\vartheta \) and [36, Theorem 6.1.2], it follows that
\( \vartheta ({\mathcal {A}}z)\ge \vartheta ({\mathcal {A}}x)+\langle \xi ,{\mathcal {A}}z-{\mathcal {A}}x\rangle +\frac{\mu }{2}\Vert {\mathcal {A}}z-{\mathcal {A}}x\Vert ^2\quad \mathrm{for\ all}\ z\in {\mathbb {X}}. \)
By taking \(z = \varPi _{\mathrm{crit}\,g}(x)\), from the last inequality we obtain that
\( g(\varPi _{\mathrm{crit}\,g}(x))\ge g(x)+\langle {\mathcal {A}}^*\xi ,\varPi _{\mathrm{crit}\,g}(x)-x\rangle +\frac{\mu }{2}\Vert {\mathcal {A}}(\varPi _{\mathrm{crit}\,g}(x)-x)\Vert ^2 \ge g(x)-\Vert {\mathcal {A}}^*\xi \Vert \,\Vert \varPi _{\mathrm{crit}\,g}(x)-x\Vert +\frac{\mu }{2{\overline{c}}^2}\Vert \varPi _{\mathrm{crit}\,g}(x)-x\Vert ^2 \ge g(x)-\frac{{\overline{c}}^2\Vert {\mathcal {A}}^*\xi \Vert ^2}{2\mu }, \)
where the second inequality follows from (23). Note that \(g(\varPi _{\mathrm{crit}\,g}(x))=g(x^*)\). The last inequality implies that \(\Vert {\mathcal {A}}^*\xi \Vert ^2\ge (2\mu /{\overline{c}}^2)[g(x)-g(x^*)]\). Together with \(\partial g(x)={\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x)\),
\( \mathrm{dist}(0,\partial g(x))=\inf _{\xi \in \partial \vartheta ({\mathcal {A}}x)}\Vert {\mathcal {A}}^*\xi \Vert \ge \frac{\sqrt{2\mu }}{{\overline{c}}}\sqrt{g(x)-g^*}. \)
This implies that the desired inequality (22) holds. \(\square \)
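Lemma C.1 can be illustrated numerically with the simplest strongly convex \(\vartheta \): a least-squares composite. The sketch below (not from the paper; NumPy, \(\vartheta (z)=\frac{1}{2}\Vert z-b\Vert ^2\) with \(\mu =1\), a random wide matrix for \({\mathcal {A}}\), and \({\overline{c}}=1/\sigma _{\min }^{+}({\mathcal {A}})\) are illustrative choices) checks the error bound (22) at random points, with \({\mathcal {A}}\) having a nontrivial null space so that \(\mathrm{crit}\,g\) is a whole affine subspace:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # wide matrix: nontrivial null space, crit g is an affine subspace
b = rng.standard_normal(3)

def g(x):                          # g(x) = theta(Ax) with theta(z) = 0.5*||z - b||^2, mu = 1
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # one minimizer of g
g_star = g(x_star)

sigma_min = np.linalg.svd(A, compute_uv=False).min()  # smallest positive singular value
c = np.sqrt(2.0) * sigma_min       # plays the role of sqrt(2*mu)/c_bar with c_bar = 1/sigma_min

for _ in range(20):
    x = rng.standard_normal(5)
    grad = A.T @ (A @ x - b)       # here dist(0, dg(x)) = ||grad|| = ||A^T theta'(Ax)||
    # the error bound (22): ||grad|| >= c * sqrt(g(x) - g*)
    assert np.linalg.norm(grad) >= c * np.sqrt(g(x) - g_star) - 1e-10
```

The point of the lemma is visible here: \({\mathcal {A}}\) is far from injective, yet the gradient norm still dominates \(\sqrt{g(x)-g^*}\) globally, which is the exponent-1/2 behavior used in the main text.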
The proof of Proposition 3.2: First, assume that \(\psi \) is a proper closed piecewise linear–quadratic convex function. From Lemmas 3.2 and 3.3, we observe that the multifunction \(\partial h\) is piecewise polyhedral, i.e., its graph is the union of finitely many polyhedral sets. So, \(\partial h\) is locally upper Lipschitzian at each point \(x\in {\mathbb {R}}^p\) by [33, Proposition 1], which implies that \(\partial h\) is metrically subregular at each point of its graph. In addition, Sun [37] showed that a proper closed convex function \(\psi \) is piecewise linear–quadratic if and only if \(\partial \psi \) is piecewise polyhedral. Thus, by combining [33, Proposition 1] and [38, Section 3.2], we obtain the conclusion.
Now assume \(\psi =\delta _C\). Fix an arbitrary \({\overline{x}}\in C\). Write \(J=\mathrm{supp}({\overline{x}})\) and \({\overline{J}}=\{1,\ldots ,p\}\backslash J\). Define the subspace \(L\!:=\{x\in {\mathbb {R}}^p:\ x_i=0\ \mathrm{for}\ i\in {\overline{J}}\}\). By Lemma 3.2, \(\partial h({\overline{x}})={\mathcal {N}}_{L}({\overline{x}})\). Take an arbitrary \(v\in \widehat{\partial }(\delta _C+h)({\overline{x}})\). From Definition 3.1, it follows that
\( \liminf _{x\rightarrow {\overline{x}},\,x\ne {\overline{x}}} \frac{\delta _C(x)+h(x)-\delta _C({\overline{x}})-h({\overline{x}})-\langle v,x-{\overline{x}}\rangle }{\Vert x-{\overline{x}}\Vert }\ge 0, \)
which implies that \(v\in \widehat{\partial }\delta _{C\cap L}({\overline{x}})\). Consequently, \(\widehat{\partial }(\delta _C+h)({\overline{x}}) \subseteq \widehat{\partial }\delta _{C\cap L}({\overline{x}})\). Together with [27, Corollary 10.9], Lemma 3.2, \(\partial h({\overline{x}})={\mathcal {N}}_{L}({\overline{x}})\) and the convexity of C, we have
By the arbitrariness of \({\overline{x}}\), this implies that \(\partial \delta _C(x)+\partial h(x)=\widehat{\partial }(\delta _C+h)(x)\) for any \(x\in C\). Next we argue that \(\partial (\delta _C+h)({\overline{x}})\subseteq \partial \delta _C({\overline{x}})+\partial h({\overline{x}})\). Take an arbitrary \(v\in \partial (\delta _C+h)({\overline{x}})\). There exist \(x^k\xrightarrow [\delta _{C}+h]{}{\overline{x}}\) and \(v^k\in \widehat{\partial }(\delta _{C}\!+\!h)(x^k)\) with \(v^k\rightarrow v\) as \(k\rightarrow \infty \). From the previous arguments, \(v^k\in \partial \delta _{C}(x^k)+\partial h(x^k)\) for each k. Since \(x^k\rightarrow {\overline{x}}\), we have \(x^k\ne 0\) and \(\mathrm{supp}(x^k)\supseteq J\) for all sufficiently large k. Since \(\delta _{C}(x^k)+h(x^k)\rightarrow \delta _{C}({\overline{x}})+h({\overline{x}})\), we must have \(x^k\in C\) and \(h(x^k)\rightarrow h({\overline{x}})\) for all sufficiently large k. The latter, along with \(\mathrm{supp}(x^k)\supseteq J\), implies that \(\mathrm{supp}(x^k)=J\) for all sufficiently large k. So, \(\partial h(x^k)=\partial \delta _{L}(x^k)\) for large enough k. Combining this with (24) and \(v^k\in \partial \delta _{C}(x^k)+\partial h(x^k)\), we have \(v^k\in \partial \delta _{C\cap L}(x^k)\). Then, \(v\in \partial (\delta _C+\delta _{L})({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial \delta _{L}({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial h({\overline{x}})\). The stated inclusion holds. The previous arguments imply that \( \widehat{\partial }(\delta _C+h)({\overline{x}}) =\partial (\delta _C+h)({\overline{x}}) ={\mathcal {N}}_{C}({\overline{x}})+\partial h({\overline{x}}) =\partial \delta _{C\cap L}({\overline{x}}). \) Suppose that \(\partial \delta _{C\cap L}({\overline{x}})\ne \emptyset \) (if not, the last equation implies the result). Then, we have
where the second equality is by [27, Exercise 8.14] and the third one is due to [27, Proposition 8.12]. Thus, the first part of the desired results follows. Using the same arguments as above, we can obtain the second part. The proof is completed. \(\square \)
Wu, Y., Pan, S. & Bi, S. Kurdyka–Łojasiewicz Property of Zero-Norm Composite Functions. J Optim Theory Appl 188, 94–112 (2021). https://doi.org/10.1007/s10957-020-01779-7