Abstract
This paper focuses on a class of zero-norm composite optimization problems. For this class of nonconvex and nonsmooth problems, we establish the Kurdyka–Łojasiewicz property of exponent 1/2 for the objective function under a suitable assumption, and we provide examples illustrating that this assumption is not very restrictive; in particular, it covers the zero-norm regularized or constrained piecewise linear–quadratic function, the zero-norm regularized or constrained logistic regression function, and the zero-norm regularized or constrained quadratic function over a sphere.
Change history
06 May 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10957-021-01855-6
References
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Pan, S.H., Liu, Y.L.: Metric subregularity of subdifferential and KL property of exponent 1/2. arXiv:1812.00558v3 (2019)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)
Wang, X.F., Ye, J.J., Yuan, X.M., Zeng, S.Z., Zhang, J.: Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis. arXiv:1810.10051 (2018)
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferential. J. Convex Anal. 15, 365–380 (2008)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 1, 43–54 (1992)
Wen, B., Chen, X.J., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)
Zhou, Z.R., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165, 689–728 (2017)
Cui, Y., Sun, D.F., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1300-6
D’Acunto, D., Kurdyka, K.: Explicit bounds for the Łojasiewicz exponent in the gradient inequality for polynomials. Ann. Polon. Math. 87, 51–61 (2005)
Li, G.Y., Mordukhovich, B.S., Phạm, T.S.: New fractional error bounds for polynomial systems with application to Hölderian stability in optimization and spectral theory of tensors. Math. Program. 153(2), 333–362 (2015). (Ser. A)
Li, G.Y., Mordukhovich, B.S., Nghia, T.T.A., Phạm, T.S.: Error bounds for parametric polynomial systems with applications to higher-order stability analysis and convergence rates. Math. Program. 168(1–2), 313–346 (2018)
Li, G.Y., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
Yu, P.R., Li, G.Y., Pong, T.K.: Deducing Kurdyka–Łojasiewicz exponent via inf-projection. arXiv:1902.03635 (2019)
Liu, H.K., So, A.M.-C., Wu, W.J.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1285-1
Zhang, Q., Chen, C.H., Liu, H.K., So, A.M.-C., Zhou, Z.R.: On the linear convergence of the ADMM for regularized non-convex low-rank matrix recovery. https://www1.se.cuhk.edu.hk/~manchoso/admm_MF.pdf
Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)
Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)
Yuan, X.T., Zhang, T.: Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14, 899–925 (2013)
Asteris, M., Papailiopoulos, D., Dimakis, A.: Nonnegative sparse PCA with provable guarantees. In: International Conference on Machine Learning (2014)
Brodie, J., Daubechies, I., De Mol, C., Giannone, D., Loris, I.: Sparse and stable Markowitz portfolios. Proc. Natl. Acad. Sci. 106, 12267–12272 (2009)
Zhang, J.Y., Liu, H.Y., Wen, Z.W., Zhang, S.Z.: A sparse completely positive relaxation of the modularity maximization for community detection. SIAM J. Sci. Comput. 40, A3091–A3120 (2017)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)
Mordukhovich, B.S.: Variational Analysis and Applications. Springer, Cham (2018)
Le, Y.H.: Generalized subdifferentials of the rank function. Optim. Lett. 7, 731–743 (2013)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.F.: Restricted normal cones and sparsity optimization with affine constraints. Found. Comput. Math. 14, 63–83 (2014)
Feng, X., Wu, C.L.: Every critical point of an \(l_0\) regularized minimization model is a local minimizer. arXiv:1912.04498 (2019)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–214 (1981)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer International Publishing (2016)
Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49, 263–265 (1952)
Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, New York (1991)
Sun, J.: On Monotropic Piecewise Quadratic Programming. Ph.D. Thesis, Department of Mathematics, University of Washington, Seattle (1986)
Ioffe, A.D., Outrata, J.V.: On metric and calmness qualification conditions in subdifferential calculus. Set-valued Anal. 16, 199–227 (2008)
Acknowledgements
The authors would like to express their sincere thanks to the two anonymous reviewers for their helpful comments, which greatly improved the original manuscript. The research of S. H. Pan and S. J. Bi is supported by the National Natural Science Foundation of China under Projects No. 11971177 and No. 11701186, and by the Guangdong Basic and Applied Basic Research Foundation (2020A1515010408).
Communicated by Boris S. Mordukhovich.
Appendices
1.1 A: KL Property Relative to a Manifold
Let \({\mathcal {M}}\subset {\mathbb {R}}^p\) be a \({\mathcal {C}}^2\)-smooth manifold and \(f\!:{\mathcal {M}}\rightarrow {\mathbb {R}}\) be a \({\mathcal {C}}^2\)-smooth function. The set of critical points of the problem \( \min _{x\in {\mathcal {M}}}f(x) \) is \({\mathcal {X}}:=\big \{x\in {\mathcal {M}}:\ \nabla _{\!{\mathcal {M}}}f(x)=0\big \}\), where \(\nabla _{\!{\mathcal {M}}}f(z)\) is the projection of \(\nabla \!f(z)\) onto the tangent space \({\mathcal {T}}_{{\mathcal {M}}}(z)\) of \({\mathcal {M}}\) at z. We say that f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\) if f has the KL property of exponent 1/2 at each \({\overline{x}}\in {\mathcal {X}}\), i.e., there exist \(\delta >0\) and \(\gamma >0\) such that
\( \Vert \nabla _{\!{\mathcal {M}}}f(z)\Vert \ge \gamma \sqrt{|f(z)-f({\overline{x}})|}\quad \mathrm{for\ all}\ z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}. \) (13)
This part states the relation between the KL property of exponent 1/2 of f relative to \({\mathcal {M}}\) and the KL property of exponent 1/2 of its extension \({\widetilde{f}}(x):=f(x)+\delta _{{\mathcal {M}}}(x)\) for \(x\in {\mathbb {R}}^p\).
Lemma A.1
Let \({\mathcal {M}}\!\subset \!{\mathbb {R}}^p\) be a \({\mathcal {C}}^2\)-smooth manifold and \(f\!:{\mathcal {M}}\rightarrow {\mathbb {R}}\) be a \({\mathcal {C}}^2\)-smooth function. If f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\), then \({\widetilde{f}}\) is a KL function of exponent 1/2. Conversely, if \({\widetilde{f}}\) is a KL function of exponent 1/2 and each critical point is a local minimizer, then f is a KL function of exponent 1/2 relative to \({\mathcal {M}}\).
Proof
Notice that \(\partial \!{\widetilde{f}}(x)=\nabla \!f(x) +{\mathcal {N}}_{{\mathcal {M}}}(x)\) for any \(x\in {\mathcal {M}}\), where \({\mathcal {N}}_{{\mathcal {M}}}(x)\) denotes the normal space to \({\mathcal {M}}\) at x. Clearly, \({\mathcal {X}}=\mathrm{crit}{\widetilde{f}}\). Fix an arbitrary \({\overline{x}}\in {\mathcal {X}}\). Since f has the KL property of exponent 1/2 relative to \({\mathcal {M}}\) at \({\overline{x}}\), there exist \(\delta >0\) and \(\gamma >0\) such that (13) holds for all \(z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}\). Fix an arbitrary \(\eta >0\) and an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]\). Clearly, \(x\in {\mathcal {M}}\). Moreover,
\( \mathrm{dist}(0,\partial \!{\widetilde{f}}(x)) =\mathrm{dist}(-\nabla \!f(x),{\mathcal {N}}_{{\mathcal {M}}}(x)) =\Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert . \) (14)
Along with (13), \(\mathrm{dist}(0,\partial \!{\widetilde{f}}(x))\ge \gamma \sqrt{f(x)-f({\overline{x}})}\). So, the first part follows.
Next we focus on the second part. Fix an arbitrary \({\overline{x}}\in {\mathcal {X}}\). By the given assumption, clearly, \({\overline{x}}\) is a local optimal solution of \( \min _{x\in {\mathcal {M}}}f(x). \) Hence, there exists \(\varepsilon '>0\) such that
\( f(z)\ge f({\overline{x}})\quad \mathrm{for\ all}\ z\in {\mathbb {B}}({\overline{x}},\varepsilon ')\cap {\mathcal {M}}. \)
By the KL property of exponent 1/2 of \({\widetilde{f}}\) at \({\overline{x}}\), there exist \(\varepsilon ,c>0\) and \(\eta >0\) such that
\( \mathrm{dist}(0,\partial \!{\widetilde{f}}(x))\ge c\sqrt{{\widetilde{f}}(x)-{\widetilde{f}}({\overline{x}})}\quad \mathrm{for\ all}\ x\in {\mathbb {B}}({\overline{x}},\varepsilon ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}}<{\widetilde{f}}({\overline{x}})+\eta ]. \) (15)
Since f is \({\mathcal {C}}^2\)-smooth around \({\overline{x}}\), there exists \(\varepsilon ''>0\) such that for all \(z\in {\mathbb {B}}({\overline{x}},\varepsilon '')\cap {\mathcal {M}}\), \( f(z)<f({\overline{x}})+\eta . \) Take \(\delta =\min (\varepsilon ,\varepsilon ',\varepsilon '')\). Fix an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {M}}\). Clearly, \( f({\overline{x}})\le f(x)\le f({\overline{x}})+\eta . \) If \(f(x)>f({\overline{x}})\), then \(x\in {\mathbb {B}}({\overline{x}},\varepsilon ) \cap [{\widetilde{f}}({\overline{x}})<{\widetilde{f}} <{\widetilde{f}}({\overline{x}})+\eta ]\), and from (15) and (14), \( \Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert \ge c\sqrt{|f(x)-f({\overline{x}})|}. \) If \(f(x)=f({\overline{x}})\), this inequality holds automatically. \(\square \)
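As a small numerical sanity check (not part of the paper's argument), consider the unit sphere \({\mathcal {M}}={\mathcal {S}}^{m-1}\) and \(f(x)=x^{{\mathbb {T}}}Hx\). The normal space at x is \({\mathbb {R}}x\), so the identity behind (14), \(\mathrm{dist}(0,\nabla \!f(x)+{\mathcal {N}}_{{\mathcal {M}}}(x))=\Vert \nabla _{\!{\mathcal {M}}}f(x)\Vert \), can be verified directly. A minimal sketch assuming NumPy; the matrix, seed, and dimension are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
B = rng.standard_normal((m, m))
H = (B + B.T) / 2                    # arbitrary symmetric matrix
x = rng.standard_normal(m)
x /= np.linalg.norm(x)               # a point on the unit sphere

grad = 2 * H @ x                     # Euclidean gradient of f(x) = x^T H x
riem_grad = grad - (x @ grad) * x    # projection onto the tangent space {v : <v, x> = 0}

# dist(0, grad + R*x): minimize ||grad + t*x|| over t; the minimizer is t = -<x, grad>
t_star = -(x @ grad)
dist = np.linalg.norm(grad + t_star * x)

# the distance to the shifted normal space equals the Riemannian gradient norm
assert np.isclose(dist, np.linalg.norm(riem_grad))
```

The projection step is exactly why (14) holds with equality on a smooth manifold: removing the normal component of the Euclidean gradient is the nearest-point map onto \(\nabla \!f(x)+{\mathcal {N}}_{{\mathcal {M}}}(x)\).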
1.2 B: KL Property of the Quadratic Function over a Sphere
For any integer \(m\ge 1\) and any given \(m\times m\) real symmetric matrix H, define \(g(z):=z^{{\mathbb {T}}}Hz+\delta _{{\mathcal {S}}}(z)\) for \(z\in {\mathbb {R}}^m\). Lemma A.1 and [19, Theorem 1] imply that g is a KL function of exponent 1/2. This part gives a different proof, which requires the following lemmas.
Lemma B.1
The critical point set of g takes the form of \(\mathrm{crit}g=\big \{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\big \}.\) So, by letting H have the eigenvalue decomposition \(P\varLambda P^{{\mathbb {T}}}\) with \(\varLambda =\mathrm{diag}(\lambda _1,\ldots ,\lambda _m)\) for \(\lambda _1\ge \cdots \ge \lambda _m\) and \(P\in {\mathbb {O}}^m\), \(\mathrm{crit}g =PW\) with \( W=\big \{y\in {\mathcal {S}}:\ \varLambda y=\langle y,\varLambda y\rangle y\big \}.\)
Proof
By [27, Exercise 8.8] and Lemma 3.1, it immediately follows that for any \(z\in {\mathbb {R}}^m\),
\( \partial g(z)=2Hz+{\mathcal {N}}_{{\mathcal {S}}}(z)\quad \mathrm{with}\ {\mathcal {N}}_{{\mathcal {S}}}(z)=\{tz:\ t\in {\mathbb {R}}\}\ \mathrm{for}\ z\in {\mathcal {S}}. \) (16)
Choose an arbitrary \({\overline{z}}\in \mathrm{crit}g\). From (16), there exists \({\overline{t}}\in {\mathbb {R}}\) such that \(0=2H{\overline{z}}+{\overline{t}}{\overline{z}}\). Along with \(\Vert {\overline{z}}\Vert =1\), we have \({\overline{t}}=-2\langle {\overline{z}},H{\overline{z}}\rangle \), and hence \({\overline{z}}\in \!\big \{z\in {\mathcal {S}}:\ Hz=\!\langle z,Hz\rangle z\big \}\). Consequently, \( \mathrm{crit}g\subseteq \big \{z\in {\mathcal {S}}:\ Hz=\langle z,Hz\rangle z\big \}. \) The converse inclusion is immediate to check by Lemma 3.1. Thus, the first part follows. The second part is immediate. \(\square \)
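Lemma B.1 says the critical points of g are exactly the unit eigenvectors of H. This is easy to check numerically; a minimal sketch assuming NumPy, with an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
B = rng.standard_normal((m, m))
H = (B + B.T) / 2                    # arbitrary symmetric matrix

# Lemma B.1: crit g = { z in S : Hz = <z,Hz> z }, i.e. the unit eigenvectors of H
eigvals, P = np.linalg.eigh(H)       # eigenvalue decomposition H = P diag(eigvals) P^T
for i in range(m):
    v = P[:, i]                      # unit eigenvector
    assert np.allclose(H @ v, (v @ H @ v) * v)   # Hv = <v, Hv> v, so v is critical

# a generic unit vector that is not an eigenvector is not critical
z = np.ones(m) / np.sqrt(m)
residual = H @ z - (z @ H @ z) * z
assert np.linalg.norm(residual) > 1e-8
```

Here \(\langle v,Hv\rangle \) recovers the eigenvalue, matching the multiplier \({\overline{t}}=-2\langle {\overline{z}},H{\overline{z}}\rangle \) in the proof.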
Lemma B.2
Let \(D=\mathrm{diag}(d_1,d_2,\ldots ,d_p)\) with \(d_1\ge d_2\ge \cdots \ge d_p\). Define the function \(\psi (x):=x^{{\mathbb {T}}}Dx+\delta _{{\mathcal {S}}}(x)\) for \(x\in {\mathbb {R}}^p\). Then, \(\psi \) is a KL function of exponent 1/2.
Proof
By Lemma B.1, it is immediate to obtain the following characterization of \(\mathrm{crit}\,\psi \):
\( \mathrm{crit}\,\psi =\big \{x\in {\mathcal {S}}:\ Dx=\langle x,Dx\rangle x\big \}. \) (17)
Clearly, for each \(x\in \mathrm{crit}\,\psi \), \(d_i=\langle x,Dx\rangle \) for every \(i\in \mathrm{supp}(x)\). For any \(z\in \mathrm{dom}\,\partial \psi \), we have
\( \mathrm{dist}(0,\partial \psi (z))=2\Vert Dz-\langle z,Dz\rangle z\Vert . \) (18)
Now fix an arbitrary \({\overline{x}}\in \mathrm{crit}\,\psi \). From (17), it immediately follows that \( -D{\overline{x}}+\langle {\overline{x}},D{\overline{x}}\rangle {\overline{x}}=0. \) We next proceed by considering the following two cases.
Case 1: \(d_1=\cdots =d_p=\gamma \) for some \(\gamma \in {\mathbb {R}}\). Choose an arbitrary \(\eta >0\) and an arbitrary \(\delta >0\). Fix an arbitrary \(x \in {\mathbb {B}}({\overline{x}},\delta )\cap [\psi ({\overline{x}})<\psi (x) < \psi ({\overline{x}})+\eta ]\). Clearly, \(x\in {\mathcal {S}}\) and \(\langle x,Dx\rangle =\gamma \). Combining \(\langle {\overline{x}},D{\overline{x}}\rangle {\overline{x}}=D{\overline{x}}\) and Eq. (18) yields that
\( \mathrm{dist}(0,\partial \psi (x))=2\Vert Dx-\langle x,Dx\rangle x\Vert =2\Vert \gamma x-\gamma x\Vert =0. \)
In addition, \(\psi (x)=\psi ({\overline{x}})=\gamma \). This means that \( \text{ dist }(0, \partial \psi (x))=\sqrt{\psi (x)-\psi ({\overline{x}})}. \)
Case 2: there exist \(i\ne j\in \{1,2,\ldots ,p\}\) such that \(d_i\ne d_j\). Write \(J=\mathrm{supp}({\overline{x}})\) and \({\overline{J}}=\{1,\ldots ,p\}\backslash J\). By (17), we know that \(d_i=\langle {\overline{x}},D{\overline{x}}\rangle \) for all \(i\in J\). This means that there must exist an index \(\kappa \in {\overline{J}}\) such that \(d_{\kappa }\ne \langle {\overline{x}},D{\overline{x}}\rangle \). Write \( {\overline{J}}_1:=\big \{i\in {\overline{J}}:\ d_i\ne \langle {\overline{x}},D{\overline{x}}\rangle \big \}. \) By the continuity of the function \(\langle \cdot ,D\cdot \rangle \), there exists \(\delta >0\) such that for all \(z\in {\mathbb {B}}({\overline{x}},\delta )\cap {\mathcal {S}}\),
\( \min _{i\in {\overline{J}}_1}|d_i-\langle z,Dz\rangle | \ge \frac{1}{2}\min _{i\in {\overline{J}}_1}|d_i-\langle {\overline{x}},D{\overline{x}}\rangle |>0. \) (19)
Choose an arbitrary \(\eta >0\). Fix an arbitrary \(x\in {\mathbb {B}}({\overline{x}},\delta ) \cap [\psi ({\overline{x}})<\psi (x)<\psi ({\overline{x}})+\eta ]\). Clearly, \(x \in {\mathcal {S}}\). From Eq. (18), it follows that
where the third equality is due to (17), the first inequality is by the definition of \({\overline{J}}_1\), and the last inequality is due to (19). On the other hand, by the definition of \(\psi \),
where the fourth equality is due to (17), the fifth one is by the definition of \({\overline{J}}_1\), and the inequality is since \(\psi (x)-\psi ({\overline{x}})>0\). From the above inequalities (20) and (21), there exists a constant \(c>0\), independent of x, such that \( \mathrm{dist}(0,\partial \psi (x))\ge c\sqrt{\psi (x)-\psi ({\overline{x}})}. \)
By the arbitrariness of x, Cases 1 and 2 show that \(\psi \) has the KL property of exponent 1/2 at \({\overline{x}}\). From the arbitrariness of \({\overline{x}}\) in \(\mathrm{crit}\,\psi \), \(\psi \) is a KL function of exponent 1/2. \(\square \)
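The exponent-1/2 inequality of Lemma B.2 can be observed numerically in a small instance. The following sketch (not from the paper; NumPy, the diagonal \(D=\mathrm{diag}(3,1)\), the critical point \({\overline{x}}=(0,1)\), and the constant 2 are illustrative choices) checks \(\mathrm{dist}(0,\partial \psi (x))\ge c\sqrt{\psi (x)-\psi ({\overline{x}})}\) at points on the circle approaching \({\overline{x}}\):

```python
import numpy as np

D = np.diag([3.0, 1.0])
xbar = np.array([0.0, 1.0])          # critical point: D xbar = <xbar, D xbar> xbar

def psi(x):                           # psi on the unit circle (the indicator term is 0 there)
    return x @ D @ x

def dist_subdiff(x):                  # dist(0, d psi(x)) = 2 || Dx - <x,Dx> x ||, cf. (18)
    return 2 * np.linalg.norm(D @ x - (x @ D @ x) * x)

# points x(t) = (sin t, cos t) near xbar with psi(x) > psi(xbar)
for t in [0.3, 0.1, 0.03, 0.01]:
    x = np.array([np.sin(t), np.cos(t)])
    gap = psi(x) - psi(xbar)          # equals 2 sin^2 t > 0
    assert gap > 0
    # KL inequality of exponent 1/2 with c = 2 (the exact ratio here is 2*sqrt(2)*|cos t|)
    assert dist_subdiff(x) >= 2 * np.sqrt(gap)
```

On this instance \(\mathrm{dist}(0,\partial \psi (x))=4|\sin t\cos t|\) while \(\sqrt{\psi (x)-\psi ({\overline{x}})}=\sqrt{2}\,|\sin t|\), so the ratio stays bounded below near \({\overline{x}}\), which is exactly the exponent-1/2 behavior.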
Now we prove that g is a KL function of exponent 1/2. Fix an arbitrary \({\overline{z}}\in \mathrm{crit}\,g\). Let H have the eigenvalue decomposition as in Lemma B.1. Then, \({\overline{y}}=P^{{\mathbb {T}}}{\overline{z}}\in \mathrm{crit}\,\psi \), where \(\psi \) is defined in Lemma B.2 with \(D=\varLambda \). By Lemma B.2, there exist \(\eta>0,\delta >0\) and \(c>0\) such that
\( \mathrm{dist}(0,\partial \psi (y))\ge c\sqrt{\psi (y)-\psi ({\overline{y}})}\quad \mathrm{for\ all}\ y\in {\mathbb {B}}({\overline{y}},\delta )\cap [\psi ({\overline{y}})<\psi <\psi ({\overline{y}})+\eta ]. \)
Fix an arbitrary \(z\in {\mathbb {B}}({\overline{z}},\delta )\cap [g({\overline{z}})<g<g({\overline{z}})+\eta ]\). Clearly, \(z\in {\mathcal {S}}\). Write \(y=P^{{\mathbb {T}}}z\). Then, \(y\in {\mathcal {S}}\) and \(g(z)=\psi (y)\). Since \(g({\overline{z}})=\psi ({\overline{y}})\), \( y\in {\mathbb {B}}({\overline{y}},\delta )\cap [\psi ({\overline{y}})<\psi (y) < \psi ({\overline{y}})+\eta ]. \) In addition, from (16) and the eigenvalue decomposition of H, \(\partial g(z)=P\partial \psi (y)\). Thus,
\( \mathrm{dist}(0,\partial g(z))=\mathrm{dist}(0,P\partial \psi (y))=\mathrm{dist}(0,\partial \psi (y))\ge c\sqrt{\psi (y)-\psi ({\overline{y}})}. \)
Together with \(\psi (y)-\psi ({\overline{y}})=g(z)-g({\overline{z}})\), it follows that g has the KL property of exponent 1/2 at \({\overline{z}}\). By the arbitrariness of \({\overline{z}}\) in \(\mathrm{crit}\,g\), g is a KL function of exponent 1/2.
1.3 C: Supplementary Lemma and Proofs
The following lemma extends the result of [34, Section 2.3] for differentiable strongly convex functions to the setting of closed proper strongly convex functions. In particular, it implies that the composite function g is a KL function of exponent 1/2 without requiring the surjectivity of \({\mathcal {A}}\).
Lemma C.1
Consider \(g(x)\!:=\vartheta ({\mathcal {A}}x)\) for \(x\in {\mathbb {X}}\) where \({\mathcal {A}}\!:{\mathbb {X}}\rightarrow {\mathbb {Z}}\) is a linear mapping, and \(\vartheta \!:{\mathbb {Z}}\rightarrow ]-\infty ,\infty ]\) is a proper closed strongly convex function with modulus \(\mu \). Here, \({\mathbb {X}}\) and \({\mathbb {Z}}\) are two finite dimensional vector spaces equipped with the inner product \(\langle \cdot ,\cdot \rangle \) and its induced norm \(\Vert \cdot \Vert \). If \(\mathrm{ri}(\mathrm{dom}\,\vartheta )\cap \mathrm{range}\,{\mathcal {A}}\ne \emptyset \), then there exists a constant \({\overline{c}}>0\) such that
\( \mathrm{dist}(0,\partial g(x))\ge \frac{\sqrt{2\mu }}{{\overline{c}}}\sqrt{g(x)-g^*}\quad \mathrm{for\ all}\ x\in {\mathbb {X}}, \) (22)
where \(g^*\) denotes the minimum value of the function g.
Proof
Pick an arbitrary \(x^*\in \mathrm{crit}\,g\) (if \(\mathrm{crit}\,g=\emptyset \), the conclusion holds automatically). We first prove that \(\mathrm{crit}\,g =\{x\in {\mathbb {X}}\!:\,{\mathcal {A}}x={\mathcal {A}}x^*\}\). To this end, pick any \(x'\in {\mathbb {X}}\) with \({\mathcal {A}}x'={\mathcal {A}}x^*\). Since \(\mathrm{ri}(\mathrm{dom}\,\vartheta )\cap \mathrm{range}\,{\mathcal {A}}\ne \emptyset \), by [30, Theorem 23.9], we have \( \partial g(x')={\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x') = {\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x^*) = \partial g(x^*). \) Since \(0\in \partial g(x^*)\), we obtain \(0\in \partial g(x')\), which implies that \(x'\in \mathrm{crit}\,g\). This means that \(\{x\in {\mathbb {X}}\!:\,{\mathcal {A}}x={\mathcal {A}}x^*\}\subseteq \mathrm{crit}\,g\). Suppose that there exists \({\overline{x}}\in \mathrm{crit}\,g\) such that \({\mathcal {A}}{\overline{x}}\ne {\mathcal {A}}x^*\). Then, since the critical points of the convex function g are exactly its global minimizers, by the strong convexity of \(\vartheta \), we have
\( g\big (\tfrac{1}{2}({\overline{x}}+x^*)\big )\le \tfrac{1}{2}g({\overline{x}})+\tfrac{1}{2}g(x^*)-\tfrac{\mu }{8}\Vert {\mathcal {A}}{\overline{x}}-{\mathcal {A}}x^*\Vert ^2<g^*. \)
This contradicts the fact that \({\overline{x}}, x^* \in \mathrm{crit}\,g\). Thus, the equality \(\mathrm{crit}\,g= \{x \in {\mathbb {X}}\!:\,{\mathcal {A}}x = {\mathcal {A}}x^*\}\) holds. By Hoffman's error bound [35], there exists a constant \({\overline{c}}>0\) such that for any \(z \in {\mathbb {X}}\),
\( \Vert z-\varPi _{\mathrm{crit}\,g}(z)\Vert \le {\overline{c}}\,\Vert {\mathcal {A}}z-{\mathcal {A}}x^*\Vert , \) (23)
where \(\varPi _{\mathrm{crit}\,g}\) is the projection mapping onto \(\mathrm{crit}\,g\). Fix an arbitrary \(x\in {\mathbb {X}}\). If \(x\notin \mathrm{dom}\,\partial g\), the inequality (22) holds trivially. So, it suffices to consider the case \(x\in \mathrm{dom}\,\partial g\). By [30, Theorem 23.9], \(\partial g(x)={\mathcal {A}}^* \partial \vartheta ({\mathcal {A}}x)\). Obviously, \(\partial \vartheta ({\mathcal {A}}x) \ne \emptyset \). Pick any \(\xi \in \partial \vartheta ({\mathcal {A}}x)\). By the strong convexity of \(\vartheta \) and [36, Theorem 6.1.2], it follows that
\( \vartheta ({\mathcal {A}}z)\ge \vartheta ({\mathcal {A}}x)+\langle \xi ,{\mathcal {A}}z-{\mathcal {A}}x\rangle +\frac{\mu }{2}\Vert {\mathcal {A}}z-{\mathcal {A}}x\Vert ^2\quad \mathrm{for\ all}\ z\in {\mathbb {X}}. \)
By taking \(z = \varPi _{\mathrm{crit}\,g}(x)\), from the last inequality we obtain that
\( g(\varPi _{\mathrm{crit}\,g}(x))\ge g(x)+\langle {\mathcal {A}}^*\xi ,\varPi _{\mathrm{crit}\,g}(x)-x\rangle +\frac{\mu }{2}\Vert {\mathcal {A}}(\varPi _{\mathrm{crit}\,g}(x)-x)\Vert ^2 \ge g(x)-\Vert {\mathcal {A}}^*\xi \Vert \,\Vert \varPi _{\mathrm{crit}\,g}(x)-x\Vert +\frac{\mu }{2{\overline{c}}^2}\Vert \varPi _{\mathrm{crit}\,g}(x)-x\Vert ^2 \ge g(x)-\frac{{\overline{c}}^2\Vert {\mathcal {A}}^*\xi \Vert ^2}{2\mu }, \)
where the second inequality follows from (23). Note that \(g(\varPi _{\mathrm{crit}\,g}(x))=g(x^*)\). The last inequality implies that \(\Vert {\mathcal {A}}^*\xi \Vert ^2\ge (2\mu /{\overline{c}}^2)[g(x)-g(x^*)]\). Together with \(\partial g(x)={\mathcal {A}}^*\partial \vartheta ({\mathcal {A}}x)\),
\( \mathrm{dist}(0,\partial g(x))=\inf _{\xi \in \partial \vartheta ({\mathcal {A}}x)}\Vert {\mathcal {A}}^*\xi \Vert \ge \frac{\sqrt{2\mu }}{{\overline{c}}}\sqrt{g(x)-g^*}. \)
This implies that the desired inequality (22) holds. \(\square \)
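Lemma C.1 can be illustrated numerically with the simplest strongly convex \(\vartheta \): a least-squares composite. The sketch below (not from the paper; NumPy, \(\vartheta (z)=\frac{1}{2}\Vert z-b\Vert ^2\) with \(\mu =1\), a random wide matrix for \({\mathcal {A}}\), and \({\overline{c}}=1/\sigma _{\min }^{+}({\mathcal {A}})\) are illustrative choices) checks the error bound (22) at random points, with \({\mathcal {A}}\) having a nontrivial null space so that \(\mathrm{crit}\,g\) is a whole affine subspace:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # wide matrix: nontrivial null space, crit g is an affine subspace
b = rng.standard_normal(3)

def g(x):                          # g(x) = theta(Ax) with theta(z) = 0.5*||z - b||^2, mu = 1
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # one minimizer of g
g_star = g(x_star)

sigma_min = np.linalg.svd(A, compute_uv=False).min()  # smallest positive singular value
c = np.sqrt(2.0) * sigma_min       # plays the role of sqrt(2*mu)/c_bar with c_bar = 1/sigma_min

for _ in range(20):
    x = rng.standard_normal(5)
    grad = A.T @ (A @ x - b)       # here dist(0, dg(x)) = ||grad|| = ||A^T theta'(Ax)||
    # the error bound (22): ||grad|| >= c * sqrt(g(x) - g*)
    assert np.linalg.norm(grad) >= c * np.sqrt(g(x) - g_star) - 1e-10
```

The point of the lemma is visible here: \({\mathcal {A}}\) is far from injective, yet the gradient norm still dominates \(\sqrt{g(x)-g^*}\) globally, which is the exponent-1/2 behavior used in the main text.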
The proof of Proposition 3.2: First, assume that \(\psi \) is a proper closed piecewise linear–quadratic convex function. From Lemmas 3.2 and 3.3, we observe that the multifunction \(\partial h\) is piecewise polyhedral, i.e., its graph is the union of finitely many polyhedral sets. So, \(\partial h\) is locally upper Lipschitzian at each point \(x\in {\mathbb {R}}^p\) by [33, Proposition 1], which implies that \(\partial h\) is metrically subregular at each point of its graph. In addition, Sun [37] showed that a proper closed convex function \(\psi \) is piecewise linear–quadratic if and only if \(\partial \psi \) is piecewise polyhedral. Thus, by combining [33, Proposition 1] and [38, Section 3.2], we obtain the conclusion.
Now assume \(\psi =\delta _C\). Fix an arbitrary \({\overline{x}}\in C\). Write \(J=\mathrm{supp}({\overline{x}})\) and \({\overline{J}}=\{1,\ldots ,p\}\backslash J\). Define the subspace \(L\!:=\{x\in {\mathbb {R}}^p:\ x_i=0\ \mathrm{for}\ i\in {\overline{J}}\}\). By Lemma 3.2, \(\partial h({\overline{x}})={\mathcal {N}}_{L}({\overline{x}})\). Take an arbitrary \(v\in \widehat{\partial }(\delta _C+h)({\overline{x}})\). From Definition 3.1, it follows that
\( \liminf _{x\rightarrow {\overline{x}},\,x\ne {\overline{x}}} \frac{\delta _C(x)+h(x)-\delta _C({\overline{x}})-h({\overline{x}})-\langle v,x-{\overline{x}}\rangle }{\Vert x-{\overline{x}}\Vert }\ge 0, \)
which implies that \(v\in \widehat{\partial }\delta _{C\cap L}({\overline{x}})\). Consequently, \(\widehat{\partial }(\delta _C+h)({\overline{x}}) \subseteq \widehat{\partial }\delta _{C\cap L}({\overline{x}})\). Together with [27, Corollary 10.9], Lemma 3.2, \(\partial h({\overline{x}})={\mathcal {N}}_{L}({\overline{x}})\) and the convexity of C, we have
By the arbitrariness of \({\overline{x}}\), this implies that \(\partial \delta _C(x)+\partial h(x)=\widehat{\partial }(\delta _C+h)(x)\) for any \(x\in C\). Next we argue that \(\partial (\delta _C+h)({\overline{x}})\subseteq \partial \delta _C({\overline{x}})+\partial h({\overline{x}})\). Take an arbitrary \(v\in \partial (\delta _C+h)({\overline{x}})\). There exist \(x^k\xrightarrow [\delta _{C}+h]{}{\overline{x}}\) and \(v^k\in \widehat{\partial }(\delta _{C}\!+\!h)(x^k)\) with \(v^k\rightarrow v\) as \(k\rightarrow \infty \). From the previous arguments, \(v^k\in \partial \delta _{C}(x^k)+\partial h(x^k)\) for each k. Since \(x^k\rightarrow {\overline{x}}\), we have \(x^k\ne 0\) and \(\mathrm{supp}(x^k)\supseteq J\) for all sufficiently large k. Since \(\delta _{C}(x^k)+h(x^k)\rightarrow \delta _{C}({\overline{x}})+h({\overline{x}})\), we must have \(x^k\in C\) and \(h(x^k)\rightarrow h({\overline{x}})\) for all sufficiently large k. The latter, along with \(\mathrm{supp}(x^k)\supseteq J\), implies that \(\mathrm{supp}(x^k)=J\) for all sufficiently large k. So, \(\partial h(x^k)=\partial \delta _{L}(x^k)\) for large enough k. Combining this with (24) and \(v^k\in \partial \delta _{C}(x^k)+\partial h(x^k)\), we have \(v^k\in \partial \delta _{C\cap L}(x^k)\). Then, \(v\in \partial (\delta _C+\delta _{L})({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial \delta _{L}({\overline{x}}) =\partial \delta _C({\overline{x}})+\partial h({\overline{x}})\). The stated inclusion holds. The previous arguments imply that \( \widehat{\partial }(\delta _C+h)({\overline{x}}) =\partial (\delta _C+h)({\overline{x}}) ={\mathcal {N}}_{C}({\overline{x}})+\partial h({\overline{x}}) =\partial \delta _{C\cap L}({\overline{x}}). \) Suppose that \(\partial \delta _{C\cap L}({\overline{x}})\ne \emptyset \) (if not, the last equation implies the result). Then, we have
where the second equality is by [27, Exercise 8.14] and the third one is due to [27, Proposition 8.12]. Thus, the first part of the desired results follows. Using the same arguments as above, we can obtain the second part. The proof is completed. \(\square \)
Wu, Y., Pan, S. & Bi, S. Kurdyka–Łojasiewicz Property of Zero-Norm Composite Functions. J Optim Theory Appl 188, 94–112 (2021). https://doi.org/10.1007/s10957-020-01779-7