
Minimizers of Sparsity Regularized Huber Loss Function

Journal of Optimization Theory and Applications

Abstract

We investigate the structure of the local and global minimizers of the Huber loss function regularized with a sparsity-inducing \(\ell _0\) norm term. We characterize local minimizers and establish conditions that are necessary and sufficient for a local minimizer to be strict. We give a necessary condition for global minimizers and show that the set of global minimizers is non-empty. The sparsity of minimizers is also studied by giving bounds on a regularization parameter controlling sparsity. Results are illustrated in numerical examples.


Notes

  1. While this constant is used routinely in work involving the Huber function, we were unable to find a derivation in the literature, so one is provided here.

  2. This set appears in our subsequent results and is shown to be dense in \({{\mathbb {R}}}^N\) in the Appendix.

References

  1. Zoubir, A., Koivunen, V., Ollila, E., Muma, M.: Robust Statistics for Signal Processing. Cambridge University Press, Cambridge (2018)

  2. Huber, P.J.: Robust Statistics. Wiley, New York (1981)

  3. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2008)

  4. d'Aspremont, A., El Ghaoui, L.: Testing the nullspace property using semidefinite programming. Math. Program. 127(1), 123–144 (2011)

  5. Bryan, K., Leise, T.: Making do with less: an introduction to compressed sensing. SIAM Rev. 55, 547–566 (2013)

  6. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  7. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 52, 4203–4215 (2006)

  8. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998)

  9. Chrétien, S.: An alternating \(\ell _1\) approach to the compressed sensing problem. IEEE Signal Process. Lett. 17, 181–184 (2010)

  10. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 51, 1289–1306 (2005)

  11. Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59, 797–829 (2006)

  12. Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, New York (2010)

  13. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013)

  14. Tropp, J.A.: Recovery of short, complex linear combinations via \(\ell _1\) minimization. IEEE Trans. Inf. Theory 51(4), 1568–1570 (2005)

  15. Fuchs, J.J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341–1344 (2004)

  16. Nikolova, M.: Description of the minimizers of least squares regularized with \(\ell _0\)-norm. Uniqueness of the global minimizer. SIAM J. Imaging Sci. 6(2), 904–937 (2013)

  17. Beck, A., Hallak, N.: Proximal mapping for symmetric penalty and sparsity. SIAM J. Optim. 28(1), 496–527 (2018)

  18. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52, 1030–1051 (2006)

  19. Chancelier, J.-Ph., De Lara, M.: Hidden convexity in the \(\ell _0\) pseudonorm. HAL (2019)

  20. Chancelier, J.-Ph., De Lara, M.: Lower bound convex programs for exact sparse optimization. HAL (2019)

  21. Chancelier, J.-Ph., De Lara, M.: A suitable conjugacy for the \(\ell _0\) pseudonorm. HAL (2019)

  22. Soubies, E., Blanc-Féraud, L., Aubert, G.: New insights on the \(\ell _2\)-\(\ell _0\) minimization problem. J. Math. Imaging Vis. 62, 808–824 (2020)

  23. Lanza, A., Morigi, S., Selesnick, I.W., Sgallari, F.: Sparsity-inducing non-convex, non-separable regularization for convex image processing. SIAM J. Imaging Sci. 12(2), 1099–1134 (2019)

  24. Selesnick, I.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)

  25. Wang, S., Chen, X., Dai, W., Selesnick, I.W., Cai, G.: Vector minimax concave penalty for sparse representation. Digital Signal Process. 83, 165–179 (2018)

  26. Wang, J., Zhang, F., Huang, J., Wang, W., Yuan, C.: A non-convex penalty function with integral convolution approximation for compressed sensing. Signal Process. 158, 116–128 (2019)

  27. Chen, K., Lv, Q., Lu, Y., Dou, Y.: Robust regularized extreme learning machine for regression using iteratively reweighted least squares. Neurocomputing 230, 345–358 (2017)

  28. Carrillo, R.E., Ramirez, A.B., Arce, G.R., Barner, K.E., Sadler, B.M.: Robust compressive sensing of sparse signals: a review. EURASIP J. Adv. Signal Process. 2016, 108 (2016)

  29. Grechuk, B., Zabarankin, M.: Regression analysis: likelihood, error and entropy. Math. Program. 174, 145–166 (2019)

  30. Li, W., Swetits, J.J.: The linear \(\ell _1\) estimator and the Huber M-estimator. SIAM J. Optim. 8, 457–475 (1998)

  31. Madsen, K., Nielsen, H.B.: A finite smoothing algorithm for linear \(\ell _1\) estimation. SIAM J. Optim. 3, 223–235 (1993)

  32. Madsen, K., Nielsen, H.B., Pınar, M.Ç.: New characterizations of \( \ell _1 \) solutions to overdetermined systems of linear equations. Oper. Res. Lett. 16, 159–166 (1994)

  33. Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, New York (2003)

  34. Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)

  35. Pınar, M.Ç.: Linear Huber M-estimator under ellipsoidal data uncertainty. BIT 42(4), 856–866 (2002)


Corresponding author

Correspondence to Mustafa Ç. Pınar.

Additional information

Communicated by Panos M. Pardalos.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Definition 8.1

Let \( \Psi :{{\mathbb {R}}}^N\rightarrow {{\mathbb {R}}}\) be a differentiable function over \( {{\mathbb {R}}}^N \) whose gradient is Lipschitz continuous with constant \( L_\Psi >0 \), i.e.,

$$\begin{aligned} \left\Vert \triangledown \Psi (x)-\triangledown \Psi (y)\right\Vert _2\le L_{\Psi }\left\Vert x-y\right\Vert _2 \text { for all } x,y\in {{\mathbb {R}}}^N. \end{aligned}$$

Proposition 8.1

The gradient \( \triangledown \Psi \) is Lipschitz continuous with constant \( L_\Psi =\dfrac{\left\Vert A\right\Vert ^2_2}{\gamma } \), where \( \left\Vert A\right\Vert _2=\sup _{\left\Vert x\right\Vert _2=1}\left\Vert Ax\right\Vert _2. \)

Proof

Let \( x,y\in {{\mathbb {R}}}^N \) with \( Ax\ne Ay \) (if \( Ax=Ay \), then \( \triangledown \Psi (x)=\triangledown \Psi (y) \) and the inequality holds trivially). Then

$$\begin{aligned} \left\Vert \triangledown \Psi (x)-\triangledown \Psi (y)\right\Vert _2&=\left\Vert A^{ T}[\hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)]\right\Vert _2\\&\le \left\Vert A\right\Vert _2\left\Vert \hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)\right\Vert _2\\&=\left\Vert A\right\Vert _2\dfrac{\left\Vert \hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)\right\Vert _2}{\left\Vert A(x-y)\right\Vert _2}\left\Vert A(x-y)\right\Vert _2\\&\le \left\Vert A\right\Vert _2^2\dfrac{\left\Vert \hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)\right\Vert _2}{\left\Vert (Ax-d)-(Ay-d)\right\Vert _2}\left\Vert x-y\right\Vert _2\\&\le \dfrac{\left\Vert A\right\Vert _2^2}{\gamma }\left\Vert x-y\right\Vert _2. \end{aligned}$$

The last inequality holds because the clip function is Lipschitz continuous with constant \( 1/\gamma \). \(\square \)

Remark 8.1

In practice, one may use the Frobenius norm for easier computation, which yields the Lipschitz constant \( \dfrac{\left\Vert A\right\Vert ^2_F}{\gamma } \). This choice is safe since \( \left\Vert A\right\Vert _2\le \left\Vert A\right\Vert _F \).
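To make the constant concrete, the following is a minimal numerical sketch (ours, not from the paper) that evaluates \( \triangledown \Psi \) and the Frobenius-norm bound on its Lipschitz constant. It assumes \( \Psi (x)=\sum _i h_\gamma ((Ax-d)[i]) \) with the Huber function \( h_\gamma (t)=t^2/(2\gamma ) \) for \( \left|t\right|\le \gamma \) and \( \left|t\right|-\gamma /2 \) otherwise, so that \( \hbox {clip}(t)=\max (-1,\min (1,t/\gamma )) \); the function names are ours.

```python
import numpy as np

def clip(t, gamma):
    # Derivative of the assumed Huber function h_gamma; Lipschitz with constant 1/gamma.
    return np.clip(t / gamma, -1.0, 1.0)

def grad_psi(x, A, d, gamma):
    # Gradient of Psi(x) = sum_i h_gamma((Ax - d)[i]).
    return A.T @ clip(A @ x - d, gamma)

def lipschitz_bound(A, gamma):
    # Frobenius-norm bound ||A||_F^2 / gamma >= ||A||_2^2 / gamma (Remark 8.1).
    return np.linalg.norm(A, 'fro') ** 2 / gamma

# Quick check on random data: the gradient difference never exceeds the bound.
rng = np.random.default_rng(0)
M, N, gamma = 20, 50, 0.5
A = rng.standard_normal((M, N))
d = rng.standard_normal(M)
L = lipschitz_bound(A, gamma)
x, y = rng.standard_normal(N), rng.standard_normal(N)
lhs = np.linalg.norm(grad_psi(x, A, d, gamma) - grad_psi(y, A, d, gamma))
assert lhs <= L * np.linalg.norm(x - y)
```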

Proposition 8.2

\( {\mathbb {B}}_\gamma \) defined in Theorem 5.2 is a dense subset of \( {{\mathbb {R}}}^N \).

Proof

Let \( c:{{\mathbb {R}}}^N\rightarrow {{\mathbb {R}}}^M \) be the continuous affine map defined by \( c(x)=Ax-d \). Since A has full rank with \( M<N \), c is a surjection. Let \( T=c({\mathbb {B}}_\gamma ) \), the image of the set \( {\mathbb {B}}_\gamma \) under c; hence, \( T=\{v\in {{\mathbb {R}}}^M:\forall i\in {{\mathbb {I}}}_M, \left|v[i]\right|\ne \gamma \} \). Let \( {{\mathcal {O}}}\) be an arbitrary non-empty open set in \( {{\mathbb {R}}}^M \), let \( {\bar{v}}\in {{\mathcal {O}}}\), and define the index set \( {\bar{v}}_\gamma =\{i\in {{\mathbb {I}}}_M:\left|{\bar{v}}[i]\right|=\gamma \}. \) There is some positive radius \( r_{{\bar{v}}} \) such that \( B_\infty ({\bar{v}},r_{{\bar{v}}})\subseteq {{\mathcal {O}}}\). If we define

$$\begin{aligned} r^*=\min \Bigg \{\min _{i\notin {\bar{v}}_\gamma }\{\left|{\bar{v}}[i]-\gamma \right|\},\min _{i\notin {\bar{v}}_\gamma }\{\left|{\bar{v}}[i]+\gamma \right|\},r_{{\bar{v}}},\gamma \Bigg \}, \end{aligned}$$

we have \( r^*>0 \) and \( B_\infty ({\bar{v}},r^*)\subseteq B_\infty ({\bar{v}},r_{{\bar{v}}})\subseteq {{\mathcal {O}}}\). Then, for any \( 0<\delta <r^* \), the vector \( v^*={\bar{v}}+\delta \mathbb {1}_M \) belongs to both \( {{\mathcal {O}}}\) and T. Since \( {{\mathcal {O}}}\) was arbitrary, T is dense in \( {{\mathbb {R}}}^M \). Finally, \( {\mathbb {B}}_\gamma =c^{-1}(T) \), and c, being a surjective affine map, is an open map; thus, for every non-empty open set \( U\subseteq {{\mathbb {R}}}^N \), the image \( c(U) \) is open and non-empty and therefore meets T, so U meets \( {\mathbb {B}}_\gamma \). Hence, \( {\mathbb {B}}_\gamma \) is dense in \( {{\mathbb {R}}}^N \). \(\square \)

Remark 8.2

The previous result shows that one can construct a sequence of vectors in \( {\mathbb {B}}_\gamma \) converging to any given vector in \( {{\mathbb {R}}}^N \). This may be useful for deriving algorithms based on second-order methods, since the second derivative exists only at vectors in \( {\mathbb {B}}_\gamma \).
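As a concrete illustration of the perturbation used in the proof, the following is a minimal sketch (ours, not from the paper) that, given an arbitrary x, produces a nearby point whose residual avoids the levels \( \pm \gamma \). It assumes A has full row rank with \( M<N \), so the perturbation \( \delta \mathbb {1}_M \) can be pulled back through a right inverse of A (the Moore–Penrose pseudoinverse); all names and tolerances are ours.

```python
import numpy as np

def perturb_into_B_gamma(x, A, d, gamma, tol=1e-9, shrink=0.5):
    """Return a point near x whose residual avoids the levels +gamma and -gamma.

    Mirrors the proof of Proposition 8.2: perturb v = Ax - d by delta * ones(M),
    with delta below every (numerically nonzero) gap between |v[i]| and gamma,
    and pull the perturbation back through a right inverse of A (full row rank).
    """
    M = A.shape[0]
    v = A @ x - d
    gaps = np.concatenate((np.abs(v - gamma), np.abs(v + gamma)))
    gaps = gaps[gaps > tol]          # coordinates sitting on a level contribute no gap
    r_star = min(gaps.min() if gaps.size else np.inf, gamma)
    delta = shrink * r_star
    return x + delta * (np.linalg.pinv(A) @ np.ones(M))

# Example: start from an x whose first residual sits exactly on the level gamma.
rng = np.random.default_rng(1)
M, N, gamma = 5, 12, 1.0
A = rng.standard_normal((M, N))
d = A @ rng.standard_normal(N)
x = np.linalg.pinv(A) @ (d + gamma * np.eye(M)[0])   # residual[0] == gamma (up to rounding)
x_new = perturb_into_B_gamma(x, A, d, gamma)
assert np.all(np.abs(np.abs(A @ x_new - d) - gamma) > 1e-6)
```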

For all numerical experiments reported in this paper, we use the following equivalent formulation of (HR):

[(QHR2): quadratic programming reformulation of (HR); displayed formulation not reproduced here]
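For orientation, one standard route to such a quadratic programming reformulation uses the infimal-convolution representation of the Huber function. The following sketch assumes \( \Psi (x)=\sum _i h_\gamma ((Ax-d)[i]) \) with \( h_\gamma (t)=t^2/(2\gamma ) \) for \( \left|t\right|\le \gamma \) and \( \left|t\right|-\gamma /2 \) otherwise; we do not claim it reproduces (QHR2) exactly.

$$\begin{aligned} h_\gamma (r)=\min _{z\in {{\mathbb {R}}}}\Big \{\tfrac{1}{2\gamma }(r-z)^2+\left|z\right|\Big \},\qquad \Psi (x)=\min _{z\in {{\mathbb {R}}}^M}\Big \{\tfrac{1}{2\gamma }\left\Vert Ax-d-z\right\Vert _2^2+\left\Vert z\right\Vert _1\Big \}. \end{aligned}$$

Splitting \( z=z^+-z^- \) with \( z^+,z^-\ge 0 \) turns the joint minimization over \( (x,z^+,z^-) \) into a convex quadratic program whose x-part at optimality minimizes \( \Psi \).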

Proposition 8.3

(Equivalent Characterization for (HR), [35]) Any optimal solution to the quadratic program (QHR2) is a minimizer of \( \Psi \), and conversely.

This alternative eliminates the need to work with piecewise-defined functions and provides a more convenient computational tool. We used the following algorithm for the numerical examples reported in the paper:

[Algorithm used for the numerical examples; display not reproduced here]
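Since the displayed algorithm is not reproduced here, the following is a minimal sketch (ours) of an iterative-hard-thresholding-style scheme in the spirit of [34] for \( \min _x \Psi (x)+\lambda \left\Vert x\right\Vert _0 \), using the step size \( 1/L \) with \( L=\left\Vert A\right\Vert _F^2/\gamma \) from Remark 8.1. It assumes the same Huber gradient as in the sketch after Remark 8.1 and is not claimed to be the algorithm used in the paper.

```python
import numpy as np

def huber_grad(x, A, d, gamma):
    # Gradient of Psi(x) = sum_i h_gamma((Ax - d)[i]) (assumed scaling t^2/(2*gamma)).
    return A.T @ np.clip((A @ x - d) / gamma, -1.0, 1.0)

def hard_threshold(u, thresh):
    # Proximal map of t*lam*||.||_0 when thresh = 2*t*lam: keep u[i] only if u[i]^2 > thresh.
    out = u.copy()
    out[u ** 2 <= thresh] = 0.0
    return out

def iht_huber_l0(A, d, gamma, lam, iters=500):
    """Forward-backward (IHT-style) iterations for Psi(x) + lam * ||x||_0."""
    L = np.linalg.norm(A, 'fro') ** 2 / gamma   # Lipschitz bound from Remark 8.1
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x - step * huber_grad(x, A, d, gamma), 2.0 * step * lam)
    return x

# Small synthetic example with a sparse ground truth and a few gross outliers.
rng = np.random.default_rng(2)
M, N, gamma, lam = 40, 100, 0.1, 0.05
A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[:3] = [2.0, -1.5, 1.0]
d = A @ x_true + 0.01 * rng.standard_normal(M)
d[:2] += 5.0                                    # outliers handled by the Huber loss
x_hat = iht_huber_l0(A, d, gamma, lam)
print("support:", np.flatnonzero(x_hat))
```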


Cite this article

Akkaya, D., Pınar, M.Ç. Minimizers of Sparsity Regularized Huber Loss Function. J Optim Theory Appl 187, 205–233 (2020). https://doi.org/10.1007/s10957-020-01745-3
