
Robust mixed-norm constrained regression with application to face recognitions

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Most existing regression-based classification methods cope with pixelwise noise via the \(\ell _1\)-norm or \(\ell _2\)-norm but neglect the structural information between pixels. Nuclear norm-based matrix regression approaches have achieved great success in addressing imagewise noise, yet they may produce unreasonable regressions and incorrect classifications, especially when test images are severely corrupted by large occlusions and strong illumination variations: because the corrupted test images enter the reconstruction process directly, the influence of the noise is unavoidable. To overcome this limitation, this paper presents a robust mixed-norm constrained regression model to deal with structural noise corruption. Specifically, the nuclear norm of the error between the corrupted test image and its recovered counterpart is exploited as a regularization term that characterizes the low-rank noise structure, while the Frobenius norm is used to measure the difference between the recovered image and the restructured image, since the recovered image contains less noise. We then adopt the alternating direction method of multipliers (ADMM) to solve the proposed models efficiently. Furthermore, a theoretical convergence proof and a detailed analysis of the computational complexity are provided. Finally, extensive experiments on five well-known face databases demonstrate that the proposed methods outperform several state-of-the-art regression-based approaches in addressing noise caused primarily by occlusion and illumination changes.
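To make the mixed-norm construction concrete, the following is a minimal numpy sketch (an editorial illustration, not the authors' implementation) of the objective the abstract describes, assuming the formulation used later in Appendix A with p = 2: the nuclear norm penalizes the structural error between the corrupted test image H and its recovered version H-tilde, and the Frobenius norm measures the gap between the recovered image and the image D(x) restructured from the training set. The names nuclear_norm, mixed_norm_objective, and all parameters are illustrative.

    import numpy as np

    def nuclear_norm(E):
        # Nuclear norm = sum of singular values; it is small when the noise
        # pattern E is structured (low-rank), e.g., occlusion blocks or
        # illumination gradients.
        return np.linalg.svd(E, compute_uv=False).sum()

    def mixed_norm_objective(H, H_tilde, Dx, x, lam, beta):
        # H: corrupted test image, H_tilde: recovered image,
        # Dx: image restructured from the training dictionary, x: coding vector.
        structural = nuclear_norm(H - H_tilde)           # low-rank structural noise
        fit = np.linalg.norm(Dx - H_tilde, 'fro') ** 2   # recovered vs restructured
        return fit + lam * structural + 0.5 * beta * float(x @ x)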



Acknowledgements

This work is partly supported by the National Key Research and Development Program of China (No. 2018YFB1004900) and the 111 Project (No. B13022).

Author information

Corresponding author

Correspondence to Jianfeng Lu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Proof of Theorem 3

Proof

Since \(({{\mathbf{x}}^{*}},{\widetilde{\mathbf{H}}}^{*} ,{\mathbf{T}}^{*},{{\mathbf{Z}}}^{*})\) is a saddle point of L, we have \(L({{\mathbf{x}}^{*}},{\widetilde{\mathbf{H}}}^{*},{\mathbf{T}}^{*},{{\mathbf{Z}}}^{*}) \le L({{\mathbf{x}}_{k+1}},{\widetilde{\mathbf{H}}}_{k+1} ,{\mathbf{T}}_{k+1},{{\mathbf{Z}}}^{*})\). Together with \({{D}}({{\mathbf{x}}^{*}})-{\widetilde{\mathbf{H}}}^{*} - {\mathbf{T}}^{*} = 0\), this can be rewritten as follows:

$$\begin{aligned} f^{*} - f_{k+1} \le \text {Tr}(({{\mathbf{Z}}}^{*})^{T}{\mathbf{R}}_{k+1}). \end{aligned}$$
(26)

For ease of derivation, the augmented Lagrangian function of Eq. (5) can be reformulated as \({L_\mu }({\mathbf{x}},{\widetilde{\mathbf{H}}} ,{\mathbf{T}},{{\mathbf{Z}}})= {\Vert {\mathbf{T}} \Vert _F^2} + \lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}} \Vert _{*}} +\frac{\beta }{2} \left\| {\mathbf{x}} \right\| _p^{p} +\frac{\mu }{2} \Vert {{D}({\mathbf{x}})-{\widetilde{\mathbf{H}}}} - {\mathbf{T}} +\frac{1}{\mu } {{\mathbf{Z}}}\Vert _F^2-\frac{1 }{2\mu }\Vert {{\mathbf{Z}}}\Vert _F^2\).
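For concreteness, the following numpy sketch shows one pass of the ADMM updates this augmented Lagrangian induces, assuming p = 2 so that every subproblem has a closed form; the \({\widetilde{\mathbf{H}}}\)-subproblem is a nuclear-norm proximal step, solved by singular value thresholding. This is an editorial sketch under those assumptions, not the authors' implementation; svt, admm_step, M, and all parameter values are illustrative.

    import numpy as np

    def svt(W, tau):
        # Singular value thresholding: the proximal operator of tau * ||.||_*.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def admm_step(x, Ht, T, Z, M, H, lam, beta, mu):
        # x: coding vector with D(x) = vec^{-1}(M x); Ht: recovered image
        # (the paper's H-tilde); T: residual D(x) - Ht; Z: multiplier.
        # x-update: ridge least squares against vec(Ht + T - Z/mu)
        # (in practice the system matrix A would be factored once).
        b = (Ht + T - Z / mu).reshape(-1)
        A = beta * np.eye(M.shape[1]) + mu * M.T @ M
        x = np.linalg.solve(A, mu * M.T @ b)
        Dx = (M @ x).reshape(H.shape)
        # Ht-update: with E = H - Ht, the subproblem reduces to the
        # nuclear-norm prox, i.e., singular value thresholding of E.
        E = svt(H + T - Dx - Z / mu, lam / mu)
        Ht = H - E
        # T-update: closed form of min ||T||_F^2 + (mu/2)||Dx - Ht - T + Z/mu||_F^2.
        T = (mu / (2.0 + mu)) * (Dx - Ht + Z / mu)
        # Multiplier update, matching Z_{k+1} = Z_k + mu(D(x) - Ht - T) below.
        Z = Z + mu * (Dx - Ht - T)
        return x, Ht, T, Z, Dx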

The update \({\mathbf{x}}_{k+1} = \arg \mathop {\min }\limits _{{\mathbf{x}}} {L_\mu }({\mathbf{x}},{\widetilde{\mathbf{H}}}_{k},{\mathbf{T}}_{k},{\mathbf{Z}}_{k})\) is equivalent to

$$\begin{aligned} 0 &= \partial { {L_\mu }({\mathbf{x}}_{k+1},{\widetilde{\mathbf{H}}}_{k},{\mathbf{T}}_{k},{\mathbf{Z}}_{k})}\\&= \frac{\beta }{2}\partial {(\Vert {\mathbf{x}}_{k+1}\Vert _p^p)}+\mu {\mathbf{M}}^{T}\left( {\mathbf{M}}{\mathbf{x}}_{k+1} -\text {Vec} \left( {\widetilde{\mathbf{H}}}_{k}+ {\mathbf{T}}_{k}-\frac{1}{\mu }{{\mathbf{Z}}}_{k}\right) \right) , \end{aligned}$$

By virtue of \({{\mathbf{Z}}}_{k+1} ={\mathbf{Z}}_{k} +\mu ({D}({\mathbf{x}}_{k+1})-{\widetilde{\mathbf{H}}}_{k+1} - {\mathbf{T}}_{k+1})\), we can recombine the terms to get

$$\begin{aligned} 0 = \frac{\beta }{2}\partial {(\Vert {\mathbf{x}}_{k+1}\Vert _p^p)} + {\mathbf{M}}^{T}\left( \text {Vec} ({\mathbf{Z}}_{k+1}) +\mu \text {Vec}({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k})+\mu \text {Vec}({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k})\right) . \end{aligned}$$

This suggests that \({\mathbf{x}}_{k+1}\) minimizes \(\frac{\beta }{2}\Vert {\mathbf{x}}\Vert _p^p + (\text {Vec} ({\mathbf{Z}}_{k+1}) +\mu \text {Vec}({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k})+ \mu \text {Vec}({\mathbf{T}}_{k+1} -{\mathbf{T}}_{k}))^{T}{\mathbf{M}}{\mathbf{x}}\). By a similar derivation, \({\widetilde{\mathbf{H}}}_{k+1}\) minimizes \(\lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}} \Vert _{*}} -\text {Tr}(({\mathbf{Z}}_{k+1} + \mu ({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}))^{T}{\widetilde{\mathbf{H}}})\), and \({\mathbf{T}}_{k+1}\) minimizes \(\Vert {\mathbf{T}}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k+1})^{T}{\mathbf{T}})\). Hence, we have

$$\begin{aligned}&(1) \quad \frac{\beta }{2}\Vert {\mathbf{x}}_{k+1}\Vert _p^p + (\text {Vec} ({\mathbf{Z}}_{k+1}) +\mu \text {Vec}({\widetilde{\mathbf{H}}}_{k+1} -{\widetilde{\mathbf{H}}}_{k}) +\mu \text {Vec}({\mathbf{T}}_{k+1} -{\mathbf{T}}_{k}))^{T}{\mathbf{M}}{\mathbf{x}}_{k+1}\\&\qquad \le \frac{\beta }{2}\Vert {\mathbf{x}}^{*}\Vert _p^p + (\text {Vec} ({\mathbf{Z}}_{k+1}) +\mu \text {Vec}({\widetilde{\mathbf{H}}}_{k+1} -{\widetilde{\mathbf{H}}}_{k}) +\mu \text {Vec}({\mathbf{T}}_{k+1} -{\mathbf{T}}_{k}))^{T}{\mathbf{M}}{\mathbf{x}}^{*}\\&(2) \quad \lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}}_{k+1} \Vert _{*}} -\text {Tr}(({\mathbf{Z}}_{k+1} + \mu ({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}))^{T}{\widetilde{\mathbf{H}}}_{k+1})\\&\qquad \le \lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}}^{*} \Vert _{*}} -\text {Tr}(({\mathbf{Z}}_{k+1} + \mu ({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}))^{T}{\widetilde{\mathbf{H}}}^{*})\\&(3) \quad \Vert {\mathbf{T}}_{k+1}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k+1})^{T}{\mathbf{T}}_{k+1}) \le \Vert {\mathbf{T}}^{*}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k+1})^{T}{\mathbf{T}}^{*}) \end{aligned}$$

Adding these three inequalities, using \({{D}}({{\mathbf{x}}^{*}})-{\widetilde{\mathbf{H}}}^{*} - {\mathbf{T}}^{*} = 0\) and \({\mathbf{R}}_{k+1} = {D}({\mathbf{x}}_{k+1}) -{\widetilde{\mathbf{H}}}_{k+1}- {\mathbf{T}}_{k+1}\), and writing \(\mathbf{r}_{k+1}=\text {Vec}({\mathbf{R}}_{k+1})\), after regrouping we obtain

$$\begin{aligned} f_{k+1} - f^{*}&\le - \text {Vec}^{T} ({\mathbf{Z}}_{k+1})\mathbf{{r}}_{k+1} - \mu \text {Vec}^{T} ({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})(\mathbf{{r}}_{k+1} + \text {Vec} ({\mathbf{T}}_{k+1}- {\mathbf{T}}^{*} ))\\&\quad -\mu \text {Vec}^{T}({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})(\mathbf{{r}}_{k+1}+ \text {Vec} (({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}^{*}) +({\mathbf{T}}_{k+1}- {\mathbf{T}}^{*} ))) \end{aligned}$$
(27)

Adding (26) and (27) and rearranging, we obtain the following inequality:

$$\begin{aligned}&\text {Vec}^{T} ({\mathbf{Z}}_{k+1}-{\mathbf{Z}}^{*})\mathbf{{r}}_{k+1} +\mu \text {Vec}^{T} ({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})\mathbf{{r}}_{k+1}\\&\quad +\mu \text {Vec}^{T} ({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})\text {Vec} ({\mathbf{T}}_{k+1} - {\mathbf{T}}^{*} )\\&\quad +\mu \text {Vec}^{T}({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})\text {Vec}({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}^{*})\\&\quad +\mu \text {Vec}^{T}({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})(\mathbf{{r}}_{k+1}+ \text {Vec}({\mathbf{T}}_{k+1}- {\mathbf{T}}^{*} )) \le 0 \end{aligned}$$

Substituting \({\mathbf{T}}_{k+1}-{\mathbf{T}}^{*}={\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}+{\mathbf{T}}_{k}-{\mathbf{T}}^{*}\) into the fourth term and utilizing \(\text {Vec}^{T}(U)\text {Vec}(V)={\text {Tr}}(U^{T}V)\) for all \(U, V \in {R}^{s \times t}\), the aforementioned inequality can be rewritten as

$$\begin{aligned}&-\text {Tr}(({\mathbf{Z}}_{k+1}-{\mathbf{Z}}^{*})^{T}{\mathbf{R}}_{k+1})-\mu \text {Tr} (({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})^{T}({\mathbf{T}}_{k+1}- {\mathbf{T}}^{*} ))\\&\quad -\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})^{T}({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}^{*}))+\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})^{T}({\mathbf{T}}^{*}-{\mathbf{T}}_{k}))\\&\quad \ge \mu \text {Tr} (({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})^{T}{\mathbf{R}}_{k+1})+\mu \text {Tr} (({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})^{T}({\mathbf{R}}_{k+1}+({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k}))) \end{aligned}$$
(28)

Subsequently, it can be shown that the two terms on the right-hand side of (28) are both nonnegative. To see this, note that \({\mathbf{T}}_{k+1}\) minimizes \(\Vert {\mathbf{T}}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k+1})^{T}{\mathbf{T}})\) and \({\mathbf{T}}_{k}\) minimizes \(\Vert {\mathbf{T}}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k})^{T}{\mathbf{T}})\), which is equivalent to

$$\begin{aligned}&\Vert {\mathbf{T}}_{k+1}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k+1})^{T}{\mathbf{T}}_{k+1}) \le \Vert {\mathbf{T}}_{k}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k+1})^{T}{\mathbf{T}}_{k})\\&\Vert {\mathbf{T}}_{k}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k})^{T}{\mathbf{T}}_{k}) \le \Vert {\mathbf{T}}_{k+1}\Vert _F^2 - \text {Tr}(({\mathbf{Z}}_{k})^{T}{\mathbf{T}}_{k+1}) \end{aligned}$$

Adding the two inequalities above, taking advantage of \({{\mathbf{Z}}}_{k+1} ={\mathbf{Z}}_{k} +\mu {\mathbf{R}}_{k+1}\) and reorganizing, we get \(\mu \text {Tr} (({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})^{T}{\mathbf{R}}_{k+1}) \ge 0\). Similarly, using the argument that \({\widetilde{\mathbf{H}}}_{k+1}\) minimizes \(\lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}} \Vert _{*}} -\text {Tr}(({\mathbf{Z}}_{k+1} + \mu ({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}))^{T}{\widetilde{\mathbf{H}}})\), we obtain \(\mu \text {Tr} (({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})^{T}({\mathbf{R}}_{k+1}+({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}))) \ge 0\).

In addition, since \({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k} =({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}^{*})+({\widetilde{\mathbf{H}}}^{*}-{\widetilde{\mathbf{H}}}_{k})\), we have

$$\begin{aligned}&\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}_{k})^{T}({\mathbf{T}}^{*}-{\mathbf{T}}_{k}))\\&\quad =\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}^{*})^{T}({\mathbf{T}}^{*}-{\mathbf{T}}_{k}))+\mu \text {Tr}(({\widetilde{\mathbf{H}}}^{*}-{\widetilde{\mathbf{H}}}_{k})^{T}({\mathbf{T}}^{*}-{\mathbf{T}}_{k}))\\&\quad \le \mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k}-{\widetilde{\mathbf{H}}}^{*})^{T}({\mathbf{T}}_{k}-{\mathbf{T}}^{*}))-\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}^{*})^{T}({\mathbf{T}}_{k+1}-{\mathbf{T}}^{*})) \end{aligned}$$

Substituting the previous inequality into (28) and multiplying through by 2, we obtain

$$\begin{aligned}&-2\text {Tr}(({\mathbf{Z}}_{k+1}-{\mathbf{Z}}^{*})^{T}{\mathbf{R}}_{k+1}) -2\mu \text {Tr} (({\mathbf{T}}_{k+1}- {\mathbf{T}}_{k})^{T}({\mathbf{T}}_{k+1}- {\mathbf{T}}^{*} ))\\&\quad -2\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1} - {\widetilde{\mathbf{H}}}_{k})^{T}({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}^{*}))+2\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k}-{\widetilde{\mathbf{H}}}^{*})^{T}({\mathbf{T}}_{k}-{\mathbf{T}}^{*}))\\&\quad -2\mu \text {Tr}(({\widetilde{\mathbf{H}}}_{k+1}- {\widetilde{\mathbf{H}}}^{*})^{T} ({\mathbf{T}}_{k+1}-{\mathbf{T}}^{*})) \ge 0 \end{aligned}$$
(29)

Using \({{\mathbf{Z}}}_{k+1} ={\mathbf{Z}}_{k} +\mu {\mathbf{R}}_{k+1}\) and completing the square, (29) can be rewritten as

$$\begin{aligned}&\left[ \frac{1}{\mu }\Vert {\mathbf{Z}}_{k}-{\mathbf{Z}}^{*}\Vert _F^2+\mu \Vert {\widetilde{\mathbf{H}}}_{k}+{\mathbf{T}}_{k}-{\widetilde{\mathbf{H}}}^{*}-{\mathbf{T}}^{*}\Vert _F^2\right] -\left[ \frac{1}{\mu }\Vert {\mathbf{Z}}_{k+1}-{\mathbf{Z}}^{*}\Vert _F^2+\mu \Vert {\widetilde{\mathbf{H}}}_{k+1}+{\mathbf{T}}_{k+1}-{\widetilde{\mathbf{H}}}^{*} -{\mathbf{T}}^{*}\Vert _F^2\right] \\&\quad \ge \mu \Vert {\mathbf{R}}_{k+1}\Vert _F^2 +\mu \Vert {\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k}\Vert _F^2 + \mu \Vert {\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}\Vert _F^2 \end{aligned}$$
(30)

Let \(V_{k} =\frac{1}{\mu }\Vert {\mathbf{Z}}_{k}-{\mathbf{Z}}^{*}\Vert _F^2+\mu \Vert {\widetilde{\mathbf{H}}}_{k}+{\mathbf{T}}_{k}-{\widetilde{\mathbf{H}}}^{*}-{\mathbf{T}}^{*}\Vert _F^2\); then inequality (30) can be abbreviated as

$$\begin{aligned} V_{k} -V_{k+1} \ge \mu \Vert {\mathbf{R}}_{k+1}\Vert _F^2 +\mu \Vert {\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k}\Vert _F^2 + \mu \Vert {\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}\Vert _F^2 \end{aligned}$$
(31)

This indicates that \(V_{k}\) is nonincreasing since \(\mu >0\), i.e., \(V_{k+1} \le V_{k}\le V_{0}\). Summing inequality (31) over all \(k\), we acquire

$$\begin{aligned} \mu \sum \limits _{k=0}^{\infty } (\Vert {\mathbf{R}}_{k+1}\Vert _F^2+\Vert {\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k}\Vert _F^2 +\Vert {\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}\Vert _F^2)\le V_{0}, \end{aligned}$$

which implies that \({\mathbf{R}}_{k}\rightarrow 0\), \({\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k} \rightarrow 0\) and \({\mathbf{T}}_{k+1}-{\mathbf{T}}_{k} \rightarrow 0\) as \(k \rightarrow \infty\), since the partial sums of the series are monotone and bounded. Hence, the right-hand sides of (26) and (27) both tend to zero as \(k \rightarrow \infty\), and we can conclude that \(\lim \nolimits _{k \rightarrow \infty } f_{k} =f^{*}\).

That is, \(\lim \nolimits _{k \rightarrow \infty }({\Vert {\mathbf{T}}_{k} \Vert _F^2} + \lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}}_{k} \Vert _{*}} +\frac{\beta }{2}\left\| {\mathbf{x}}_{k} \right\| _p^p) = {\Vert {\mathbf{T}}^{*} \Vert _F^2} + \lambda {\Vert {\mathbf{H}} - {\widetilde{\mathbf{H}}}^{*} \Vert _{*}} +\frac{\beta }{2} \left\| {\mathbf{x}}^{*} \right\| _p^p\), which suggests that \(({\mathbf{x}}_{k},{\widetilde{\mathbf{H}}}_{k},{\mathbf{T}}_{k})\) approaches \(({\mathbf{x}}^{*},{\widetilde{\mathbf{H}}}^{*}, {\mathbf{T}}^{*})\) as \(k \rightarrow \infty\). By the previous analysis and \({\mathbf{Z}}_{k}\rightarrow {{\mathbf{Z}}}^{*}\), we have \(\lim \nolimits _{k \rightarrow \infty } {V_{k}} =\lim \nolimits _{k \rightarrow \infty }(\frac{1}{\mu }\Vert {\mathbf{Z}}_{k}-{\mathbf{Z}}^{*}\Vert _F^2+\mu \Vert {\widetilde{\mathbf{H}}}_{k}+{\mathbf{T}}_{k}-{\widetilde{\mathbf{H}}}^{*}-{\mathbf{T}}^{*}\Vert _F^2) = 0\). Thus, \({\widetilde{\mathbf{H}}}_{k}+{\mathbf{T}}_{k}-{\widetilde{\mathbf{H}}}^{*}-{\mathbf{T}}^{*} \rightarrow 0\). Evidently, \(\text {Tr}(({\widetilde{\mathbf{H}}}_{k}-{\widetilde{\mathbf{H}}}^{*})^{T}({\mathbf{T}}_{k}-{\mathbf{T}}^{*})) \ge 0\), so that

$$\begin{aligned} 0 \le \Vert {\widetilde{\mathbf{H}}}_{k}-{\widetilde{\mathbf{H}}}^{*}\Vert _F^2+\Vert {\mathbf{T}}_{k}-{\mathbf{T}}^{*}\Vert _F^2 &\le \Vert {\widetilde{\mathbf{H}}}_{k}-{\widetilde{\mathbf{H}}}^{*}\Vert _F^2+\Vert {\mathbf{T}}_{k}-{\mathbf{T}}^{*}\Vert _F^2+2\text {Tr}(({\widetilde{\mathbf{H}}}_{k}-{\widetilde{\mathbf{H}}}^{*})^{T}({\mathbf{T}}_{k}-{\mathbf{T}}^{*}))\\ &= \Vert {\widetilde{\mathbf{H}}}_{k}+{\mathbf{T}}_{k}-{\widetilde{\mathbf{H}}}^{*}-{\mathbf{T}}^{*}\Vert _F^2 \rightarrow 0. \end{aligned}$$

Therefore, \({\widetilde{\mathbf{H}}}_{k}\rightarrow {\widetilde{\mathbf{H}}}^{*}\) and \({\mathbf{T}}_{k}\rightarrow {\mathbf{T}}^{*}\) by the squeeze theorem. By virtue of \(\lim \nolimits _{k \rightarrow \infty } f_{k} =f^{*}\), we can further derive \({\mathbf{x}}_{k}\rightarrow {\mathbf{x}}^{*}\). \(\square\)
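As a numerical sanity check on Theorem 3 (an editorial illustration, not part of the original paper), one can run the sketched updates on synthetic data and watch the three quantities the proof drives to zero: \(\Vert {\mathbf{R}}_{k+1}\Vert _F\), \(\Vert {\widetilde{\mathbf{H}}}_{k+1}-{\widetilde{\mathbf{H}}}_{k}\Vert _F\) and \(\Vert {\mathbf{T}}_{k+1}-{\mathbf{T}}_{k}\Vert _F\). This assumes the svt/admm_step sketch given after the augmented Lagrangian above; all sizes and parameter values are arbitrary.

    # Illustrative check of Theorem 3 on synthetic data, reusing the
    # svt/admm_step sketch above (p = 2 assumed throughout).
    rng = np.random.default_rng(0)
    p, q, n = 16, 16, 32
    M = rng.standard_normal((p * q, n))     # stand-in training dictionary
    H = rng.standard_normal((p, q))         # stand-in "corrupted test image"
    x, Ht, T, Z = np.zeros(n), H.copy(), np.zeros((p, q)), np.zeros((p, q))
    lam, beta, mu = 1.0, 0.1, 1.0

    for k in range(200):
        Ht_old, T_old = Ht.copy(), T.copy()
        x, Ht, T, Z, Dx = admm_step(x, Ht, T, Z, M, H, lam, beta, mu)
        R = Dx - Ht - T                     # constraint residual R_{k+1}
        print(k, np.linalg.norm(R, 'fro'),
              np.linalg.norm(Ht - Ht_old, 'fro'),
              np.linalg.norm(T - T_old, 'fro'))  # all three should tend to 0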


About this article


Cite this article

Sang, X., Xu, Y., Lu, H. et al. Robust mixed-norm constrained regression with application to face recognitions. Neural Comput & Applic 32, 17551–17567 (2020). https://doi.org/10.1007/s00521-020-04925-4
