
A robust projection twin support vector machine with a generalized correntropy-based loss

Applied Intelligence

Abstract

The projection twin support vector machine (PTSVM) is a promising tool for classification problems. However, the loss function of PTSVM is the hinge loss, which is unbounded and not sufficiently robust to outliers. In this work, a robust PTSVM (termed RSHPTSVM) based on the rescaled square hinge loss (RSH-loss) is proposed to handle classification problems. A close relationship between the RSH-loss and correntropy is established theoretically: the RSH-loss can be viewed as a correntropy-induced loss generated by a reproducing piecewise kernel. As a correntropy-induced loss, it has properties quite different from the hinge loss, such as boundedness, robustness, and nonconvexity. Moreover, the RSH-loss captures higher-order statistical information from the samples. The nonconvexity of RSHPTSVM makes it difficult to optimize, so an efficient iterative algorithm based on semi-quadratic optimization theory is proposed to solve RSHPTSVM; it converges quickly to the optimal solution. Furthermore, we extend RSHPTSVM from binary classification to multi-classification and propose a robust projection multi-birth support vector machine model (termed RSHPMBSVM). The proposed methods are evaluated on various datasets, including three artificial datasets, UCI datasets, and a practical application dataset. The experimental results, both without noise and with label noise, confirm the feasibility and effectiveness of the proposed methods.


References

  1. Vapnik VN (2000) The nature of statistical learning theory. Stat Eng Inf Sci, 119–166
  2. Deng N, Tian Y, Zhang C (2012) Support vector machines: optimization based theory, algorithms, and extensions, 41–63
  3. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(1):121–167
  4. Yin H, Jiao X, Chai Y, Fang B (2015) Scene classification based on single-layer SAE and SVM. Expert Syst Appl 42:3368–3380
  5. Bashbaghi S, Granger E, Sabourin R, Bilodeau G (2017) Dynamic ensembles of exemplar-SVMs for still-to-video face recognition. Pattern Recognit 69:61–81
  6. Ma S, Cheng B, Shang Z, Liu G (2018) Scattering transform and LSPTSVM based fault diagnosis of rotating machinery. Mech Syst Signal Process 104:155–170
  7. Suykens J, Vandewalle J (2004) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
  8. Mangasarian O, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28:69–74
  9. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29:905–910
  10. Ye Q, Zhao C, Ye N, Chen Y (2010) Multi-weight vector projection support vector machines. Pattern Recognit Lett 31:2006–2011
  11. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Hum Genet 7:179–188
  12. Ye Q, Ye N, Yin T (2014) Enhanced multi-weight vector projection support vector machine. Pattern Recognit Lett 42:91–100
  13. Chen X, Yang J, Ye Q, Liang J (2011) Recursive projection twin support vector machine via within-class variance minimization. Pattern Recognit 44:2643–2655
  14. Shao Y, Wang Z, Chen W, Deng N (2013) A regularization for the projection twin support vector machine. Knowl Based Syst 37:203–210
  15. Li C, Huang Y, Wu H, Shao Y, Yang Z (2016) Multiple recursive projection twin support vector machine for multi-class classification. Int J Mach Learn Cybern 7:729–740
  16. Wen Y, Ma J, Yuan C, Yang L (2020) Projection multi-birth support vector machine for multi-classification. Appl Intell 50(13):1–17
  17. Ma J, Yang L, Sun Q (2020) Capped L1-norm distance metric-based fast robust twin bounded support vector machine. Neurocomputing 412:295–311
  18. Li C, Shao Y, Deng N (2015) Robust L1-norm two-dimensional linear discriminant analysis. Neural Netw 65:92–104
  19. Gu Z, Zhang Z, Sun J, Li B (2017) Robust image recognition by L1-norm twin-projection support vector machine. Neurocomputing 223:1–11
  20. Chen W, Li C, Shao Y, Zhang J, Deng N (2018) Robust L1-norm multi-weight vector projection support vector machine with efficient algorithm. Neurocomputing 315:345–361
  21. Liu W, Pokharel PP, Príncipe J (2007) Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans Signal Process 55:5286–5298
  22. Yang L, Ding G, Yuan C, Zhang M (2020) Robust regression framework with asymmetrically analogous to correntropy-induced loss. Knowl Based Syst 191:105211
  23. Singh A, Pokharel R, Príncipe J (2014) The C-loss function for pattern classification. Pattern Recognit 47:441–453
  24. Xu G, Hu B, Príncipe J (2018) Robust C-loss kernel classifiers. IEEE Trans Neural Netw Learn Syst 29:510–522
  25. Ren Z, Yang L (2018) Correntropy-based robust extreme learning machine for classification. Neurocomputing 313:74–84
  26. Boyd SP, Vandenberghe L (2006) Convex optimization. IEEE Trans Autom Control 51:1859–1859
  27. Geng F, Qian S (2014) Piecewise reproducing kernel method for singularly perturbed delay initial value problems. Appl Math Lett 37:67–71
  28. Blake C (1998) UCI repository of machine learning databases
  29. Rice JA (1995) Mathematical statistics and data analysis. J Am Stat Assoc 90(429):398
  30. Shi B, Liu J (2018) Nonlinear metric learning for kNN and SVMs through geometric transformations. Neurocomputing 318:18–29


Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 11471010 and 11271367).

Author information


Corresponding author

Correspondence to Liming Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Semi-quadratic optimization

The semi-quadratic optimization technique is an optimization method based on conjugate function theory that can handle non-convex optimization problems. The main idea is to convert the original non-convex objective function into a semi-quadratic augmented objective function by introducing an auxiliary variable. The augmented problem is then solved by an alternating iteration algorithm, which yields an approximate solution of the original problem.
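As a compact sketch of this alternating scheme (written with the functions J, F and G introduced in (A.1)–(A.3) below), one iteration updates the two blocks of variables in turn:

$$ \beta^{(t+1)} = \arg\min_{\beta}\; F(\alpha^{(t)},\beta)+G(\beta), \qquad \alpha^{(t+1)} = \arg\min_{\alpha}\; J(\alpha)+F(\alpha,\beta^{(t+1)}) $$

Each subproblem is simpler than the original one: the β-step has the closed-form solution (A.6), and the α-step is a convex problem because F is quadratic in α and J is convex.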

Definition 1

A function F(α, β) is called a semi-quadratic function if it satisfies the following conditions: F(α, β) is a convex function of β when α is fixed, and F(α, β) is a quadratic function of α when β is fixed.
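For instance, the function used later in this appendix,

$$ F_{M}(\alpha,\beta)=\frac{1}{2}\beta\alpha^{2}, $$

is semi-quadratic: for fixed β ≥ 0 it is quadratic in α, and for fixed α it is linear, and therefore convex, in β.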

Solution of the semi-quadratic problem

For the following type of minimum problem:

$$ \underset{\alpha}{\min} J(\alpha)+L(\alpha) $$
(A.1)

where \(\alpha = [\alpha_{1}, \alpha_{2}, \ldots, \alpha_{N}]^{T} \in R^{N}\), J(α) is a convex penalty function, and L(α) is a loss function that satisfies \(L(\alpha ) = {\sum }_{i=1}^{N}L(\alpha _{i})\).

For the loss function L(α), with α fixed, the semi-quadratic optimization technique introduces an auxiliary variable \(\beta = [\beta_{1}, \beta_{2}, \ldots, \beta_{N}]^{T} \in R^{N}\) such that the following expression holds:

$$ L(\alpha_{i}) = \underset{\beta_{i}}{\min}\left\{F(\alpha_{i},\beta_{i})+G(\beta_{i})\right\}, \quad i = 1,2,\ldots,N $$
(A.2)

where F(αi, βi) is a semi-quadratic function.

With expression (A.2), formulation (A.1) can be written as follows:

$$ \underset{\alpha,\beta}{\min} J(\alpha)+F(\alpha,\beta)+G(\beta) $$
(A.3)

where \(F(\alpha ,\beta )={\sum }_{i=1}^{N}F(\alpha _{i},\beta _{i})\), \(G(\beta )={\sum }_{i=1}^{N}G(\beta _{i})\). This is a semi-quadratic optimization problem, and we call the objective function in (A.3) the augmented objective function. The specific derivation process is as follows:

Under the conditions of the following theorem, the semi-quadratic function F(α, β) in the augmented objective function can be written as \(F(\alpha ,\beta )={\sum }_{i=1}^{N}F_{M}(\alpha _{i},\beta _{i})\), where \(F_{M}(\alpha _{i},\beta _{i})=\frac {1}{2}\beta_{i}\alpha_{i}^{2}\).

Theorem 1

If the loss function L(⋅) satisfies the following conditions:

(1) L(α) ≥ 0, and L(0) = 0;

(2) L(α) = L(−α), ∀α ∈ R;

(3) \(L^{\prime }(\alpha ) \ge 0,\forall \alpha \ge 0\);

(4) \(\forall \alpha \in R^{+},\exists L^{\prime \prime }(\alpha ),\) and \(L^{\prime \prime }(0^{+})\ge 0\);

(5) \(L(\sqrt {\alpha })\) is a concave function on R+;

then there exists a convex function G(⋅) that satisfies

$$ L(\alpha) = \inf_{\beta > 0} \left\{\frac{1}{2}\beta \alpha^{2}+G(\beta)\right\} $$
(A.4)

For any fixed α, the infimum on the right-hand side of (A.4) is attained at some β∗ ≥ 0, which satisfies

$$ \inf_{\beta>0}\left\{ \frac{1}{2}\beta \alpha^{2} + G(\beta) \right\} = \frac{1}{2}\beta^{*}\alpha^{2} + G(\beta^{*}) $$
(A.5)

where the specific expression of β∗ is as follows:

$$ \beta^{*}=\left\{ \begin{array}{lll} \frac{L^{\prime}(\alpha)}{\alpha} \qquad& \alpha > 0 \\ L^{\prime\prime}(0^{+}) \qquad & \alpha = 0 \\ \frac{L^{\prime}(-\alpha)}{-\alpha} \qquad & \alpha <0 \end{array} \right. $$
(A.6)

The proof of the above theorem can be found in [25].
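To make the alternating scheme concrete, the following sketch applies the half-quadratic (semi-quadratic) iteration to a simple robust ridge-regression problem. It is only an illustration under stated assumptions: the RSH-loss of this paper is not reproduced here; the classical Welsch (correntropy-induced) loss \(L(r)=\sigma^{2}(1-e^{-r^{2}/(2\sigma^{2})})\), which satisfies conditions (1)–(5) of Theorem 1 and whose weight from (A.6) is \(\beta^{*}=e^{-r^{2}/(2\sigma^{2})}\), is used as a stand-in, and the α-step is an ordinary weighted least-squares solve rather than the RSHPTSVM subproblem. All names in the code are hypothetical.

import numpy as np

# Half-quadratic minimization of  min_w  lam*||w||^2 + sum_i L(y_i - x_i . w),
# where L is the Welsch (correntropy-induced) loss
#   L(r) = sigma^2 * (1 - exp(-r^2 / (2*sigma^2))),
# used here only as an illustrative bounded loss satisfying Theorem 1.

def welsch_weight(r, sigma):
    # beta* from (A.6): L'(r)/r = exp(-r^2/(2*sigma^2)) for r != 0,
    # and L''(0+) = 1, so the same expression also covers r = 0.
    return np.exp(-r ** 2 / (2.0 * sigma ** 2))

def half_quadratic_fit(X, y, lam=1e-2, sigma=1.0, max_iter=100, tol=1e-8):
    n, d = X.shape
    # start from the ordinary ridge solution (all weights equal to one)
    w = np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(d), X.T @ y)
    for _ in range(max_iter):
        r = y - X @ w                       # residuals (they play the role of alpha_i)
        beta = welsch_weight(r, sigma)      # beta-step: closed form from (A.6)
        # alpha-step: minimize lam*||w||^2 + 0.5*sum_i beta_i*(y_i - x_i . w)^2,
        # i.e. solve (X^T diag(beta) X + 2*lam*I) w = X^T diag(beta) y
        A = X.T @ (beta[:, None] * X) + 2.0 * lam * np.eye(d)
        b = X.T @ (beta * y)
        w_new = np.linalg.solve(A, b)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Toy usage: a clean linear model plus a few gross outliers in y.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), np.ones(200)])
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)
y[:10] += 25.0                              # inject label/target noise
print(half_quadratic_fit(X, y, sigma=1.0))  # should be close to [2, -1]

The β-step re-weights each sample according to how well it currently fits, which is how the bounded loss suppresses outliers; the α-step then solves a convex quadratic problem, so each iteration does not increase the augmented objective.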


Cite this article

Ren, Q., Yang, L. A robust projection twin support vector machine with a generalized correntropy-based loss. Appl Intell 52, 2154–2170 (2022). https://doi.org/10.1007/s10489-021-02480-6
