Abstract
The projection twin support vector machine (PTSVM) is a promising tool for classification problems. However, PTSVM employs the hinge loss, which is unbounded and not sufficiently robust to outliers. In this work, a robust PTSVM (termed RSHPTSVM) based on the rescaled square hinge loss (RSH-loss) is proposed to handle classification problems. A close relationship between the RSH-loss and correntropy is established theoretically: the RSH-loss can be viewed as a correntropy-induced loss with a reproducing piecewise kernel. As a correntropy-type loss, it has markedly different properties from the hinge loss, such as boundedness, robustness, and nonconvexity; moreover, the RSH-loss captures higher-order statistical information from the samples. Since the nonconvexity of RSHPTSVM makes it difficult to optimize, an efficient iterative optimization algorithm based on semi-quadratic optimization theory is proposed to solve RSHPTSVM, which converges quickly to the optimal solution. Furthermore, we extend RSHPTSVM from binary classification to multi-classification and propose a robust projection multi-birth support vector machine model (termed RSHPMBSVM). The proposed methods are evaluated on various datasets, including three artificial datasets, UCI datasets, and a practical application dataset. The experimental results, both without noise and under label noise, confirm the feasibility and effectiveness of the proposed methods.
References
Vapnik VN (2000) The nature of statistical learning theory. Statistics for Engineering and Information Science. Springer, New York, pp 119–166
Deng N, Tian Y, Zhang C (2012) Support vector machines: optimization based theory, algorithms, and extensions, 41–63
Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(1):121–167
Yin H, Jiao X, Chai Y, Fang B (2015) Scene classification based on single-layer SAE and SVM. Expert Syst Appl 42:3368–3380
Bashbaghi S, Granger E, Sabourin R, Bilodeau G (2017) Dynamic ensembles of exemplar-SVMs for still-to-video face recognition. Pattern Recognit 69:61–81
Ma S, Cheng B, Shang Z, Liu G (2018) Scattering transform and LSPTSVM based fault diagnosis of rotating machinery. Mech Syst Signal Process 104:155–170
Suykens J, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
Mangasarian O, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28:69–74
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29:905–910
Ye Q, Zhao C, Ye N, Chen Y (2010) Multi-weight vector projection support vector machines. Pattern Recognit Lett 31:2006–2011
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Hum Genet 7:179–188
Ye Q, Ye N, Yin T (2014) Enhanced multi-weight vector projection support vector machine. Pattern Recognit Lett 42:91–100
Chen X, Yang J, Ye Q, Liang J (2011) Recursive projection twin support vector machine via within-class variance minimization. Pattern Recognit 44:2643–2655
Shao Y, Wang Z, Chen W, Deng N (2013) A regularization for the projection twin support vector machine. Knowl Based Syst 37:203–210
Li C, Huang Y, Wu H, Shao Y, Yang Z (2016) Multiple recursive projection twin support vector machine for multi-class classification. Int J Mach Learn Cybern 7:729–740
Wen Y, Ma J, Yuan C, Yang L (2020) Projection multi-birth support vector machines for multi-classification. Appl Intell 50(13):1–17
Ma J, Yang L, Sun Q (2020) Capped L1-norm distance metric-based fast robust twin bounded support vector machine. Neurocomputing 412:295–311
Li C, Shao Y, Deng N (2015) Robust L1-norm two-dimensional linear discriminant analysis. Neural Netw 65:92–104
Gu Z, Zhang Z, Sun J, Li B (2017) Robust image recognition by L1-norm twin-projection support vector machine. Neurocomputing 223:1–11
Chen W, Li C, Shao Y, Zhang J, Deng N (2018) Robust L1-norm multi-weight vector projection support vector machine with efficient algorithm. Neurocomputing 315:345–361
Liu W, Pokharel PP, Príncipe J (2007) Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans Signal Process 55:5286–5298
Yang L, Ding G, Yuan C, Zhang M (2020) Robust regression framework with asymmetrically analogous to correntropy-induced loss. Knowl Based Syst 191:105211
Singh A, Pokharel R, Príncipe J (2014) The C-loss function for pattern classification. Pattern Recognit 47:441–453
Xu G, Hu B, Príncipe J (2018) Robust C-loss kernel classifiers. IEEE Trans Neural Netw Learn Syst 29:510–522
Ren Z, Yang L (2018) Correntropy-based robust extreme learning machine for classification. Neurocomputing 313:74–84
Boyd SP, Vandenberghe L (2006) Convex optimization. IEEE Trans Autom Control 51:1859–1859
Geng F, Qian S (2014) Piecewise reproducing kernel method for singularly perturbed delay initial value problems. Appl Math Lett 37:67–71
Blake C (1998) UCI Repository of machine learning databases
Rice JA (1995) Mathematical statistics and data analysis. J Am Stat Assoc 90(429):398
Shi B, Liu J (2018) Nonlinear metric learning for kNN and SVMs through geometric transformations. Neurocomputing 318:18–29
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 11471010 and No. 11271367).
Appendix A: Semi-quadratic optimization
The semi-quadratic optimization technique is an optimization method based on conjugate function theory that can handle non-convex optimization problems. The main idea is to convert the original non-convex objective function into a semi-quadratic augmented objective function by introducing an auxiliary variable. The augmented problem is then solved by an alternating iteration algorithm, whose approximate solution yields the final solution of the original problem.
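As a minimal sketch of this alternation (our own illustration, not the paper's algorithm), the following code fits a line under the Welsch (correntropy-induced) loss L(r) = 1 − exp(−r²/(2σ²)). With the auxiliary weights fixed, the subproblem in the primal variable is a weighted least squares (the quadratic step); with the primal variable fixed, the weights have the closed form βᵢ = L′(rᵢ)/rᵢ. All variable names and data here are hypothetical.

```python
import numpy as np

# Hypothetical 1-D robust line fit via half-quadratic alternation.
rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50)
y = 2.0 * X + 0.05 * rng.standard_normal(50)  # true slope is 2
y[::10] += 5.0                                # inject gross outliers

sigma = 0.5
w = 0.0                                       # slope estimate
for _ in range(30):
    r = y - w * X                             # residuals under current estimate
    # auxiliary weights beta_i = L'(r_i)/r_i (up to a constant 1/sigma^2,
    # which cancels in the weighted least squares ratio below)
    beta = np.exp(-r**2 / (2 * sigma**2))
    # beta fixed -> the subproblem in w is quadratic (weighted least squares)
    w = np.sum(beta * X * y) / np.sum(beta * X**2)
```

Because the outliers receive weights that decay exponentially in their residuals, the recovered slope stays close to the true value 2 despite the corrupted points.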
Definition 1
A function F(α, β) is called a semi-quadratic function if it satisfies the following conditions: F(α, β) is a convex function with respect to β when α is fixed, and a quadratic function with respect to α when β is fixed.
Solution of the semi-quadratic function
Consider the following type of minimization problem:
$$ \min_{\alpha \in R^{N}} J(\alpha) + L(\alpha) $$(A.1)
where α = [α1, α2, … , αN]T ∈ RN, J(α) represents a convex penalty function, and L(α) represents the loss function and satisfies \(L(\alpha ) = {\sum }_{i=1}^{N}L(\alpha _{i})\).
For the loss function L(α), with α fixed, we use the semi-quadratic optimization technique to introduce a new auxiliary variable β = [β1, β2, … , βN]T ∈ RN so that the following expression holds:
$$ L(\alpha_{i}) = \inf_{\beta_{i}} \left\{ F(\alpha_{i},\beta_{i}) + G(\beta_{i}) \right\} $$(A.2)
where F(αi, βi) is a semi-quadratic function and G(βi) is the dual potential function determined by L.
With expression (A.2), formulation (A.1) can be written as follows:
$$ \min_{\alpha,\beta} J(\alpha) + F(\alpha,\beta) + G(\beta) $$(A.3)
where \(F(\alpha ,\beta )={\sum }_{i=1}^{N}F(\alpha _{i},\beta _{i})\) and \(G(\beta )={\sum }_{i=1}^{N}G(\beta _{i})\). This is a semi-quadratic optimization problem, and the objective function in (A.3) is called the augmented objective function. The specific derivation process is as follows:
When the following theorem holds, the semi-quadratic function F(α, β) in the augmented objective function can be written as \(F(\alpha ,\beta )={\sum }_{i=1}^{N}F_{M}(\alpha _{i},\beta _{i})\), where \(F_{M}(\alpha _{i},\beta _{i})=\frac {1}{2}\beta _{i}\alpha _{i}^{2}\).
Theorem 1
If the loss function L(⋅) satisfies the following conditions:

(1) L(α) ≥ 0, and L(0) = 0;

(2) L(α) = L(−α), ∀α ∈ R;

(3) \(L^{\prime }(\alpha ) \ge 0,\forall \alpha \ge 0\);

(4) \(\forall \alpha \in R^{+},\exists L^{\prime \prime }(\alpha )\), and \(L^{\prime \prime }(0^{+})\ge 0\);

(5) \(L(\sqrt {\alpha })\) is a concave function on R+;

then there must exist a convex function G(β) that satisfies the following conditions:
$$ L(\alpha) = \inf_{\beta > 0} \left\{\frac{1}{2}\beta \alpha^{2}+G(\beta)\right\} $$(A.4)
For a given α, there is a minimizer β∗ ≥ 0 of the right-hand side of the above equation, which satisfies the following condition:
$$ \inf_{\beta>0}\left\{ \frac{1}{2}\beta \alpha^{2} + G(\beta) \right\} = \frac{1}{2}\beta^{*}\alpha^{2} + G(\beta^{*}) $$(A.5)
where the specific expression of β∗ is as follows:
$$ \beta^{*}=\left\{ \begin{array}{lll} \frac{L^{\prime}(\alpha)}{\alpha} \qquad& \alpha > 0 \\ L^{\prime\prime}(0^{+}) \qquad & \alpha = 0 \\ \frac{L^{\prime}(-\alpha)}{-\alpha} \qquad & \alpha <0 \end{array} \right. $$(A.6)
The proof of the above theorem can be found in [25].
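As a concrete numerical check of the theorem (our own construction, not from the paper), take the Welsch loss L(a) = 1 − exp(−a²/(2σ²)), which satisfies conditions (1)–(5). Working out its dual potential gives G(β) = 1 − σ²β + σ²β ln(σ²β) with s = σ², so that (A.4) holds with the minimizer given by (A.6):

```python
import numpy as np

# Instantiate Theorem 1 with the Welsch loss; G below is our derived
# dual potential for this particular loss, not a formula from the paper.
sigma = 1.0
s = sigma**2

def L(a):
    return 1.0 - np.exp(-a**2 / (2 * s))

def G(b):
    return 1.0 - s * b + s * b * np.log(s * b)

def beta_star(a):
    # (A.6): L'(a)/a for a != 0, and L''(0+) = 1/s at a = 0
    return np.exp(-a**2 / (2 * s)) / s if a != 0 else 1.0 / s

bs = np.linspace(1e-4, 1.0 / s, 20000)         # search grid for the infimum
for a in [0.0, 0.7, 1.5, -2.3]:
    vals = 0.5 * bs * a**2 + G(bs)
    assert abs(vals.min() - L(a)) < 1e-4       # (A.4): the infimum equals L(a)
    b_at_min = bs[vals.argmin()]
    assert abs(b_at_min - beta_star(a)) < 1e-3 # minimizer matches (A.6)
```

The grid search confirms both that the infimum over β reproduces L(α) and that it is attained at β∗ = L′(α)/α, as the theorem states.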
Cite this article
Ren, Q., Yang, L. A robust projection twin support vector machine with a generalized correntropy-based loss. Appl Intell 52, 2154–2170 (2022). https://doi.org/10.1007/s10489-021-02480-6