Indefinite twin support vector machine with DC functions programming
Introduction
Support vector machine (SVM) [1], [2], [3], [4] is a machine learning method based on statistical learning theory and the principle of structural risk minimization (reducing the VC dimension of the learning machine while seeking the minimum sum of empirical risk and confidence risk). The learning strategy of SVM is "maximum margin": it solves for the optimal separating hyperplane with maximal margin, which promotes good generalization. In practice, SVM amounts to solving a constrained quadratic programming (QP) problem. By introducing kernel learning, samples in the low-dimensional input space can be implicitly mapped into a high-dimensional feature space while the explicit inner product computations are avoided [5]. SVM therefore largely overcomes the "curse of dimensionality" and over-fitting. Since SVM was proposed, it has attracted extensive attention for its superior performance [6], [7], [8] and has been widely used in anomaly detection [9], image retrieval [10], sequence-based prediction of protein binding [11], etc.
Jayadeva et al. proposed the twin support vector machine (TWSVM) as a useful extension of the traditional SVM. TWSVM generates two nonparallel hyperplanes by solving a pair of smaller-sized QP problems instead of a single larger QP problem [12]. Compared with SVM, TWSVM therefore learns faster owing to the smaller-sized models and, thanks to its two nonparallel hyperplanes, handles "Cross Planes" datasets more robustly. However, TWSVM only follows the empirical risk minimization principle and lacks the structural risk minimization principle, which is a significant advantage of SVM. Some scholars address this by modifying the loss function to restore structural risk minimization and improve generalization [13], [14]. However, to keep the modified TWSVM convex, so that the duality gap vanishes and Mercer's condition is satisfied, the kernel in TWSVM is restricted to positive semi-definite (PSD) kernels. In practice, verifying that a given kernel is PSD can be a challenging task. Moreover, indefinite kernels (i.e., kernels whose kernel matrix contains a mix of positive and negative eigenvalues) play an important role in machine learning and real-world applications [15]. Some functions such as the hyperbolic tangent kernel are indefinite [16], and most similarity measures used directly as kernels in real-world applications are indefinite [17]. Unfortunately, to the best of our knowledge, TWSVM has not been studied with indefinite kernels and cannot handle them elegantly.
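To make the two-plane idea concrete, the following is a minimal sketch of a linear TWSVM-style classifier. It is not the exact formulation of [12]: the two constrained QPs are replaced by unconstrained problems with a squared-hinge penalty and solved with a generic optimizer, and the names (`fit_plane`, `predict`) are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def fit_plane(own, other, c=1.0, side=+1):
    """Fit one TWSVM-style plane x.w + b = 0 that lies close to `own`
    and keeps `other` at least unit distance away (squared-hinge penalty).
    side=+1 penalizes other-class points with x.w + b > -1,
    side=-1 penalizes other-class points with x.w + b < +1."""
    d = own.shape[1]

    def objective(v):
        w, b = v[:d], v[d]
        prox = 0.5 * np.sum((own @ w + b) ** 2)          # stay close to own class
        viol = np.maximum(0.0, 1.0 + side * (other @ w + b))
        return prox + c * np.sum(viol ** 2)              # push other class away

    res = minimize(objective, np.ones(d + 1), method="L-BFGS-B")
    return res.x[:d], res.x[d]

rng = np.random.default_rng(0)
A = rng.normal([-2.0, 0.0], 0.3, size=(30, 2))   # class +1 samples
B = rng.normal([+2.0, 0.0], 0.3, size=(30, 2))   # class -1 samples

w1, b1 = fit_plane(A, B, side=+1)   # plane proximal to class +1
w2, b2 = fit_plane(B, A, side=-1)   # plane proximal to class -1

def predict(X):
    # assign each sample to the class whose plane is nearer
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)
```

Each `fit_plane` problem is convex (a quadratic plus a squared hinge of an affine function), so the generic solver recovers the global optimum of this simplified surrogate.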
In contrast, the indefinite kernel SVM (IKSVM) has been studied extensively, and many algorithms have been proposed for dealing with indefinite kernels in SVMs. One direction is "kernel transformation", which applies direct spectral transformations to the indefinite kernel matrix. Representative methods are "Clip" (set all negative eigenvalues to zero) [18], "Flip" (replace negative eigenvalues by their absolute values) [19] and "Shift" (add a positive constant to all eigenvalues so that all of them become non-negative) [20]. However, these transformations may lose useful information in the samples and adversely affect the modeled function [21], [22]. The other direction, "problem reformulation", solves the non-convex problem directly. In 2017, Xu et al. [23] directly addressed the non-convex primal form of IKSVM by decomposing the primal problem into the difference of two convex functions.
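The three spectral transformations can be sketched directly on a kernel matrix. `spectral_transform` below is a hypothetical helper written for illustration, not code from any cited work:

```python
import numpy as np

def spectral_transform(K, mode="clip"):
    """Apply a spectral transformation to a symmetric indefinite kernel matrix K.

    mode: "clip"  -> set negative eigenvalues to zero
          "flip"  -> replace eigenvalues by their absolute values
          "shift" -> add a constant so the smallest eigenvalue becomes zero
    """
    # eigh handles symmetric matrices and returns real eigenvalues (ascending)
    vals, vecs = np.linalg.eigh(K)
    if mode == "clip":
        vals = np.maximum(vals, 0.0)
    elif mode == "flip":
        vals = np.abs(vals)
    elif mode == "shift":
        vals = vals - min(vals.min(), 0.0)
    else:
        raise ValueError(mode)
    # reassemble V diag(vals) V^T
    return (vecs * vals) @ vecs.T

# A small indefinite matrix: its eigenvalues are 3 and -1
K = np.array([[1.0, 2.0],
              [2.0, 1.0]])
for mode in ("clip", "flip", "shift"):
    Kt = spectral_transform(K, mode)
    assert np.all(np.linalg.eigvalsh(Kt) >= -1e-10)  # transformed matrix is PSD
```

Note how each mode alters the spectrum differently ({0, 3}, {1, 3} and {0, 4} here), which is exactly why such transformations can distort the similarity information carried by the negative eigenvalues.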
In this paper, we build a bridge between TWSVM and indefinite kernels and propose a novel algorithm called the indefinite twin support vector machine with difference of convex functions programming (ITWSVM-DC). To account for the confidence interval ignored by TWSVM and to avoid complex matrix inversion, we add a regularization term to TWSVM. We further introduce the smooth quadratic hinge loss function to make the regularized TWSVM (ITWSVM) model continuously differentiable and more resilient to indefinite kernels, and then analyze the convexity of the proposed ITWSVM. To solve the non-convex problem arising from indefinite kernels, the DC algorithm [24] is used to decompose the objective function of ITWSVM into the difference of two convex functions, so ITWSVM can use both PSD and indefinite kernels. A line search along the descent direction under an Armijo-type rule is used in the DC algorithm to accelerate convergence. We also give a theoretical analysis showing that ITWSVM-DC converges to a local optimum, and extensive experiments with both PSD and indefinite kernels show that our algorithm is superior to state-of-the-art algorithms.
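The core DC mechanism, splitting an indefinite quadratic form into a difference of PSD quadratic forms and repeatedly minimizing a convexified surrogate, can be illustrated on a toy objective. This is a generic DCA sketch (no Armijo line search, and not the actual ITWSVM objective); only the splitting of K and the monotone decrease of the objective mirror the procedure described above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
d = 5
M = rng.normal(size=(d, d))
K = (M + M.T) / 2                  # symmetric, almost surely indefinite
b = rng.normal(size=d)
gamma = 1.0

# Split K = K_plus - K_minus with both parts PSD, via the eigendecomposition
vals, vecs = np.linalg.eigh(K)
K_plus = (vecs * np.maximum(vals, 0.0)) @ vecs.T
K_minus = (vecs * np.maximum(-vals, 0.0)) @ vecs.T

def f(a):
    # non-convex objective: indefinite quadratic + quartic + linear term
    return 0.5 * a @ K @ a + 0.25 * gamma * np.sum(a**2)**2 - b @ a

def g(a):
    # convex part: f(a) = g(a) - h(a) with h(a) = 0.5 a^T K_minus a
    return 0.5 * a @ K_plus @ a + 0.25 * gamma * np.sum(a**2)**2 - b @ a

a = np.zeros(d)
history = [f(a)]
for _ in range(30):
    grad_h = K_minus @ a           # gradient of h at the current iterate
    # DCA step: linearize h and minimize the convex surrogate g(.) - <grad_h, .>
    a = minimize(lambda x: g(x) - grad_h @ x, a, method="BFGS").x
    history.append(f(a))
```

Because each surrogate majorizes f up to a constant, the recorded objective values decrease monotonically, which is the behavior the convergence analysis in Section 3 formalizes for ITWSVM-DC.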
This paper is organized as follows. Section 2 outlines related work, including TWSVM and DC programming. Section 3 expounds the mechanisms of ITWSVM-DC in detail, including the model and convexity of ITWSVM with the Representer Theorem, the DC decomposition of ITWSVM, and the convergence of ITWSVM-DC. Section 4 presents the experimental results and analysis; the superiority and convergence of ITWSVM-DC are verified through experiments on real-world and artificial datasets. Conclusions are given in the last section.
Section snippets
TWSVM
For a binary classification problem, we are given a training set T = {(x_1, y_1), ..., (x_m, y_m)}, where x_i ∈ R^n and y_i ∈ {+1, -1}; m is the number of training samples and n is the dimension of the samples. There are m_1 samples belonging to class +1 and m_2 samples belonging to class -1 in the n-dimensional real space R^n. For the linearly separable binary classification problem, the goal of TWSVM is to find two non-parallel hyperplanes x·w_1 + b_1 = 0 and x·w_2 + b_2 = 0.
The model of TWSVM makes each hyperplane closer to the patterns of its own class while keeping it far from those of the other class.
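In the standard formulation (cf. Jayadeva et al. [12]), with A stacking the class +1 samples row-wise, B stacking the class -1 samples, e_1, e_2 vectors of ones and c_1, c_2 > 0 trade-off parameters, the two primal QP problems take the form:

```latex
\min_{w_1, b_1, \xi} \; \frac{1}{2}\,\|A w_1 + e_1 b_1\|^2 + c_1\, e_2^{\top}\xi
\quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + \xi \ge e_2,\; \xi \ge 0,

\min_{w_2, b_2, \eta} \; \frac{1}{2}\,\|B w_2 + e_2 b_2\|^2 + c_2\, e_1^{\top}\eta
\quad \text{s.t.} \quad (A w_2 + e_1 b_2) + \eta \ge e_1,\; \eta \ge 0.
```

The first objective keeps plane 1 close to class +1 while the constraint pushes class -1 at least unit distance away; the second problem is symmetric.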
The model of the regularized TWSVM
In this section, we introduce a regularization term into TWSVM to ensure that the model follows the structural risk minimization principle. We modify the QP problems (4) and (5) with an additional "margin" between the proximal hyperplanes to keep the hyperplane of one class as far as possible from the other class. To make the regularized TWSVM (ITWSVM) continuously differentiable and more resilient to indefinite kernels, we introduce the smooth quadratic hinge loss function into our model.
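As a concrete reference point, the plain squared ("quadratic") hinge loss and its derivative can be written as follows. The exact smoothed loss used in ITWSVM may differ in detail, but this variant already illustrates the continuous differentiability that the plain hinge max(0, 1 - u) lacks at u = 1:

```python
import numpy as np

def quadratic_hinge(u):
    """Squared (quadratic) hinge loss: continuously differentiable,
    unlike the plain hinge max(0, 1 - u)."""
    return np.maximum(0.0, 1.0 - u) ** 2

def quadratic_hinge_grad(u):
    # derivative is 0 for u >= 1 and -2(1 - u) for u < 1, matching at u = 1
    return -2.0 * np.maximum(0.0, 1.0 - u)

# The analytic gradient matches a central finite difference,
# including in the neighborhood of the former kink at u = 1
for u in (-0.5, 0.999, 1.0, 1.5):
    eps = 1e-6
    fd = (quadratic_hinge(u + eps) - quadratic_hinge(u - eps)) / (2 * eps)
    assert abs(fd - quadratic_hinge_grad(u)) < 1e-4
```

This smoothness is what allows gradient-based solvers (and the DC algorithm's convex subproblems) to be applied directly to the ITWSVM objective.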
Experiments results and analysis
In this section, all algorithms are implemented in Python 3.6.5 on a PC with an Intel i5-8300H quad-core processor, 8 GB RAM and Microsoft Windows 10.
Conclusions
In this paper, we propose a new algorithm named the indefinite twin support vector machine with difference of convex functions programming (ITWSVM-DC), which employs indefinite kernels in TWSVM for the first time. We focus directly on the primal problem of TWSVM instead of its dual form, avoiding both the duality gap and the loss caused by dualization. By modifying the objective function, a new regularized TWSVM (ITWSVM) comes into being, which improves the generalization of TWSVM. By decomposing the non-convex objective into the difference of two convex functions, the DC algorithm enables ITWSVM to handle both PSD and indefinite kernels.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 62076062) and the National Key R&D Program of China (Grant No. 2017YFB1002801). Furthermore, the work was also supported by the Collaborative Innovation Center of Wireless Communications Technology.
Yuexuan An received her B.Sc. degree in computer science and technology from Jiangsu Normal University in 2015 and her M.Sc. degree in computer application technology from China University of Mining and Technology in 2019. She is currently pursuing the Ph.D. degree in the School of Computer Science and Engineering, Southeast University. Her research interests include machine learning, pattern recognition, SVM, kernel functions and various applications.
References (38)
- Faster SVM training via conjugate SMO. Pattern Recognit. (2021)
- Improved support vector machine algorithm for heterogeneous data. Pattern Recognit. (2015)
- A coordinate descent margin based-twin support vector machine for classification. Neural Netw. (2012)
- Support vector networks. Mach. Learn. (1995)
- The Nature of Statistical Learning Theory (1996)
- Multi-objective adaptive differential evolution for SVM/SVR hyperparameters selection. Pattern Recognit. (2021)
- An Introduction to Support Vector Machines and Other Kernel-based Learning Methods (2000)
- Modeling the parameter interactions in ranking SVM with low-rank approximation. IEEE Trans. Knowl. Data Eng. (2019)
- Adversarial support vector machine learning. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012)
- Distributed online one-class support vector machine for anomaly detection over networks. IEEE Trans. Cybern. (2019)
- Scene analysis and search using local features and support vector machine for effective content-based image retrieval. Artif. Intell. Rev.
- Sequence-based prediction of protein-peptide binding sites using support vector machine. J. Comput. Chem.
- Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell.
- Improvements on twin support vector machines. IEEE Trans. Neural Netw.
- Nonparallel support vector machines for pattern classification. IEEE Trans. Cybern.
- Learning SVM in Kreïn spaces. IEEE Trans. Pattern Anal. Mach. Intell.
- Regularization with dot-product kernels. Proceedings of Advances in Neural Information Processing Systems
- Learning kernels from indefinite similarities. Proceedings of the 26th International Conference on Machine Learning
- A generalized kernel approach to dissimilarity-based classification. J. Mach. Learn. Res.
Hui Xue, received her B.Sc in mathematics from Nanjing Normal University in 2002, and M.Sc. in mathematics from Nanjing University of Aeronautics & Astronautics (NUAA) in 2005. In 2008, she received her Ph.D. degree in computer science from NUAA. She is a professor at the PALM Group, School of Computer Science and Engineering, Southeast University.