Pattern Recognition

Volume 121, January 2022, 108195

Indefinite twin support vector machine with DC functions programming

https://doi.org/10.1016/j.patcog.2021.108195

Highlights

  • We propose a novel regularized TWSVM, called ITWSVM, for indefinite kernels.

  • We introduce a smooth quadratic hinge loss function and a maximum margin regularization term to ITWSVM.

  • We theoretically analyze the properties of ITWSVM and extend it to indefinite kernels and multi-class classification.

  • We introduce difference of convex functions (DC) to solve the non-convex problem in indefinite kernel settings and further propose ITWSVM-DC.

  • Extensive experiments demonstrate that ITWSVM-DC is robust and performs well across a variety of settings.

Abstract

Twin support vector machine (TWSVM) is an efficient algorithm for binary classification. However, the lack of the structural risk minimization principle restrains the generalization of TWSVM, and the requirement of convex optimization constrains TWSVM to use only positive semi-definite (PSD) kernels. In this paper, we propose a novel TWSVM for indefinite kernels, called the indefinite twin support vector machine with difference of convex functions programming (ITWSVM-DC). The indefinite TWSVM (ITWSVM) leverages a maximum margin regularization term to improve the generalization of TWSVM and a smooth quadratic hinge loss function to make the model continuously differentiable. The representer theorem is applied to ITWSVM and its convexity is analyzed. To address the non-convex optimization problem that arises when the kernel is indefinite, difference of convex functions (DC) programming is used to decompose the non-convex objective function into the difference of two convex functions, and a line search method is applied in the DC algorithm to accelerate the convergence rate. A theoretical analysis shows that ITWSVM-DC converges to a local optimum, and extensive experiments on indefinite and positive semi-definite kernels show the superiority of ITWSVM-DC.

Introduction

Support vector machine (SVM) [1], [2], [3], [4] is a machine learning method based on statistical learning theory and the structural risk minimization principle (reducing the VC dimension of the learning machine and minimizing the sum of empirical risk and confidence risk). The learning strategy of SVM is "maximum margin": it solves for the optimal separating hyperplane with the maximal margin, which promotes good generalization. In effect, SVM solves a constrained quadratic programming (QP) problem. By introducing kernel learning, samples in a low-dimensional input space can be implicitly mapped into a high-dimensional feature space while avoiding explicit inner product computations in that space [5]. SVM therefore overcomes the problems of the "curse of dimensionality" and "over-fitting" to a great extent. Since SVM was proposed, it has attracted extensive attention for its superior performance [6], [7], [8] and has been widely used in anomaly detection [9], image retrieval [10], sequence-based protein prediction [11], etc.
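
To make the kernel idea concrete, the following minimal sketch (ours, not from the paper; it assumes scikit-learn is available and uses an RBF kernel chosen purely for illustration) trains an SVM on a precomputed Gram matrix, so the classifier only ever sees inner products:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like, not linearly separable

def rbf(A, B, gamma=1.0):
    # Explicit Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

clf = SVC(kernel="precomputed").fit(rbf(X, X), y)
print("training accuracy:", clf.score(rbf(X, X), y))
```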

Jayadeva et al. proposed the twin support vector machine (TWSVM) as a useful extension of the traditional SVM. TWSVM generates two nonparallel hyperplanes by solving a pair of smaller QP problems instead of a single large QP problem [12]. Compared with SVM, TWSVM therefore learns faster owing to the smaller subproblems and is more resilient on "Cross Planes" datasets because of its two nonparallel hyperplanes. However, TWSVM follows only the empirical risk minimization principle and lacks the structural risk minimization principle, which is a significant advantage of SVM. Some scholars address this problem by modifying the loss function to enforce structural risk minimization and improve generalization performance [13], [14]. However, to preserve the convexity of the modified TWSVM, reduce the duality gap, and satisfy Mercer's condition, the kernel in TWSVM is limited to positive semi-definite (PSD) kernels. In practice, verifying that a given kernel is PSD can be a challenging task. Moreover, indefinite kernels (i.e., kernels whose matrices contain a mix of positive and negative eigenvalues) play an important role in machine learning and real-world applications [15]. Some functions such as the hyperbolic tangent kernel are indefinite [16], and most kernels used directly as similarity measures in real-world applications are indefinite [17]; a quick numerical check is sketched below. Unfortunately, to the best of our knowledge, indefinite kernels have not yet been studied for TWSVM, which cannot elegantly handle them.
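
As an illustration (ours, not the paper's), indefiniteness of the tanh (sigmoid) kernel can be verified numerically by inspecting the eigenvalues of its Gram matrix, which include negative values for many parameter choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = np.tanh(X @ X.T + 1.0)                   # hyperbolic tangent kernel
eigvals = np.linalg.eigvalsh((K + K.T) / 2)  # symmetrize, then eigendecompose
print(eigvals.min(), eigvals.max())          # a negative minimum => indefinite
```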

In contrast, the indefinite kernel SVM (IKSVM) has been studied extensively, and many algorithms have been proposed for handling indefinite kernels in SVMs. One direction is "kernel transformation," which applies direct spectral transformations to the indefinite kernel. Representative methods are "Clip" (set all negative eigenvalues to zero) [18], "Flip" (replace negative eigenvalues with their absolute values) [19], and "Shift" (add a positive constant to all eigenvalues so that all of them become non-negative) [20]. However, these transformations may lose useful information in the samples and can adversely affect the modeled function [21], [22]. The other direction is "problem reformulation," which solves the non-convex problem directly. In 2017, Xu et al. [23] directly tackled the non-convex primal form of IKSVM by decomposing the primal problem into the difference of two convex functions.
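
The three spectral transformations are easy to state on the eigendecomposition of the Gram matrix; the following is our minimal sketch of the cited methods (assuming a symmetric kernel matrix, handled with np.linalg.eigh):

```python
import numpy as np

def spectral_transform(K, mode="clip"):
    # Eigendecompose the symmetrized (possibly indefinite) Gram matrix.
    w, V = np.linalg.eigh((K + K.T) / 2)
    if mode == "clip":     # Clip: zero out negative eigenvalues [18]
        w = np.maximum(w, 0.0)
    elif mode == "flip":   # Flip: replace eigenvalues by absolute values [19]
        w = np.abs(w)
    elif mode == "shift":  # Shift: raise spectrum so min eigenvalue is 0 [20]
        w = w - min(w.min(), 0.0)
    return (V * w) @ V.T   # rebuild a PSD kernel matrix V diag(w) V^T
```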

In this paper, we construct a bridge between TWSVM and indefinite kernels and propose a novel algorithm called the indefinite twin support vector machine with difference of convex functions programming (ITWSVM-DC). To account for the confidence interval ignored by TWSVM and to avoid complex matrix inversion, we add a regularization term to TWSVM. We further introduce the smooth quadratic hinge loss function to make the regularized TWSVM (ITWSVM) model continuously differentiable and more resilient to indefinite kernels. We then analyze the convexity of the proposed ITWSVM. To solve the non-convex problem arising with indefinite kernels, the DC algorithm [24] is used to decompose the objective function of ITWSVM into the difference of two convex functions. ITWSVM can therefore use both PSD and indefinite kernels. A line search along the descent direction under an Armijo-type rule is used in the DC algorithm to accelerate the convergence rate. We also provide a theoretical analysis showing that ITWSVM-DC converges to a local optimum, and extensive experiments on both PSD and indefinite kernels show that our algorithm is superior to state-of-the-art algorithms.
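
The overall optimization strategy can be sketched generically as follows. This is a schematic of DC programming with an Armijo-type backtracking line search, not the authors' exact solver; f, grad_h, and solve_g are placeholder callables for the objective, a (sub)gradient of the concave part, and the convex subproblem solver:

```python
import numpy as np

def dca_armijo(f, grad_h, solve_g, x0, max_iter=200, tol=1e-6,
               beta=0.5, sigma=1e-4):
    """Minimize f(x) = g(x) - h(x), with g and h convex.
    solve_g(y) must return argmin_x { g(x) - y @ x } (convex subproblem);
    grad_h(x) returns a (sub)gradient of h at x. Schematic sketch only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = grad_h(x)            # linearize the concave part -h at x
        x_hat = solve_g(y)       # solve the convex subproblem
        d = x_hat - x            # search direction
        fx, t = f(x), 1.0
        # Armijo-type backtracking: require sufficient decrease along d
        while f(x + t * d) > fx - sigma * t * (d @ d) and t > 1e-12:
            t *= beta
        x_new = x + t * d
        if np.linalg.norm(x_new - x) <= tol * (1 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x
```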

This paper is organized as follows. Section 2 outlines related work, including TWSVM and DC programming. Section 3 expounds the mechanisms of ITWSVM-DC in detail, including the model and convexity of ITWSVM via the representer theorem, the decomposition of ITWSVM with DC programming, and the convergence of ITWSVM-DC. Section 4 presents the experimental results and analysis; the superiority and convergence of ITWSVM-DC are verified through experiments on real-world and artificial datasets. Conclusions are given in the last section.

Section snippets

TWSVM

For a binary classification problem, we are given a training set $(x_i, y_i),\ i = 1, 2, \ldots, n$, where $x_i \in X$ and $y_i \in \{-1, +1\}$. Here $n$ is the number of training samples and $m$ is their dimension. There are $n_1$ samples belonging to class $+1$ and $n_2$ samples belonging to class $-1$ in the $m$-dimensional real space $X$. For the linearly separable binary classification problem, the goal of TWSVM is to find two non-parallel hyperplanes
$$x^\top w_1 + b_1 = 0 \quad \text{and} \quad x^\top w_2 + b_2 = 0.$$

The model of TWSVM makes each hyperplane closer to the patterns of one class and as far as possible from those of the other; the two primal problems are sketched below.
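
For reference, the standard linear TWSVM primal problems from [12] can be written as follows, where the rows of $A \in \mathbb{R}^{n_1 \times m}$ and $B \in \mathbb{R}^{n_2 \times m}$ are the samples of class $+1$ and class $-1$, $e_1, e_2$ are vectors of ones, and $c_1, c_2 > 0$ are trade-off parameters:
$$
\begin{aligned}
\min_{w_1, b_1, \xi} \quad & \tfrac{1}{2}\,\|A w_1 + e_1 b_1\|^2 + c_1\, e_2^\top \xi \\
\text{s.t.} \quad & -(B w_1 + e_2 b_1) + \xi \ge e_2, \quad \xi \ge 0,
\end{aligned}
$$
$$
\begin{aligned}
\min_{w_2, b_2, \eta} \quad & \tfrac{1}{2}\,\|B w_2 + e_2 b_2\|^2 + c_2\, e_1^\top \eta \\
\text{s.t.} \quad & (A w_2 + e_1 b_2) + \eta \ge e_1, \quad \eta \ge 0.
\end{aligned}
$$
Each problem keeps one hyperplane close to its own class (the quadratic term) while pushing the samples of the other class at least unit distance away (the constraints).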

The model of the regularized TWSVM

In this section, we introduce a regularization term into TWSVM to ensure that the model follows the structural risk minimization principle. We modify the QP problems (4) and (5) with an additional "margin" between the proximal hyperplanes ($x^\top w_i + b_i = 0,\ i = 1, 2$) to keep the hyperplane of one class as far as possible from the other class. To make the regularized TWSVM (ITWSVM) continuously differentiable and more resilient to indefinite kernels, we introduce the smooth quadratic hinge loss function into our model, as sketched below.
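
A minimal sketch of the smooth quadratic (squared) hinge loss, assuming the standard form $\ell(u) = \max(0, 1-u)^2$; the paper's exact parameterization may differ:

```python
import numpy as np

def quadratic_hinge(u):
    # Squared hinge: zero for u >= 1, quadratic penalty otherwise.
    return np.maximum(0.0, 1.0 - u) ** 2

def quadratic_hinge_grad(u):
    # The derivative is continuous at u = 1 (it equals 0 there), which
    # makes the loss continuously differentiable, unlike the plain hinge.
    return -2.0 * np.maximum(0.0, 1.0 - u)
```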

Experiments results and analysis

In this section, all algorithms are implemented in Python 3.6.5 on a PC with an Intel i5-8300H quad-core processor, 8 GB of RAM, and Microsoft Windows 10.

Conclusions

In this paper, we propose a new algorithm named the indefinite twin support vector machine with difference of convex functions programming (ITWSVM-DC), which, to the best of our knowledge, is the first to employ indefinite kernels in TWSVM. We directly focus on the primal problem of TWSVM instead of its dual form to avoid the duality gap and the loss caused by dualization. By modifying the objective function, a new regularized TWSVM (ITWSVM) comes into being, which can improve the generalization of TWSVM. By…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 62076062) and the National Key R&D Program of China (Grant No. 2017YFB1002801). The work was also supported by the Collaborative Innovation Center of Wireless Communications Technology.

Yuexuan An received her B.Sc. in computer science and technology from Jiangsu Normal University in 2015 and her M.Sc. in computer application technology from China University of Mining and Technology in 2019. She is currently pursuing a Ph.D. in the School of Computer Science and Engineering, Southeast University. Her research interests include machine learning, pattern recognition, SVM, kernel functions, and various applications.

References (38)

  • A. Torres-Barrán et al.

    Faster SVM training via conjugate SMO

    Pattern Recognit.

    (2021)
  • S.L. Peng et al.

    Improved support vector machine algorithm for heterogeneous data

    Pattern Recognit.

    (2015)
  • Y.H. Shao et al.

    A coordinate descent margin based-twin support vector machine for classification

    Neural Netw.

    (2012)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • V.N. Vapnik

    The Nature of Statistical Learning Theory

    (1996)
  • C. da Silva Santos et al.

    Multi-objective adaptive differential evolution for SVM/SVR hyperparameters selection

    Pattern Recognit.

    (2021)
  • N. Cristianini et al.

    An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

    (2000)
  • J. Xu et al.

    Modeling the parameter interactions in ranking SVM with low-rank approximation

    IEEE Trans. Knowl. Data Eng.

    (2019)
  • Y. Zhou et al.

    Adversarial support vector machine learning

    Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    (2012)
  • X. Miao et al.

    Distributed online one-class support vector machine for anomaly detection over networks

    IEEE Trans. Cybern.

    (2019)
  • U. Sharif et al.

    Scene analysis and search using local features and support vector machine for effective content-based image retrieval

    Artif. Intell. Rev.

    (2019)
  • G. Taherzadeh et al.

    Sequence-based prediction of protein-peptide binding sites using support vector machine

    J. Comput. Chem.

    (2016)
  • Jayadeva et al.

    Twin support vector machines for pattern classification

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • Y. Shao et al.

    Improvements on twin support vector machines

    IEEE Trans. Neural Netw.

    (2011)
  • Y.J. Tian et al.

    Nonparallel support vector machines for pattern classification

    IEEE Trans. Cybern.

    (2014)
  • G. Loosli et al.

    Learning SVM in Kreĭn spaces

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • A.J. Smola et al.

    Regularization with dot-product kernels

    Advances in Neural Information Processing Systems

    (2000)
  • Y.H. Chen et al.

    Learning kernels from indefinite similarities

    Proceedings of the 26th International Conference on Machine Learning

    (2009)
  • E. Pekalska et al.

    A generalized kernel approach to dissimilarity-based classification

    J. Mach. Learn. Res.

    (2002)

Hui Xue received her B.Sc. in mathematics from Nanjing Normal University in 2002 and her M.Sc. in mathematics from Nanjing University of Aeronautics & Astronautics (NUAA) in 2005. In 2008, she received her Ph.D. in computer science from NUAA. She is a professor at the PALM Group, School of Computer Science and Engineering, Southeast University.
