One-class support vector classifiers: A survey

https://doi.org/10.1016/j.knosys.2020.105754

Abstract

Over the past two decades, one-class classification (OCC) has become very popular owing to its diversified applicability in data mining and pattern recognition problems. Within OCC, one-class support vector classifiers (OCSVCs) have been extensively studied and improved for technology-driven applications; still, no comprehensive literature is available to guide researchers in future exploration. This survey presents an up-to-date, structured and well-organized review of one-class support vector classifiers. It covers available algorithms, parameter estimation techniques, feature selection strategies, sample reduction methodologies, workability in distributed environments and application domains related to OCSVCs. In this way, the paper offers a detailed overview to researchers looking for the state of the art in this area.

Introduction

The task of classification is an important and admired topic of research in the areas of image processing and pattern recognition. Conventionally, a classifier (binary or multi-class) needs at least two well-defined classes to distinguish and may misclassify test samples when a dataset suffers from data irregularity problems such as class distribution skew, class imbalance, absent features, small disjuncts, etc. Specifically, when a class is ill-defined (under-sampled or absent), the classification model does not work as anticipated. Minter [1] first identified this issue and named it “single-class classification”. Later, Koch et al. [2] called this phenomenon “one-class classification” (OCC). Afterwards, many researchers termed the phenomenon differently based on the application domain to which one-class classification was applied, such as “novelty detection” by Bishop [3], “outlier detection” by Ritter and Gallegos [4] and “concept learning” by Japkowicz [5]. In an OCC task, there are enough target-class samples but very few outliers, i.e., negative-class samples (the class of no interest) are either partially available or absent. This property of the dataset makes decision boundary detection a complex and challenging task. The issues found in conventional classification problems, such as estimating the classification error, the complexity of a solution, the generalization of classification methods and the curse of dimensionality, also appear in one-class classification. For one-class classification, several machine learning models have been proposed, such as one-class nearest neighbour, one-class deep neural network and autoencoder, one-class random forest, one-class support vector classifiers, one-class support higher-order tensor machine, one-class ensemble models, etc. [6], [7], [8], [9], [10], [11].

Based on an extensive literature analysis, one-class support vector classifiers (OCSVCs) are found suitable for anomaly and novelty detection in numerous applications such as document classification [12], disease diagnosis [13], [14], fraud detection [15], [16], intrusion detection [17], [18] and novelty detection [19]. These varied applications make OCSVCs interesting and important in the field of data mining and pattern recognition. Although several research articles concerning OCSVCs have been published during the last two decades, no comprehensive literature is available that covers all important issues to help the research community with further developments. This review paper summarizes and covers all such issues. Feasible algorithms, feature selection, training sample reduction, parameter estimation, workability over distributed/streaming data and related application areas are identified as the key issues. Fig. 1 presents a clustered representation of the relevant research works, where the notations ‘a’ to ‘f’ represent the following:

(a) OCSVC algorithms,

(b) Parameter estimation techniques,

(c) Feature selection methods,

(d) Training sample reduction methods,

(e) Distributed and online OCSVCs,

(f) Applications of OCSVCs.

The rest of the paper is organized as follows: Section 2 covers OCSVC algorithms, and Section 3 discusses kernel parameter estimation techniques. Section 4 provides a detailed review of feature selection methods, whereas Section 5 reviews sample reduction techniques. Distributed OCSVCs are covered in Section 6, and Section 7 discusses the applications of OCSVCs. The last section contains concluding remarks and future scope.

Section snippets

OCSVC algorithms

The support vector machine (SVM) was introduced by Vapnik [20] and is mainly used for tackling binary classification problems. Later, several extensions of SVM were proposed, such as least-squares SVM, linear programming SVM, sparse SVM, twin SVM, Universum SVM, twin spheres SVM, etc. [21], [22], [23], [24], [25]. The one-class classification problem was solved by Tax et al. [26], [27], [28], [29], [30] by isolating the target-class samples from outliers in the sample space. In these
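As a concrete illustration of the idea of isolating target-class samples, the sketch below trains scikit-learn's `OneClassSVM` (one implementation of the ν-formulation) on target samples only and then scores held-out outliers. The synthetic data and the `nu`/`gamma` values are arbitrary choices for the example, not settings recommended by the survey.

```python
# Minimal sketch: train a one-class SVM on target-class samples only.
# Synthetic data; nu and gamma are illustrative values, not prescriptions.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
target = 0.3 * rng.randn(200, 2)                      # well-sampled target class
outliers = rng.uniform(low=-4, high=4, size=(20, 2))  # sparse outliers, unseen in training

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5)
clf.fit(target)                                       # trained on the target class alone

# predict() returns +1 inside the learned boundary, -1 outside
pred_target = clf.predict(target)
pred_outliers = clf.predict(outliers)
print("target accepted:", np.mean(pred_target == 1))
print("outliers rejected:", np.mean(pred_outliers == -1))
```

Here `nu` upper-bounds the fraction of training samples treated as boundary errors, which is why most target points fall inside the boundary while points far from the target distribution are rejected.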

Kernel parameter estimation techniques for OCSVCs

Kernel techniques provide more flexibility to one-class support vector classifiers. Tax et al. [27] and Schölkopf et al. [33] showed that the Gaussian kernel outperforms other kernels due to its single tuning parameter, i.e., the Gaussian width, which controls the number of support vectors on the separation boundary. Increasing the width parameter increases the volume of the enclosed region and decreases the number of support vectors. The denial rate of OCSVC is measured by the
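The inverse relationship between Gaussian width and the number of support vectors can be checked empirically. The sketch below (synthetic data; scikit-learn's `OneClassSVM` assumed, where the library's `gamma` relates to the width σ as gamma = 1/(2σ²)) counts support vectors as the width grows:

```python
# Sketch: effect of the Gaussian width sigma on the support-vector count.
# In scikit-learn's RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2),
# so gamma = 1 / (2 * sigma^2): a larger width means a smaller gamma.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(300, 2)          # synthetic target-class data

counts = {}
for sigma in (0.1, 1.0, 10.0):
    gamma = 1.0 / (2.0 * sigma ** 2)
    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=gamma).fit(X)
    counts[sigma] = len(clf.support_)   # indices of support vectors
    print(f"sigma={sigma:5.1f}  support vectors={counts[sigma]}")
```

A very small width makes each sample nearly orthogonal to the others in feature space, so almost every sample becomes a support vector; a large width yields a smooth boundary whose support-vector count approaches the lower bound implied by `nu`.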

Feature selection and reduction techniques for OCSVCs

Just like for conventional binary and multi-class classifiers, data preprocessing is very important for OCSVCs. More specifically, it includes data cleaning, class balancing [55], [63], [64] (not necessary for OCSVCs if the target class is well defined), feature selection/transformation (dimensionality reduction) and sample reduction (also known as data reduction). The feature selection/reduction mechanism has a deep impact on the performance of a classifier. In a dataset, not all features
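One common form of feature reduction ahead of an OCSVC is a linear projection such as PCA, in the spirit of the PCA-based feature selection work cited in this section. The pipeline below is an assumed, minimal sketch (synthetic data, arbitrary component count), not the specific method of any surveyed paper:

```python
# Sketch: dimensionality reduction (PCA) chained before a one-class SVM.
# The 50-dimensional data actually lives on a 5-dimensional subspace.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(300, 5) @ rng.randn(5, 50)   # few informative directions

model = make_pipeline(PCA(n_components=5),
                      OneClassSVM(nu=0.1, gamma="scale"))
model.fit(X)                                # PCA and OCSVM fit in one call

accepted = np.mean(model.predict(X) == 1)
print("accepted fraction of training data:", accepted)
```

Training in the reduced space keeps the kernel computation over 5 features instead of 50 while preserving the directions in which the target class actually varies.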

Sample reduction techniques for OCSVCs

The existing training sample reduction techniques for OCSVCs are discussed in this section. In the present era of big and streaming data, enormous amounts of data are produced by geographically distributed data sources. In the presence of massive numbers of training samples, classifiers may suffer from high computational cost and excessive resource consumption during the training phase. Training sample reduction is a solution that enhances training efficiency and reduces computation cost. Many training sample
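The cost/accuracy trade-off behind sample reduction can be shown with the most naive baseline, random subsampling; this is a toy sketch on synthetic data, not one of the boundary-detection or instance-selection methods the survey reviews:

```python
# Naive sample-reduction baseline: train on a 10% random subset and
# compare its decisions with a model trained on the full set.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)
X = 0.5 * rng.randn(2000, 2)    # synthetic target-class data

subset = X[rng.choice(len(X), size=200, replace=False)]

full = OneClassSVM(nu=0.1, gamma="scale").fit(X)
small = OneClassSVM(nu=0.1, gamma="scale").fit(subset)

# fraction of points on which the two models make the same decision
agree = np.mean(full.predict(X) == small.predict(X))
print(f"trained on {len(subset)}/{len(X)} samples, agreement={agree:.2f}")
```

The surveyed techniques improve on this baseline by selecting samples near the decision boundary (where the support vectors lie) rather than uniformly at random, retaining more of the boundary geometry per kept sample.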

Distributed and online OCSVCs

Although several OCSVC algorithms have been proposed for anomaly or novelty detection on batched data, they are not well explored for distributed or online environments. Real-world applications like earth science, weather forecasting, satellite and aviation control, social networking, etc., continuously generate samples/features over time. Information extraction from such complex streaming data is always a challenging task because of the geographically distributed and heterogeneous data

Applications of OCSVCs

In this section, the applications of OCSVCs are discussed. It is observed that OCSVCs are predominantly used for anomaly and novelty detection in every application domain. An anomaly is referred to as an outlier in data mining and pattern recognition tasks. For one-class classification or anomaly detection, apart from OCSVCs, other machine learning models [6], [7], [8], [9], [10], [11] have also been proposed, such as one-class nearest neighbour, one-class random forest, one-class deep neural

Concluding observation

From the different sections of the above literature, it has been observed that the limited availability of outliers or ill-defined non-target-class data is always a challenging and interesting part of OCC problems. In this review paper, all important issues, along with the solutions proposed by many researchers, have been discussed. The review has been grouped into six important areas: OCSVC algorithms, kernel parameter estimation techniques, feature selection and reduction, training sample

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (147)

  • Richhariya, B., et al. A reduced Universum twin support vector machine for class imbalance learning. Pattern Recognit. (2020)
  • Tax, D.M., et al. Support vector domain description. Pattern Recognit. Lett. (1999)
  • Choi, Y.-S. Least squares one-class support vector machine. Pattern Recognit. Lett. (2009)
  • Yang, J., et al. Brain activation detection by neighborhood one-class SVM. Cogn. Syst. Res. (2010)
  • Liu, Y., et al. Robust solutions to fuzzy one-class support vector machine. Pattern Recognit. Lett. (2016)
  • Zhu, F., et al. A weighted one-class support vector machine. Neurocomputing (2016)
  • Wang, K., et al. Ellipsoidal data description. Neurocomputing (2017)
  • Tian, Y., et al. Ramp loss one-class support vector machine: a robust and effective approach to anomaly detection problems. Neurocomputing (2018)
  • Xing, H.-J., et al. Robust one-class support vector machine with rescaled hinge loss function. Pattern Recognit. (2018)
  • Xing, H.-J., et al. Robust AdaBoost based ensemble of one-class support vector machines. Inf. Fusion (2020)
  • Xu, G., et al. Robust support vector machines based on the rescaled hinge loss function. Pattern Recognit. (2017)
  • Sun, J., et al. Dynamic financial distress prediction with concept drift based on time weighting combined with AdaBoost support vector machine ensemble. Knowl.-Based Syst. (2017)
  • Sun, J., et al. Class-imbalanced dynamic financial distress prediction based on AdaBoost-SVM ensemble combined with SMOTE and time weighting. Inf. Fusion (2020)
  • Xiao, Y., et al. Two methods of selecting Gaussian kernel parameters for one-class SVM and their application to fault detection. Knowl.-Based Syst. (2014)
  • Sun, J., et al. Imbalanced enterprise credit evaluation with DTE-SBD: decision tree ensemble based on SMOTE and bagging with differentiated sampling rates. Inform. Sci. (2018)
  • Zhang, C., et al. Multi-imbalance: an open-source software for multi-class imbalance learning. Knowl.-Based Syst. (2019)
  • Lian, H. On feature selection with principal component analysis for one-class SVM. Pattern Recognit. Lett. (2012)
  • Feng, K., et al. A recognition and novelty detection approach based on curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis. Expert Syst. Appl. (2011)
  • Liu, C., et al. An efficient instance selection algorithm to reconstruct training set for support vector machine. Knowl.-Based Syst. (2017)
  • Li, Y. Selecting training points for one-class support vector machines. Pattern Recognit. Lett. (2011)
  • Zhu, F., et al. Boundary detection and sample reduction for one-class support vector machines. Neurocomputing (2014)
  • Xu, J., et al. New incremental learning algorithm with support vector machines. IEEE Trans. Syst. Man Cybern.: Syst. (2018)
  • Minter, T. Single-class classification. (1975)
  • Bishop, C.M. Novelty detection and neural network validation. IEE Proc.-Vis. Image Signal Process. (1994)
  • Japkowicz, N. Concept-Learning in the Absence of Counter-Examples: An Autoassociation-Based Approach to Classification. (1999)
  • Mazhelis, O. One-class classifiers: a review and analysis of suitability in the context of mobile-masquerader detection. S. Afr. Comput. J. (2006)
  • Khan, S.S., et al. One-class classification: taxonomy of study and review of techniques. Knowl. Eng. Rev. (2014)
  • Chalapathy, R., et al. Deep learning for anomaly detection: a survey. (2019)
  • Chen, Y., et al. One-class support higher order tensor machine classifier. Appl. Intell. (2017)
  • Tajoddin, A., et al. RAMD: registry-based anomaly malware detection using one-class ensemble classifiers. Appl. Intell. (2019)
  • Manevitz, L.M., et al. One-class SVMs for document classification. J. Mach. Learn. Res. (2001)
  • Uttreshwar, G.S., et al. Hepatitis B Diagnosis Using Logical Inference and Self-Organizing Map 1. (2008)
  • Hejazi, M., et al. One-class support vector machines approach to anomaly detection. Appl. Artif. Intell. (2013)
  • Vapnik, V., et al. Statistical Learning Theory. (1998)
  • Abe, S. Support Vector Machines for Pattern Classification, Vol. 2. (2005)
  • Lu, S., et al. All-in-one multicategory ramp loss maximum margin of twin spheres support vector machine. Appl. Intell. (2019)
  • Tax, D.M.J. One-Class Classification: Concept-Learning in the Absence of Counter-Examples. (2001)
  • Tax, D.M., et al. Data domain description using support vectors.
  • Tax, D.M., et al. Combining one-class classifiers.
  • Tax, D.M., et al. Support vector data description. Mach. Learn. (2004)