One-class support vector classifiers: A survey
Introduction
The task of classification is an important and widely studied topic in image processing and pattern recognition. Conventionally, a classifier (binary or multi-class) needs at least two well-defined classes to distinguish and may misclassify test samples when a dataset suffers from data irregularities such as class distribution skew, class imbalance, absent features or small disjuncts. In particular, when a class is ill-defined (under-sampled or absent), the classification model does not work as anticipated. Minter [1] first identified this issue and named it “single-class classification”. Later, Koch et al. [2] called this phenomenon “one-class classification” (OCC). Subsequently, many researchers named the phenomenon according to the application domain in which one-class classification was performed: “novelty detection” by Bishop [3], “outlier detection” by Ritter and Gallegos [4] and “concept learning” by Japkowicz [5]. In an OCC task, there are enough target-class samples but very few outliers, i.e., negative-class samples (the class of no interest) are either partially available or absent. This property of the dataset makes decision-boundary detection a complex and challenging task. Issues familiar from conventional classification, such as estimating the classification error, the complexity of a solution, the generalization of classification methods and the curse of dimensionality, also appear in one-class classification. Several machine learning models have been proposed for one-class classification, such as the one-class nearest neighbour, one-class deep neural network and autoencoder, one-class random forest, one-class support vector classifiers, one-class support higher-order tensor machine and one-class ensemble models [6], [7], [8], [9], [10], [11].
Based on an extensive literature analysis, one-class support vector classifiers (OCSVCs) are found suitable for anomaly and novelty detection in numerous applications such as document classification [12], disease diagnosis [13], [14], fraud detection [15], [16], intrusion detection [17], [18] and novelty detection [19]. These varied applications make OCSVCs interesting and important in the field of data mining and pattern recognition. Although many research articles concerning OCSVCs have been published during the last two decades, a comprehensive survey covering all the important issues is still not available to guide the research community in further developments. This review paper summarizes all such issues concerning OCSVCs. Feasible algorithms, feature selection, training sample reduction, parameter estimation, workability over distributed/streaming data and related application areas are identified as the key issues. Fig. 1 gives a clustered representation of the relevant research works, where the notations ‘a’ to ‘f’ represent the following:
(a) OCSVC algorithms,
(b) Parameter estimation techniques,
(c) Feature selection methods,
(d) Training sample reduction methods,
(e) Distributed and online OCSVCs,
(f) Applications of OCSVCs.
The rest of the paper is organized as follows: Section 2 presents OCSVC algorithms, and Section 3 discusses kernel parameter estimation techniques. Section 4 gives a detailed review of feature selection methods, whereas Section 5 reviews sample reduction techniques. Distributed OCSVCs are covered in Section 6, and Section 7 discusses the applications of OCSVCs. The last section contains concluding remarks and future scope.
OCSVC algorithms
The support vector machine (SVM) was introduced by Vapnik [20], mainly for tackling binary classification problems. Later, researchers proposed several extensions of SVM, such as least-squares SVM, linear programming SVM, sparse SVM, twin SVM, Universum SVM and twin spheres SVM [21], [22], [23], [24], [25]. The one-class classification problem was solved by Tax et al. [26], [27], [28], [29], [30] by isolating the target-class samples from outliers in the sample space. In these
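As a concrete illustration (not an algorithm from the survey itself), the ν-parameterised one-class SVM described above can be sketched with scikit-learn's `OneClassSVM`; the data and parameter values here are purely illustrative.

```python
# Minimal sketch: train a one-class SVM on target-class samples only and
# use it to flag outliers. Dataset and hyperparameters are illustrative.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # target-class samples

# nu upper-bounds the fraction of training errors and lower-bounds
# the fraction of support vectors.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)

X_test = np.array([[0.1, -0.2],   # close to the target distribution
                   [6.0, 6.0]])   # far-away outlier
pred = clf.predict(X_test)        # +1 = target class, -1 = outlier
```

Note that only target-class samples are used for fitting; the rejection of `[6.0, 6.0]` comes entirely from the learned description of the target class.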
Kernel parameter estimation techniques for OCSVCs
Kernel techniques provide more flexibility to one-class support vector classifiers. Tax et al. [27] and Schölkopf et al. [33] showed that the Gaussian kernel outperforms other kernels because it has a single tuning parameter, the Gaussian width, which governs the number of support vectors defining the separation boundary. Increasing the width parameter increases the volume of the enclosed region and decreases the number of support vectors. The rejection rate of an OCSVC is measured by the
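The width/support-vector trade-off described above can be observed empirically; in the following sketch (assuming scikit-learn's parameterisation, where `gamma` is inversely related to the squared Gaussian width), a wide kernel yields few support vectors while a narrow kernel wraps the boundary tightly around the data and needs many more.

```python
# Illustrative check: a smaller gamma (wider Gaussian kernel) gives a
# smoother boundary with fewer support vectors than a large gamma.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))  # target-class samples

n_sv_wide = len(OneClassSVM(kernel="rbf", gamma=0.01, nu=0.1).fit(X).support_)
n_sv_narrow = len(OneClassSVM(kernel="rbf", gamma=100.0, nu=0.1).fit(X).support_)

print(n_sv_wide, n_sv_narrow)  # the narrow kernel uses far more support vectors
```

This is why width estimation matters in practice: an over-narrow kernel memorises the training set, while an over-wide one accepts too large a region around it.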
Feature selection and reduction techniques for OCSVCs
As with conventional binary and multi-class classifiers, data preprocessing is very important for OCSVCs. It includes data cleaning, class balancing [55], [63], [64] (not necessary for OCSVCs if the target class is well defined), feature selection/transformation (dimensionality reduction) and sample reduction (also known as data reduction). The feature selection/reduction mechanism has a deep impact on the performance of a classifier. In a dataset, not all features
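One common feature-reduction preprocessing step of the kind surveyed here is PCA before one-class SVM training (cf. the referenced work on PCA for one-class SVM); the following is a hedged sketch with an illustrative synthetic dataset, not a reproduction of any specific surveyed method.

```python
# Sketch: reduce a 10-dimensional target class to its 3 dominant
# directions with PCA, then fit a one-class SVM in the reduced space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
# 10-D data whose variance lives mostly in 3 latent directions, plus noise
base = rng.normal(size=(250, 3))
X = base @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(250, 10))

model = make_pipeline(PCA(n_components=3), OneClassSVM(gamma="scale", nu=0.1))
model.fit(X)
labels = model.predict(X)  # +1 for accepted target samples, -1 for rejected
```

Dropping low-variance directions shrinks the kernel computation and can remove noisy features, though for one-class problems the discarded directions must be checked: outliers sometimes differ from the target class precisely along them.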
Sample reduction techniques for OCSVCs
The existing training sample reduction techniques concerning OCSVCs are discussed in this section. In the present era of big and streaming data, enormous data is produced from geographically distributed data sources. In the presence of massive training samples, classifiers may suffer from excessive computational cost and resource consumption during the training phase. Training sample reduction is a way to enhance training efficiency and reduce computational cost. Many training sample
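The intuition behind boundary-based sample reduction can be sketched with a deliberately naive heuristic (not one of the surveyed algorithms): since the decision boundary of an OCSVC is determined by support vectors near the edge of the target class, samples far from the class centroid are kept as boundary candidates and interior samples are discarded.

```python
# Naive illustrative heuristic: retain the 20% of training samples
# farthest from the centroid, then train the OCSVC on this subset.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2))  # full target-class training set

dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
keep = np.argsort(dist)[-200:]          # indices of boundary candidates
X_reduced = X[keep]

clf_red = OneClassSVM(gamma=0.5, nu=0.05).fit(X_reduced)
print(len(X_reduced), "of", len(X), "samples used for training")
```

A centroid distance is only sensible for roughly convex, unimodal target classes; the surveyed boundary-detection methods use more careful neighbourhood-based criteria to decide which samples can be discarded safely.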
Distributed and online OCSVCs
Although several OCSVC algorithms have been proposed for anomaly or novelty detection over batched data, they are not well explored for distributed or online environments. Real-world applications such as earth science, weather forecasting, satellite and aviation control and social networking continuously generate samples/features over time. Information extraction from such complex streaming data is always a challenging task because of geographically distributed and heterogeneous data
Applications of OCSVCs
In this section, applications of OCSVCs are discussed. It is observed that OCSVCs are used chiefly for anomaly and novelty detection across application domains. An anomaly is referred to as an outlier in data mining and pattern recognition tasks. For one-class classification or anomaly detection, apart from OCSVCs, other machine learning models [6], [7], [8], [9], [10], [11] have also been proposed, such as the one-class nearest neighbour, one-class random forest, one-class deep neural
Concluding observation
From the literature reviewed above, it is observed that the limited availability of outliers, or ill-defined non-target-class data, is always a challenging and interesting part of OCC problems. In this review paper, all important issues, along with the solutions proposed by many researchers, have been discussed. The review has been grouped into six important areas: OCSVC algorithms, kernel parameter estimation techniques, feature selection and reduction, training sample
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (147)
- et al., Cueing, feature discovery, and one-class learning for synthetic aperture radar automatic target recognition, Neural Netw. (1995)
- et al., Outliers in statistical pattern recognition and an application to automatic chromosome classification, Pattern Recognit. Lett. (1997)
- et al., A review of novelty detection, Signal Process. (2014)
- et al., Novelty detection using one-class Parzen density estimator. An application to surveillance of nosocomial infections, Stud. Health Technol. Inf. (2008)
- et al., A method of anomaly detection and fault diagnosis with online adaptive learning under small training samples, Pattern Recognit. (2017)
- et al., An anomaly detection system based on variable n-gram features and one-class SVM, Inf. Softw. Technol. (2017)
- et al., Combining ensemble methods and social network metrics for improving accuracy of OCSVM on intrusion detection in SCADA systems, J. Inf. Secur. Appl. (2016)
- et al., Active learning based support vector data description method for robust novelty detection, Knowl.-Based Syst. (2018)
- et al., Twin support vector machine: a review from 2007 to 2014, Egypt. Inf. J. (2015)
- et al., An improved non-parallel Universum support vector machine and its safe sample screening rule, Knowl.-Based Syst. (2019)
- A reduced universum twin support vector machine for class imbalance learning, Pattern Recognit.
- Support vector domain description, Pattern Recognit. Lett.
- Least squares one-class support vector machine, Pattern Recognit. Lett.
- Brain activation detection by neighborhood one-class SVM, Cogn. Syst. Res.
- Robust solutions to fuzzy one-class support vector machine, Pattern Recognit. Lett.
- A weighted one-class support vector machine, Neurocomputing
- Ellipsoidal data description, Neurocomputing
- Ramp loss one-class support vector machine; a robust and effective approach to anomaly detection problems, Neurocomputing
- Robust one-class support vector machine with rescaled hinge loss function, Pattern Recognit.
- Robust AdaBoost based ensemble of one-class support vector machines, Inf. Fusion
- Robust support vector machines based on the rescaled hinge loss function, Pattern Recognit.
- Dynamic financial distress prediction with concept drift based on time weighting combined with AdaBoost support vector machine ensemble, Knowl.-Based Syst.
- Class-imbalanced dynamic financial distress prediction based on AdaBoost-SVM ensemble combined with SMOTE and time weighting, Inf. Fusion
- Two methods of selecting Gaussian kernel parameters for one-class SVM and their application to fault detection, Knowl.-Based Syst.
- Imbalanced enterprise credit evaluation with DTE-SBD: decision tree ensemble based on SMOTE and bagging with differentiated sampling rates, Inform. Sci.
- Multi-imbalance: an open-source software for multi-class imbalance learning, Knowl.-Based Syst.
- On feature selection with principal component analysis for one-class SVM, Pattern Recognit. Lett.
- A recognition and novelty detection approach based on curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis, Expert Syst. Appl.
- An efficient instance selection algorithm to reconstruct training set for support vector machine, Knowl.-Based Syst.
- Selecting training points for one-class support vector machines, Pattern Recognit. Lett.
- Boundary detection and sample reduction for one-class support vector machines, Neurocomputing
- New incremental learning algorithm with support vector machines, IEEE Trans. Syst. Man Cybern.: Syst.
- Single-class classification
- Novelty detection and neural network validation, IEE Proc.-Vis. Image Signal Process.
- Concept-Learning in the Absence of Counter-Examples: An Autoassociation-Based Approach to Classification
- One-class classifiers: a review and analysis of suitability in the context of mobile-masquerader detection, S. Afr. Comput. J.
- One-class classification: taxonomy of study and review of techniques, Knowl. Eng. Rev.
- Deep learning for anomaly detection: a survey
- One-class support higher order tensor machine classifier, Appl. Intell.
- RAMD: registry-based anomaly malware detection using one-class ensemble classifiers, Appl. Intell.
- One-class SVMs for document classification, J. Mach. Learn. Res.
- Hepatitis B diagnosis using logical inference and self-organizing map
- One-class support vector machines approach to anomaly detection, Appl. Artif. Intell.
- Statistical Learning Theory
- Support Vector Machines for Pattern Classification, Vol. 2
- All-in-one multicategory ramp loss maximum margin of twin spheres support vector machine, Appl. Intell.
- One-Class Classification: Concept-Learning in the Absence of Counter-Examples
- Data domain description using support vectors
- Combining one-class classifiers
- Support vector data description, Mach. Learn.