Adaptive ensemble of classifiers with regularization for imbalanced data classification
Introduction
Imbalanced data classification refers to the classification of datasets with significantly different instance numbers across classes [1]. Specifically, for the binary imbalanced data classification problem, there is usually a dominating number of instances from one class (the majority class) and a few instances belonging to the other class (the minority class). The problem of binary imbalanced data classification is common in engineering and scientific practices [2], [3], [4]. The problem is non-trivial, because most of the general-purpose classification methods will overwhelmingly favor the majority class in the label-imbalanced scenario, leading to significant performance degradation. Consequently, the development of binary imbalanced classification algorithms has become an independent and active research area.
Among the popular algorithms for binary imbalanced classification, the dynamic ensemble of classifiers has attracted significant attention. It works by training multiple classifiers on different subsets of the data and dynamically selecting from, or combining, them at inference time. By picking the most competent classifier(s) for each specific test instance, this approach can mitigate the “majority favoritism” in imbalanced data classification [5], [6]. Various advanced algorithms build on the dynamic ensemble strategy, and the novelty of most of them lies in new techniques for combining the models. For instance, [7] proposes a generalized mixture function to combine different classifiers, and [8] proposes an adaptive ensemble method oriented by the difficulty of the classification problem. We review several similar methods in this paper, and more details are presented in Section 2.
Despite the success of the dynamic ensemble of classifiers on various tasks, we are unaware of any existing model that addresses the overfitting exhibited by such classifiers. Overfitting is a common problem wherein the behavior of a classifier fits the training data too closely. This adversely affects performance on the test data, because not all the information in the training data is useful (e.g., noise). At first glance, the dynamic ensemble of classifiers appears to safely circumvent the curse of overfitting, because it utilizes the test data during the selection of classifiers. However, because each classifier is usually trained on a small subset of the data (which contains information from the local geometry only), dynamically picking the most competent of them can lead to overfitting of the local geometry by these classifiers. Even if we interpolate the dynamic ensemble with a set of (fixed) trained weights for the classifiers, the overfitting problem persists, as the weights are obtained purely from the training data. Hence, there appears to be no simple solution to the overfitting problem of the dynamic ensemble of classifiers.
We solve the aforementioned problems using the regularization effect arising from Gaussian mixture model (GMM)-based resampling and the stochastic gradient descent (SGD) algorithm. The proposed method is called the adaptive ensemble of classifiers with regularization (AER), where “with regularization” refers to the two regularization schemes developed in this study. The AER method first performs data resampling based on the GMM [9], [10], generating two types of subsets. The first type has a broader inclusion of points from the majority class, and the second has an almost balanced number of instances from the two classes. The former type of subset forces the classifiers to consider the global geometry; this is regarded as the first regularization to alleviate the overfitting problem. The latter type of subset provides information on the local geometries, so that sufficiently powerful classifiers can be fitted. After the resampling process, one individual classifier is learned for each sampled subset, and we explicitly learn a set of fixed coefficients/weights by optimizing the cross-entropy loss of the combined model with SGD. The adoption of SGD is the second regularization, and its effectiveness has been verified by numerous studies [11], [12], [13], [14]. During inference, the normalized coefficient of each individual classifier is determined by a combination of the on-the-fly likelihood and the trained classifier coefficients.
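To make the two-stage idea concrete (GMM-based subset generation, then SGD-trained combination weights), the sketch below runs it on synthetic data. It is a simplified illustration and not the authors' exact AER: the subset construction, the base classifier (logistic regression instead of XGBoost), and the weight parameterization are all assumptions made for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Synthetic imbalanced data: 500 majority (label 0) vs 50 minority (label 1).
X_maj = rng.normal(0.0, 1.0, size=(500, 2))
X_min = rng.normal(2.5, 0.5, size=(50, 2))
X = np.vstack([X_maj, X_min])
y = np.concatenate([np.zeros(500), np.ones(50)])

# A GMM over the majority class partitions it into local regions.
K = 3
gmm = GaussianMixture(n_components=K, random_state=0).fit(X_maj)
comp = gmm.predict(X_maj)

# Near-balanced "local" subsets: one majority component plus all minority
# points; each subset trains its own base classifier.
base = []
for k in range(K):
    Xk = np.vstack([X_maj[comp == k], X_min])
    yk = np.concatenate([np.zeros(int((comp == k).sum())), np.ones(len(X_min))])
    base.append(LogisticRegression().fit(Xk, yk))
# A "global" classifier trained on the full data pulls the ensemble toward
# the overall geometry (the first regularization, loosely).
base.append(LogisticRegression().fit(X, y))

# Stack base-classifier probabilities (plus a bias column) and learn fixed
# combination weights by minimizing cross-entropy with mini-batch SGD.
Z = np.stack([c.predict_proba(X)[:, 1] for c in base], axis=1)
Z = np.hstack([Z, np.ones((len(y), 1))])          # bias term
w = np.zeros(Z.shape[1])
lr, epochs, batch = 0.5, 50, 32
for _ in range(epochs):
    order = rng.permutation(len(y))
    for s in range(0, len(y), batch):
        b = order[s:s + batch]
        p = sigmoid(Z[b] @ w)
        w -= lr * Z[b].T @ (p - y[b]) / len(b)    # cross-entropy gradient

p_train = sigmoid(Z @ w)
acc_min = ((p_train > 0.5) == y)[y == 1].mean()
print("minority-class training accuracy:", acc_min)
```

The AER additionally modulates these fixed weights at test time with each instance's GMM likelihood; that on-the-fly interpolation is omitted from this sketch.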
We evaluate the performance of the proposed AER both theoretically and empirically. From a theoretical perspective, we analyze the time and space complexity of the AER model and prove that the seemingly complicated AER model actually requires less time and memory to train. From an empirical perspective, we test the AER model with the XGBoost classifier [15] (we refer to the combined method as AER-XGBoost) on seven imbalanced UCI machine learning datasets and a GMM-generated dataset with five variations. Across multiple metrics, the experimental results reveal that AER-XGBoost exhibits competitive performance, outperforming standard methods such as the SVM and the decision tree, as well as state-of-the-art methods such as the focal-loss neural network [16], vanilla XGBoost [15], focal-loss XGBoost [17], and the LightGBM model [18]. McNemar's and Wilcoxon signed-rank tests are performed to further validate the superior performance of the AER, and the results are mostly sufficient to reject the null hypothesis of no performance difference. We note that the AER performs particularly well under severe label imbalance and complex decision boundaries.
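Both significance tests named above can be run with standard tooling. The sketch below uses made-up per-dataset scores and contingency counts purely to show the mechanics (the real numbers appear in Section 6); McNemar's statistic is computed by hand from its contingency-table definition with a continuity correction.

```python
import numpy as np
from scipy.stats import wilcoxon, chi2

# Hypothetical per-dataset F1 scores for two methods across 12 datasets.
f1_aer   = np.array([0.81, 0.77, 0.92, 0.68, 0.73, 0.88,
                     0.64, 0.79, 0.85, 0.71, 0.90, 0.66])
f1_other = np.array([0.78, 0.74, 0.90, 0.61, 0.70, 0.86,
                     0.60, 0.77, 0.82, 0.69, 0.87, 0.62])

# Wilcoxon signed-rank test on the paired per-dataset scores.
stat, p_wilcoxon = wilcoxon(f1_aer, f1_other)
print(f"Wilcoxon p-value: {p_wilcoxon:.4f}")

# McNemar's test on a single dataset: b = instances only method A classifies
# correctly, c = instances only method B classifies correctly (hypothetical).
b, c = 41, 17
mcnemar_stat = (abs(b - c) - 1) ** 2 / (b + c)   # continuity-corrected
p_mcnemar = chi2.sf(mcnemar_stat, df=1)          # chi-square, 1 dof
print(f"McNemar p-value: {p_mcnemar:.4f}")
```

With every paired difference favoring the first method, both tests reject the null hypothesis at the 0.05 level on these illustrative numbers.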
The rest of the paper is structured as follows: Section 2 reviews the related work and positions our idea. Section 3 introduces the algorithm and its properties in detail. Section 4 analyzes the advantageous time and memory complexity of the proposed algorithm. The experimental framework and the results analysis are presented in Section 5 and Section 6, respectively, and related discussions appear in Section 7. Lastly, Section 8 concludes the paper.
Related work
Imbalanced data classification refers to the classification problem where the number of samples for each class label is not balanced, i.e., where the class distribution is biased or skewed [1]. Since most standard classifiers assume relatively balanced class distributions and equal misclassification costs, class imbalance can be perceived as a form of data irregularity [19], and it can significantly deteriorate the performance of classifiers. Performing high-accuracy classification
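A minimal numerical illustration of why plain accuracy is misleading in this setting (the 95/5 split below is hypothetical): a degenerate classifier that always predicts the majority class scores high accuracy while recovering no minority instances at all.

```python
import numpy as np

# Hypothetical labels: 95% majority (0), 5% minority (1).
y = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y)                       # always predict majority

accuracy = (y_pred == y).mean()                 # looks excellent
recall_minority = (y_pred[y == 1] == 1).mean()  # the class we care about
print(accuracy, recall_minority)                # -> 0.95 0.0
```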
Methods
In this section, we introduce the details of the proposed AER model. The structure is laid out as follows: Section 3.1 introduces the GMM fitting and the generation of the two types of subsets; Section 3.2 discusses the specific implementation with XGBoost, which serves as the individual ‘base’ classifier in the experiments; the SGD training for the ensemble of classifiers is illustrated in Section 3.3; and finally, the weight interpolation/combination and probabilistic prediction will
Theoretical analysis of the AER
In this section, we will demonstrate that the proposed AER method has advantageous time and memory complexity. Specifically, we will show theoretically that, under certain assumptions and for any classifier implemented with the AER framework, the time complexity will be, asymptotically, at least as good as the original implementation, and the asymptotic memory complexity will always be better than the full-batch implementations.
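The memory claim follows the standard mini-batch argument. In generic form (this is the textbook bound, not the paper's exact notation or theorem), full-batch training over $n$ examples of dimension $d$ must hold all of them, whereas mini-batch SGD holds only a batch of size $b \ll n$ plus the parameter and gradient vectors:

```latex
M_{\text{full}} = O(nd), \qquad
M_{\text{SGD}} = O(bd + d), \qquad
\frac{M_{\text{SGD}}}{M_{\text{full}}} = O\!\left(\frac{b}{n}\right), \quad b \ll n .
```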
To begin with, let us recap the notations used in the AER model.
Experimental analysis: the framework
In this section, we introduce the framework of our empirical analysis for the AER model. We introduce the datasets in Section 5.1 with their backgrounds and characteristics. The methods compared against the AER model are discussed in Section 5.2, and the metrics to evaluate the results are presented in Section 5.3. Finally, we discuss our approaches for statistical testing to validate the significance of the results in Section 5.4.
Experimental analysis: the results and discussions
In this section, we present and analyze the experimental results of the proposed AER method. As introduced in Sections 5.1 (Datasets) and 5.2 (Compared methods), seven compared methods are implemented on twelve imbalanced datasets. Limited by space, the UCI Bioassay and Abalone 19 datasets are selected for the primary demonstration, including the performance evaluation of the AER with respect to changes in the related parameters, as well as a comprehensive table comparing the performance of the AER and other
Discussions
We dedicate this section to discussing the implications of the foregoing theoretical and empirical analysis, as well as details omitted from the experiments. Specifically, we discuss the following aspects: 1. the effectiveness of the regularization; 2. the problems for which the AER is suitable, and the choice between the logarithm- and exponential-based AERs; 3. the practical training time and training dynamics of the AER; and 4. natural improvements and extensions of the AER.
From the experiments in
Conclusion
In this paper, a novel method, the adaptive ensemble of classifiers with regularization (AER), has been proposed for binary imbalanced data classification. The details of the method, including an implementation with XGBoost, are provided, and the related training formulas are derived. In addition to its regularization properties, we show that the method has favorable time and memory complexity. The performance of the proposed algorithm is tested on multiple datasets, and empirical evidences
CRediT authorship contribution statement
Chen Wang: Worked out the technical details, Performed the experiments, Wrote and revised the manuscript, Discussed the results and contributed to the manuscript. Chengyuan Deng: Performed the experiments, Discussed the results and contributed to the manuscript. Zhoulu Yu: Performed the experiments, Wrote and revised the manuscript, Discussed the results and contributed to the manuscript. Dafeng Hui: Wrote and revised the manuscript, Discussed the results and contributed to the manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We appreciate the constructive suggestions of Qin Yu, Hang Zhang, Yanmei Yu, and Chao Sun for the paper. We also thank Michael Tan of University College London for his writing suggestions.
Funding statement
This work is supported by the Sichuan Science and Technology Program, China (2020YFG0051), and the University-Enterprise Cooperation Projects, China (17H1199, 19H0355, 19H1121).
References (68)
- Learning from class-imbalanced data: Review of methods and applications, Expert Syst. Appl. (2017)
- ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem, Knowl.-Based Syst. (2015)
- A study on combining dynamic selection and data preprocessing for imbalance learning, Neurocomputing (2018)
- Combining multiple algorithms in classifier ensembles using generalized mixture functions, Neurocomputing (2018)
- A framework for dynamic classifier selection oriented by the classification problem difficulty, Pattern Recognit. (2018)
- Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis, and critique, Signal Process. (1984)
- Deep learning in neural networks: An overview, Neural Netw. (2015)
- Handling data irregularities in classification: Foundations, trends, and future challenges, Pattern Recognit. (2018)
- Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches, Knowl.-Based Syst. (2013)
- Near-Bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs, Neural Netw. (2015)
- Class-specific extreme learning machine for handling binary class imbalance problem, Neural Netw.
- Online sequential class-specific extreme learning machine for binary imbalanced learning, Neural Netw.
- An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes, Pattern Recognit.
- A novel ensemble method for imbalanced data learning: bagging of extrapolation-SMOTE SVM, Comput. Intell. Neurosci.
- A survey of multiple classifier systems as hybrid systems, Inf. Fusion
- From dynamic classifier selection to dynamic ensemble selection, Pattern Recognit.
- LibD3C: ensemble classifiers with a clustering and dynamic selection strategy, Neurocomputing
- META-DES: a dynamic ensemble selection framework using meta-learning, Pattern Recognit.
- Dynamic classifier ensemble model for customer classification with imbalanced class distribution, Expert Syst. Appl.
- Dynamic ensemble selection for multi-class classification with one-class classifiers, Pattern Recognit.
- An overlap-sensitive margin classifier for imbalanced and overlapping data, Expert Syst. Appl.
- Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy, Appl. Soft Comput.
- ELBlocker: Predicting blocking bugs with ensemble imbalance learning, Inf. Softw. Technol.
- Machine learning based mobile malware detection using highly imbalanced network traffic, Inform. Sci.
- A hybrid feature selection with ensemble classification for imbalanced healthcare data: A case study for brain tumor diagnosis, IEEE Access
- Application of fuzzy support vector machine for determining the health index of the insulation system of in-service power transformers, IEEE Trans. Dielectr. Electr. Insul.
- On dynamic ensemble selection and data preprocessing for multi-class imbalance learning, Int. J. Pattern Recognit. Artif. Intell.
- Bayesian Reasoning and Machine Learning
- Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process.
- Optimization methods for large-scale machine learning, SIAM Rev.
- Focal loss for dense object detection, IEEE Trans. Pattern Anal. Mach. Intell.
- Imbalance-XGBoost: Leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost, Pattern Recognit. Lett.