A multiclass classification using one-versus-all approach with the differential partition sampling ensemble

https://doi.org/10.1016/j.engappai.2020.104034

Abstract

The one-versus-all (OVA) approach is one of the mainstream decomposition methods, in which multiple binary classifiers are used to solve a multiclass classification task. However, it suffers from serious class imbalance. This paper proposes a differential partition sampling ensemble method (DPSE) in the OVA framework. The numbers of majority and minority samples in each binary training dataset are used as the upper and lower limits of the sampling interval, respectively. Within this range, the construction process of an arithmetic sequence is simulated to generate a set of multiple different sampling numbers with equal intervals. All samples are divided into safe examples, borderline examples, rare examples, and outliers according to their neighborhood information; then random undersampling for safe samples (s-Random undersampling) and SMOTE for borderline and rare examples (br-SMOTE), which exploit the distribution characteristics of the classes, are proposed. In each iteration, guided by the corresponding differential sampling number, the two methods undersample the majority class or oversample the minority class in each binary training dataset to balance the numbers of positive and negative samples, while preserving the structure of each class as much as possible. The balanced training sets are used to train a binary classification model composed of multiple sub-classifiers. Thorough experiments on 27 KEEL public multiclass datasets show that DPSE outperforms typical methods based on the OVA scheme, the one-versus-one scheme, or direct multiclass classification.

Introduction

Classification is one of the most important problems in data mining. It is widely used in practical fields such as sentiment classification (Catal and Nangir, 2017), fraud detection (Nami and Shajari, 2018, Triepels et al., 2018), and fault diagnosis (Islam and Kim, 2018). A binary classification problem involves two classes, whereas in a multiclass classification problem the number of classes is greater than two, which makes it more complicated. The methods for dealing with multiclass classification are mainly divided into two groups. One is to extend a binary classifier to the multiclass setting through some strategy; typical algorithms include the support vector machine (de Lima et al., 2018), decision tree (Guan et al., 2017), oblique decision tree ensemble (Zhang and Suganthan, 2014, Katuwal et al., 2020), XGBoost (Chen and Guestrin, 2016), and deep neural network (Hosaka, 2019). The other is to divide the multiclass classification problem into multiple binary problems (binarization) (Zhou et al., 2017, Liu et al., 2017). The former directly uses one classifier to deal with the multiclass task. However, it is easier to build a classifier that distinguishes two classes than one that distinguishes many, and the decision boundary between two classes may be simpler than a multiclass decision boundary (Krawczyk et al., 2018). Decomposing the original multiclass problem into several binary sub-problems is therefore much easier, so the second approach has attracted extensive attention in practical research (Zhang et al., 2016, Zhang et al., 2018, Li et al., 2020).

The two most popular binarization approaches are one-versus-one (OVO) and one-versus-all (OVA) (Galar et al., 2011). The OVO approach divides a multiclass problem with m classes into m(m−1)/2 binary sub-problems, and each classifier in the OVO scheme discriminates between a pair of classes (ci, cj). When a test pattern is classified by this scheme, a score matrix R is obtained from all the binary classifiers. Since each classifier can only separate the two classes it was trained on, when an instance's true label belongs to neither of those classes, the classifier gives an invalid discrimination (Galar et al., 2013, Galar et al., 2015, Zhou and Fujita, 2017). The OVA approach divides a multiclass problem with m classes into m binary sub-problems; each classifier treats one class as the positive class and all the other classes as the negative class. Compared with the OVO scheme, when the dataset contains more classes the OVA approach deploys fewer classifiers and hence fewer resources (Zhou and Fujita, 2017, Sen et al., 2016). With fewer classifiers, the outputs of all binary classifiers are simpler to aggregate. Additionally, there is no invalid-classifier phenomenon in this framework: each classifier only considers one class against all the others, which simplifies the problem. However, even if the original classes are balanced, the fact that the combined other classes contain more samples than the target class causes an imbalance between the positive and negative sample numbers, which degrades each binary classifier (Sen et al., 2016, Zhang et al., 2016, Li et al., 2020). Solving the imbalance caused by OVA is therefore of great significance for improving classification performance.

In recent years, binary class imbalance problems have received widespread attention (Lin et al., 2017, Douzas et al., 2018, Collell et al., 2018). In the OVA framework, each sub-problem can be regarded as a binary classification problem, so solving the imbalanced data classification problem in the OVA framework amounts to solving binary class imbalance problems. For example, Li et al. (2020) use the OVO decomposition strategy to divide a multiclass classification problem into multiple binary classification problems and then apply oversampling with spectral clustering to balance each binary imbalanced problem, effectively reducing the impact of imbalance. Zhang et al. (2016) explore the effectiveness of the binary imbalanced learning methods UnderBagging (Barandela et al., 2003), SMOTEBagging (Díez-Pastor et al., 2015), RUSBoost (Seiffert et al., 2010), SMOTEBoost (Chawla et al., 2003), SMOTE + AdaBoost (Liu et al., 2009), and EasyEnsemble (Liu et al., 2009) for multiclass imbalanced classification in the OVO framework. Experimental research shows that combining a decomposition strategy with ensemble learning improves the mining of imbalanced multiclass problems. Compared with a single sampling method, ensemble learning combined with data preprocessing can not only balance the number of samples but also improve diversity (García et al., 2018). Therefore, our research focuses on using ensemble learning methods to solve the class imbalance problem in the OVA framework and thus improve its classification performance.

In this paper, we propose a method for multiclass imbalanced data classification that combines the OVA decomposition strategy with ensemble learning. The method uses the differential partition sampling ensemble (DPSE) to construct a model for each binary classification problem, which overcomes the data imbalance caused by the OVA decomposition strategy. DPSE combines ensemble learning with sampling methods to establish differentiated training datasets that increase the diversity of each binary classification model. Before iterating, unlike DTE-SBD in Sun et al. (2018), DPSE calculates a differential set of multiple gradually increasing sampling numbers by simulating the construction process of an arithmetic sequence. To improve on a single sampling method, all samples are first divided into safe examples, borderline examples, rare examples, and outliers according to their neighborhood information. Then random undersampling for safe samples (s-Random undersampling) and SMOTE for borderline and rare examples (br-SMOTE), which take the distribution characteristics of the classes into account, are proposed to balance the numbers of minority and majority samples in each iteration.
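
The two preparatory steps above can be sketched as follows. The arithmetic-sequence construction and the neighborhood thresholds (4–5 same-class neighbours = safe, 2–3 = borderline, 1 = rare, 0 = outlier, the usual values in the local-neighbourhood literature) are assumptions here; the paper's exact formulas may differ.

```python
import numpy as np

def differential_sampling_numbers(n_min, n_maj, T):
    # T sampling numbers at (approximately) equal intervals between the
    # minority count (lower limit) and the majority count (upper limit),
    # simulating the construction of an arithmetic sequence.
    return np.linspace(n_min, n_maj, T).round().astype(int)

def neighborhood_types(X, y, k=5):
    # Categorise each sample by how many of its k nearest neighbours
    # share its class: 4-5 safe, 2-3 borderline, 1 rare, 0 outlier.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    types = []
    for i in range(len(X)):
        same = int((y[np.argsort(d[i])[:k]] == y[i]).sum())
        types.append("safe" if same >= 4 else
                     "borderline" if same >= 2 else
                     "rare" if same == 1 else "outlier")
    return types

print(differential_sampling_numbers(10, 90, 5))  # [10 30 50 70 90]
```

With a minority of 10 and a majority of 90 and T = 5 sub-classifiers, the differential set spans the whole interval, so each sub-classifier sees a differently sized balanced dataset.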

To verify the effectiveness of the proposed method, the mean of the per-class accuracies (MAvA) (García et al., 2018) was used as the performance measure. 27 multiclass datasets from the KEEL dataset repository (Triguero et al., 2017) and three different base classifiers, namely CART (Gordon et al., 1984), Random forest (Liaw and Wiener, 2002), and SVM (Vapnik, 1998), were selected for thorough experimental research. In the OVA scheme, the differential partition sampling ensemble was compared with typical imbalanced learning methods. The proposed method was also compared with three typical methods for solving multiclass imbalanced classification problems. Differences between the proposed method and the other methods were verified with the Friedman test and the Holm–Bonferroni test (García et al., 2010).
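
MAvA, the average of the per-class accuracies, is straightforward to compute; a minimal sketch (the toy label vectors are illustrative):

```python
import numpy as np

def mava(y_true, y_pred):
    # Mean of the per-class accuracies (recalls): every class contributes
    # equally regardless of its size, so the score is not dominated by
    # majority classes the way overall accuracy is.
    classes = np.unique(y_true)
    return float(np.mean([(y_pred[y_true == c] == c).mean() for c in classes]))

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
print(mava(y_true, y_pred))  # (3/4 + 2/2 + 1/2) / 3 = 0.75
```

Note how the misclassified sample of the small class 2 costs as much as a quarter of class 0's score would, which is exactly the property wanted when evaluating imbalanced classifiers.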

The rest of this paper is organized as follows. We first describe the imbalanced learning problem and the solution in Section 2. Then, the approach proposed in this paper is introduced in Section 3. In Section 4, the experimental framework is presented in detail, as well as the results and discussion. Finally, the conclusions are given in Section 5.

Section snippets

Related work

To address the data imbalance problem in the OVA scheme, we first introduce and analyze existing imbalanced learning methods in Section 2.1. Then the OVA decomposition strategy is described in detail in Section 2.2.

The proposed method: the differential partition sampling ensemble in the OVA

The total number of original training samples corresponding to each classifier is the same in the OVA framework. Considering the effectiveness of the ensemble learning with preprocessing techniques in binary imbalance learning, the method is applied to each binary dataset to solve the data imbalance problem (Zhang et al., 2016). Undersampling and oversampling may have drawbacks when used in isolation, as a single approach is not suitable for all imbalanced datasets (Nanni et al., 2015). In
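
The per-iteration balancing described above can be sketched as follows. Plain random under- and oversampling stand in here for the paper's partition-aware s-Random undersampling and br-SMOTE, so this is an illustrative simplification, not the method itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_subsets(X_pos, X_neg, T):
    # One balanced training set per sub-classifier: iteration t resamples
    # BOTH classes to the t-th sampling number, spaced at equal intervals
    # between the minority and majority counts, so every sub-classifier
    # trains on balanced but differently sized (hence diverse) data.
    n_min, n_maj = sorted((len(X_pos), len(X_neg)))
    for s in np.linspace(n_min, n_maj, T).round().astype(int):
        pos = X_pos[rng.choice(len(X_pos), s, replace=s > len(X_pos))]
        neg = X_neg[rng.choice(len(X_neg), s, replace=s > len(X_neg))]
        yield pos, neg

X_pos = rng.normal(0, 1, size=(10, 2))   # minority (positive) class
X_neg = rng.normal(3, 1, size=(90, 2))   # majority (negative) class
sizes = [len(p) for p, n in balanced_subsets(X_pos, X_neg, 5)]
print(sizes)  # [10, 30, 50, 70, 90], with equal positive/negative counts
```

The smallest subset is pure undersampling of the majority, the largest is pure oversampling of the minority, and the iterations in between mix the two, which is what lets the ensemble trade the information loss of undersampling against the redundancy of oversampling.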

Experiment and evaluation

In this section, to verify the effectiveness of the proposed method, a comprehensive experimental analysis was performed on 27 KEEL public multiclass datasets using CART, Random forest, and SVM as benchmark classifiers. The datasets chosen are described in Section 4.1, and the benchmark classifiers with their parameter settings are described in Section 4.2. The measure used to evaluate the performance of the methods is presented in Section 4.3. In Section 4.4, the

Conclusion

Binarization is a common method to divide a multiclass classification problem into several binary problems. In this paper, the differential partition sampling ensemble in the OVA scheme is proposed to reduce the impact of data imbalance on overall classification performance. The differential set of the sampling number for each binary subset is determined in advance based on the idea of incremental arithmetic progression. Then multiple balanced training sets are generated by undersampling and

CRediT authorship contribution statement

Xin Gao: Conceptualization, Methodology, Writing - original draft, Supervision, Funding acquisition. Yang He: Conceptualization, Methodology, Software development, Validation, Writing - original draft. Mi Zhang: Methodology, Funding acquisition. Xinping Diao: Software development. Xiao Jing: Software development, Validation. Bing Ren: Experimental data preprocessing. Weijia Ji: Experimental data preprocessing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (55)

  • García, S., et al., Dynamic ensemble selection for multi-class imbalanced datasets, Inform. Sci. (2018)
  • Guan, X., et al., A multi-view OVA model based on decision tree for multi-classification tasks, Knowl.-Based Syst. (2017)
  • Haixiang, G., et al., Learning from class-imbalanced data: Review of methods and applications, Expert Syst. Appl. (2017)
  • Hong, J.-H., et al., Fingerprint classification using one-vs-all support vector machines dynamically ordered with naive Bayes classifiers, Pattern Recognit. (2008)
  • Hosaka, T., Bankruptcy prediction using imaged financial ratios and convolutional neural networks, Expert Syst. Appl. (2019)
  • Katuwal, R., et al., Heterogeneous oblique random forest, Pattern Recognit. (2020)
  • Krawczyk, B., et al., Dynamic ensemble selection for multi-class classification with one-class classifiers, Pattern Recognit. (2018)
  • Krawczyk, B., et al., Dynamic classifier selection for one-class classification, Knowl.-Based Syst. (2016)
  • Li, Q., et al., Multiclass imbalanced learning with one-versus-one decomposition and spectral clustering, Expert Syst. Appl. (2020)
  • Lin, W.-C., et al., Clustering-based undersampling in class-imbalanced data, Inform. Sci. (2017)
  • Liu, Y., et al., A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm, Inform. Sci. (2017)
  • Nami, S., et al., Cost-sensitive payment card fraud detection based on dynamic random forest and k-nearest neighbors, Expert Syst. Appl. (2018)
  • Nanni, L., et al., Coupling different methods for overcoming the class imbalance problem, Neurocomputing (2015)
  • O'Brien, R., et al., A random forests quantile classifier for class imbalanced data, Pattern Recognit. (2019)
  • Roy, A., et al., A study on combining dynamic selection and data preprocessing for imbalance learning, Neurocomputing (2018)
  • Sáez, J.A., et al., Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets, Pattern Recognit. (2016)
  • Sun, J., et al., Imbalanced enterprise credit evaluation with DTE-SBD: Decision tree ensemble based on SMOTE and bagging with differentiated sampling rates, Inform. Sci. (2018)