Ensemble Selection based on Classifier Prediction Confidence
Introduction
Ensemble learning has been studied extensively and is one of the most active research topics in the machine learning community. This kind of learning emerges naturally from the fact that no single learning algorithm performs well on all datasets. In machine learning, each classifier uses its own approach to approximate the unknown relationship f between the feature vector and the class labels. As data collected from different sources can vary substantially, a learning algorithm may provide a good hypothesis only on some datasets. By combining multiple classifiers in a single framework, as in ensemble learning, we can diversify the learning and achieve better predictions than with a single classifier [1].
In ensemble methods, we aggregate the outputs of different classifiers to arrive at a collective decision. Classifiers can be generated in two ways: by training different algorithms on the same training set (heterogeneous ensemble methods) or by training a single algorithm on many different training sets (homogeneous ensemble methods) [1], [2]. A combining algorithm then aggregates the outputs of all classifiers to obtain the final decision.
Ensuring diversity in the outputs of the base classifiers is an important factor in a successful ensemble. Existing studies on diversity in an ensemble system mainly focus on its utilization to enhance the ensemble performance, for example in the combining algorithms [3], [4] and in the ensemble selection problem [5], [6], [7]. These methods, however, only capture the uncertainty arising from the agreements and disagreements among the different base classifiers. How the confidence in each base classifier's own output can be exploited to solve the ensemble selection (ES) problem therefore remains to be explored.
Our idea is based on a real-life observation: when a decision is sought from a committee of experts, different experts have different backgrounds and different levels of expertise on the problem. When we know that an expert is very knowledgeable in a particular field, we will trust this expert's recommendation even if he/she is not entirely confident about it. On the other hand, if we know that an expert is less knowledgeable, we will only pay attention to his/her recommendation if he/she is very sure of it. We apply this idea to select base classifiers for the final ensemble for a prediction. In this work, we encode the level of domain expertise of a base classifier by associating with it a threshold computed from the entire training set by minimizing the empirical 0–1 loss. Then, based on the soft classification output of a base classifier on a test sample, we quantify the confidence of its current prediction by computing the entropy of its output. Since high entropy in a prediction corresponds to low confidence, entropy can serve as a confidence measure. The entropy is then compared to the base classifier's threshold to determine whether its output should be included in the aggregation: a base classifier's output appears in the final set for the subsequent ensemble combination if its confidence is higher than its threshold.
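The selection rule described above can be sketched in a few lines. The sketch below is illustrative, not the authors' implementation: it assumes Shannon entropy as the confidence measure, per-classifier thresholds already learned from the training set, and a fallback to all outputs when no classifier is confident enough (the paper does not specify this fallback).

```python
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_outputs(soft_outputs, thresholds):
    """Keep a base classifier's soft output for the final combination only
    when its prediction entropy is at most that classifier's threshold,
    i.e. when its confidence is high enough."""
    selected = [p for p, tau in zip(soft_outputs, thresholds)
                if shannon_entropy(p) <= tau]
    # Assumed fallback: if every classifier is too uncertain, keep all outputs
    return selected if selected else list(soft_outputs)

# A confident prediction (low entropy) passes the threshold;
# a near-uniform prediction (high entropy) is filtered out.
outputs = [np.array([0.9, 0.05, 0.05]), np.array([0.34, 0.33, 0.33])]
kept = select_outputs(outputs, thresholds=[0.6, 0.6])
```

The kept outputs are then passed to whatever combining algorithm the ensemble uses (e.g. averaging).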
The contributions of this paper are:
- (i)
We propose an approach to select a base classifier in an ensemble system based on both its overall domain expertise and the confidence it has in its current prediction. This allows us to integrate the static and dynamic approaches to ensemble selection.
- (ii)
We search for the individual threshold of each base classifier by minimizing the empirical loss on the training set. The optimal solution is obtained using artificial bee colony optimization.
- (iii)
Experiments on a number of datasets demonstrate the advantage of the proposed method over several well-known benchmark algorithms.
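Contribution (ii) amounts to minimizing the empirical 0–1 loss of the selected ensemble over the vector of per-classifier thresholds. The sketch below illustrates that objective under two stated assumptions: the kept outputs are combined by simple averaging (the sum rule; the paper's combiner may differ), and the artificial bee colony optimizer is replaced by plain random search as a stand-in.

```python
import numpy as np

def ensemble_error(thresholds, soft_outputs, y_true):
    """Empirical 0-1 loss when each classifier's output is kept only if its
    prediction entropy is at most its threshold.
    soft_outputs: array of shape (K, N, C) = (classifiers, samples, classes)."""
    K, N, C = soft_outputs.shape
    errors = 0
    for n in range(N):
        probs = np.clip(soft_outputs[:, n, :], 1e-12, 1.0)
        ent = -np.sum(probs * np.log(probs), axis=1)   # entropy per classifier
        mask = ent <= thresholds
        if not mask.any():
            mask[:] = True            # assumed fallback: keep all if none confident
        avg = probs[mask].mean(axis=0)                 # sum-rule combination
        errors += int(avg.argmax() != y_true[n])
    return errors / N

def search_thresholds(soft_outputs, y_true, n_iter=200, seed=0):
    """Stand-in for the paper's artificial bee colony: random search over
    per-classifier thresholds in [0, log C] (the entropy's range)."""
    rng = np.random.default_rng(seed)
    K, _, C = soft_outputs.shape
    best_t = np.full(K, np.log(C))                     # start: keep everyone
    best_e = ensemble_error(best_t, soft_outputs, y_true)
    for _ in range(n_iter):
        t = rng.uniform(0.0, np.log(C), size=K)
        e = ensemble_error(t, soft_outputs, y_true)
        if e < best_e:
            best_t, best_e = t, e
    return best_t, best_e
```

With a confidently wrong classifier in the pool, a good threshold vector can exclude it and drive the training error down, which is exactly what the optimizer searches for.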
We organize the paper as follows. In Section 2, we briefly discuss some related work on ensemble methods and ensemble selection. In Section 3, we describe our approach to measure and select the expert's answer based on its confidence in relation to its domain expertise. In Section 4, we present our experimental studies in which we compare the performance of the proposed method and the benchmark algorithms on some popular datasets. In Section 5, we draw some conclusions.
Section snippets
Ensemble methods
Research on ensemble methods focuses mainly either on the design of new ensemble systems, improving the ensemble performance, or the study of ensemble properties. There are two approaches to design a new ensemble system: training data generation and combining algorithm formulation. In [8], Younsi and Bagnall introduced two ensemble systems using random sphere cover classifiers (RSC). The first ensemble system is based on the resampling/reweighting mechanism in which the RSCs are generated
Problem formulation
Assume that we have a committee of K experts {K_k}, each of whom gives an answer to a problem. Classically, the answers from all experts are received and combined to obtain the final decision. However, some of the answers may not be confident enough and should be excluded from the final committee decision. Here we assume that each answer has its own confidence and that we prefer highly confident answers to those with low confidence before making the final decision. Moreover, we also
Experimental datasets
We compared the proposed method and the benchmark algorithms in experiments on a number of datasets from sources such as UCI (http://archive.ics.uci.edu/ml/datasets.html), OpenML (https://www.openml.org), and the MOA library (https://moa.cms.waikato.ac.nz), as shown in Table 1. These datasets are widely used in experiments with various classification systems. The datasets were selected to be diverse so as to ensure an objective comparison. The number of observations is from
Different entropy formulations
In this study, three different entropy measures (4)–(6) were used to quantify the information content in the output of the base classifiers. Here we aimed to assess the influence of the entropy measure on the performance of the proposed method. The experimental results are shown in Fig. 2, with detailed results provided in Table S1 of the Supplementary Material. Clearly, the entropy measure only slightly affects the classification error rates on the experimental datasets. The most significant
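The paper's three measures (4)–(6) are not reproduced in this snippet. As an illustration only, three commonly used entropy families that could play this role are Shannon entropy and its Rényi and Tsallis generalizations; whether these are the actual measures (4)–(6) is an assumption here.

```python
import numpy as np

def shannon(p, eps=1e-12):
    """Shannon entropy (natural log)."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def renyi(p, alpha=2.0, eps=1e-12):
    """Renyi entropy of order alpha != 1; tends to Shannon as alpha -> 1."""
    p = np.clip(p, eps, 1.0)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def tsallis(p, q=2.0, eps=1e-12):
    """Tsallis entropy of order q != 1; tends to Shannon as q -> 1."""
    p = np.clip(p, eps, 1.0)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))
```

All three are maximized by the uniform distribution and shrink as the prediction concentrates on one class, so any of them can serve as the confidence proxy in the selection rule, which is consistent with the observation that the choice of measure affects the error rates only slightly.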
Conclusions
In this study, we proposed a novel ES method that selects the base classifiers with high confidence in their predictions, taking into account the level of expertise of each base classifier. We quantified a classifier's level of expertise for a problem by computing its credibility threshold, obtained by minimizing the 0–1 loss function on all the labeled observations in the training set. This constitutes the static aspect of our ensemble selection approach. The dynamic aspect of our ensemble
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Tien Thanh Nguyen received his Ph.D. degree in computer science from the School of Information & Communication Technology, Griffith University, Australia in 2017. He is currently a Research Fellow at the School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, Scotland, UK. His research interest is in the field of machine learning, pattern recognition, and evolutionary computation. He has been a member of the IEEE since 2014.
References (40)
- et al., Combining heterogeneous classifiers via granular prototypes, Appl. Soft Comput. (2018)
- et al., Heterogeneous classifier ensemble with fuzzy rule-based meta-learner, Inf. Sci. (2018)
- et al., Decision templates for multiple classifier fusion: an experimental comparison, Pattern Recognit. (2001)
- et al., Dynamic selection of classifiers: a comprehensive review, Pattern Recognit. (2014)
- et al., Margin & diversity based ordering ensemble pruning, Neurocomputing (2018)
- et al., Ensembles of random sphere cover classifiers, Pattern Recognit. (2016)
- et al., A novel ensemble method for k-nearest neighbor, Pattern Recognit. (2019)
- et al., A parameter randomization approach for constructing classifier ensembles, Pattern Recognit. (2017)
- et al., Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data, Knowl. Based Syst. (2016)
- et al., Progressive subspace ensemble learning, Pattern Recognit. (2016)
- A weighted multiple classifier framework based on random projection, Inf. Sci.
- New diversity measure for data stream classification ensembles, Eng. Appl. Artif. Intell.
- Exploration of classification confidence in ensemble learning, Pattern Recognit.
- Dynamic classifier selection: recent advances and perspectives, Inf. Fusion
- Applying Ant Colony Optimization to configuring stacking ensembles for data mining, Expert Syst. Appl.
- META-DES: a dynamic ensemble selection framework using meta-learning, Pattern Recognit.
- A measure of competence based on random classification for dynamic ensemble selection, Inf. Fusion
- A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognit.
- A comparative study of artificial bee colony algorithm, Appl. Math. Comput.
- RotBoost: a technique for combining rotation forest and adaboost, Pattern Recognit. Lett.
Anh Vu Luong is currently a Ph.D. student at the School of Information & Communication Technology, Griffith University, Australia. His research interest is in the field of machine learning and pattern recognition.
Manh Truong Dang is currently a Research Assistant at the School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, Scotland, UK. His research interest is in the field of machine learning and pattern recognition.
Alan Wee-Chung Liew is currently an Associate Professor at the School of Information & Communication Technology, Griffith University, Australia. His research interest is in the field of machine learning, pattern recognition, computer vision, medical imaging, and bioinformatics. He has served on the technical program committee of many international conferences and is on the editorial board of several journals, including the IEEE Transactions on Fuzzy Systems. He has been a senior member of the IEEE since 2005.
John McCall is currently a Professor at the School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, Scotland, UK. His research interest is in the area of naturally-inspired computing and its application to real-world problems arising in complex engineering and medical/biological systems.