Pattern Recognition

Volume 100, April 2020, 107104

Ensemble Selection based on Classifier Prediction Confidence

https://doi.org/10.1016/j.patcog.2019.107104

Highlights

  • An ensemble selection method is proposed that takes into account each base classifier's confidence during classification and its overall credibility on the task.

  • The overall credibility of a base classifier is obtained by minimizing the empirical 0–1 loss on the entire training set.

  • The classifier's confidence in prediction for a test sample is measured by the entropy of its soft classification outputs for that sample.

  • Extensive comparative experiments with state-of-the-art ensemble selection algorithms validate the superior performance of our algorithm.

Abstract

Ensemble selection is one of the most studied topics in ensemble learning because a selected subset of base classifiers may perform better than the whole ensemble system. In recent years, a great many ensemble selection methods have been introduced. However, many of these lack flexibility: either a fixed subset of classifiers is pre-selected for all test samples (static approach), or the selection of classifiers depends upon the performance of techniques that define the region of competence (dynamic approach). In this paper, we propose an ensemble selection method that takes into account each base classifier's confidence during classification and the overall credibility of the base classifier in the ensemble. In other words, a base classifier is selected to predict for a test sample if the confidence in its prediction is higher than its credibility threshold. The credibility thresholds of the base classifiers are found by minimizing the empirical 0–1 loss on the entire training set. In this way, our approach integrates both the static and dynamic aspects of ensemble selection. Experiments on 62 datasets demonstrate that the proposed method achieves much better performance in comparison to a number of well-known ensemble methods.

Introduction

Ensemble learning has been studied extensively and is one of the most active research topics in the machine learning community. This kind of learning arises naturally from the fact that no single learning algorithm can perform well on all datasets. In machine learning, each classifier uses its own approach to approximate the unknown relationship f between the feature vector and the class label. As data collected from different sources can vary quite substantially, a learning algorithm may provide a good hypothesis only on some datasets. By combining multiple classifiers in a single framework, as in ensemble learning, we can diversify the learning and achieve better predictions than with a single classifier [1].

In ensemble methods, we aggregate the outputs of different classifiers to arrive at a collective decision. Classifiers can be generated in two different ways: by training different algorithms on the same training set (heterogeneous ensemble methods) or by training a single algorithm on many different training sets (homogeneous ensemble methods) [1], [2]. A combining algorithm is then applied to the outputs of all classifiers to obtain the final decision.

Ensuring diversity in the outputs of the base classifiers is an important factor in a successful ensemble. Existing studies on diversity in an ensemble system mainly focus on its utilization to enhance ensemble performance, for example in the combining algorithms [3], [4] and in the ensemble selection problem [5], [6], [7]. These methods, however, only capture the uncertainty generated by the agreements and disagreements between the different base classifiers. The exploitation of the confidence in each base classifier's own output to solve the ensemble selection (ES) problem therefore remains to be explored.

Our idea is based on a real-life observation: when a decision is sought from a committee of experts, different experts have different backgrounds and different levels of expertise on the problem. When we know that an expert is very knowledgeable in a particular field, we will trust the recommendation of this expert even though he/she is not entirely confident about the current recommendation. On the other hand, if we know that an expert is less knowledgeable, we will only pay attention to his/her current recommendation if he/she is very sure of it. We apply this idea to select base classifiers for the final ensemble when making a prediction. In this work, we encode the level of domain expertise of a base classifier by associating with it a threshold computed from the entire training set by minimizing the empirical 0–1 loss. Then, based on the soft classification output of a base classifier on a test sample, we quantify the confidence of the current classification by computing the entropy of that output. Note that high entropy in the prediction represents low confidence, so entropy can be used as an (inverse) confidence measure. The entropy is then compared to the base classifier's threshold to determine whether the output of the base classifier should be included in the aggregation. A base classifier's output appears in the final set for subsequent ensemble combination if its confidence level is higher than its threshold.
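To make the confidence measure concrete, here is a minimal sketch (ours, not the authors' code) of the entropy computation on a soft classification output; the probability vectors are purely illustrative.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy of one classifier's soft output for one sample.
    Low entropy = a peaked, confident prediction; high entropy = an
    uncertain, near-uniform prediction (at most log M for M classes)."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return float(-np.sum(p * np.log(p)))

# Illustrative 3-class outputs: a confident one vs. an uncertain one.
print(prediction_entropy([0.96, 0.02, 0.02]))  # ~0.20: high confidence
print(prediction_entropy([0.34, 0.33, 0.33]))  # ~1.10: near log(3) ~= 1.0986
```

Under the selection rule described above, a peaked output like the first stays below a typical entropy threshold and is kept, while a near-uniform output like the second survives only a very permissive threshold.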

The contributions of this paper are:

  • (i) We propose an approach to select a base classifier in an ensemble system based on its overall domain expertise and its confidence in its current prediction. This allows us to integrate both the static and dynamic approaches of ensemble selection.

  • (ii) We search for the individual threshold of each base classifier by minimizing the empirical 0–1 loss on the training set. The optimal solution is obtained using the artificial bee colony (ABC) optimization algorithm (a sketch of this search appears after this list).

  • (iii) Experiments on a number of datasets demonstrate the advantage of the proposed method compared to several well-known benchmark algorithms.
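As a rough illustration of contribution (ii), the sketch below implements the empirical 0–1 loss as a function of the threshold vector and minimizes it by plain random search. The paper itself uses artificial bee colony optimization; the random search here is an openly swapped-in stand-in, and the array shapes, the averaging combiner, and the use-all-classifiers fallback are our assumptions rather than details taken from the paper.

```python
import numpy as np

def empirical_01_loss(thresholds, soft_outputs, labels, eps=1e-12):
    """Fraction of training samples misclassified when each base classifier
    votes only if its prediction entropy is below its own threshold.

    soft_outputs: (N, K, M) probabilities for N samples, K classifiers,
                  M classes; labels: (N,) true class indices.
    (These shapes and the averaging combiner are assumptions.)"""
    errors = 0
    for x, y in zip(soft_outputs, labels):
        p = np.clip(x, eps, 1.0)
        entropies = -np.sum(p * np.log(p), axis=1)   # entropy per classifier
        mask = entropies < thresholds
        if not mask.any():
            mask[:] = True                           # fallback: use everyone
        combined = x[mask].mean(axis=0)              # average selected outputs
        errors += int(combined.argmax() != y)
    return errors / len(labels)

def search_thresholds(soft_outputs, labels, n_iter=2000, seed=0):
    """Stand-in for the paper's artificial bee colony search: sample
    candidate threshold vectors uniformly in [0, log M] (the entropy
    range) and keep the one with the lowest empirical 0-1 loss."""
    rng = np.random.default_rng(seed)
    _, K, M = soft_outputs.shape
    best_t, best_loss = None, np.inf
    for _ in range(n_iter):
        t = rng.uniform(0.0, np.log(M), size=K)
        loss = empirical_01_loss(t, soft_outputs, labels)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss
```

The objective is piecewise constant in the thresholds, which is why the paper turns to a population-based optimizer such as ABC rather than gradient methods.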

We organize the paper as follows. In Section 2, we briefly discuss related work on ensemble methods and ensemble selection. In Section 3, we describe our approach to measuring each expert's confidence and selecting its answer in relation to its domain expertise. In Section 4, we present our experimental studies, in which we compare the performance of the proposed method against the benchmark algorithms on a set of popular datasets. In Section 5, we draw some conclusions.

Section snippets

Ensemble methods

Research on ensemble methods focuses mainly on the design of new ensemble systems, the improvement of ensemble performance, or the study of ensemble properties. There are two approaches to designing a new ensemble system: training data generation and combining algorithm formulation. In [8], Younsi and Bagnall introduced two ensemble systems using random sphere cover classifiers (RSC). The first ensemble system is based on the resampling/reweighting mechanism in which the RSCs are generated

Problem formulation

Assume that we have a committee of K experts {K_k}, each of whom gives an answer to a problem. Classically, the answers from all experts are received and combined to obtain the final decision. However, some of the answers may not have high enough confidence and should be excluded from the final committee decision. Here we assume that each answer has its own confidence and that we prefer highly confident answers to those with low confidence before making the final decision. Moreover, we also
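An end-to-end sketch of this committee rule, under the same caveats as above (the averaging combiner and the use-all fallback are our assumptions, not the authors' implementation):

```python
import numpy as np

def committee_predict(soft_outputs, thresholds, eps=1e-12):
    """Combine only the confident-enough answers of a committee.

    soft_outputs: (K, M) array, one row of class probabilities per expert.
    thresholds:   (K,) per-expert entropy thresholds (the credibility
                  learned from the training set).
    Returns the index of the predicted class."""
    p = np.clip(soft_outputs, eps, 1.0)
    entropies = -np.sum(p * np.log(p), axis=1)       # one entropy per expert
    selected = soft_outputs[entropies < thresholds]  # keep confident answers
    if selected.size == 0:                           # assumed fallback: if no
        selected = soft_outputs                      # expert qualifies, use all
    return int(selected.mean(axis=0).argmax())       # average, then argmax

# Three experts on a 2-class problem; the middle expert's answer is too
# uncertain (entropy ~0.69 > its threshold 0.4) and is excluded.
outputs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
print(committee_predict(outputs, np.array([0.5, 0.4, 0.6])))  # -> 0
```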

Experimental datasets

We compared the proposed method and the benchmark algorithms by conducting experiments on a number of datasets from sources such as UCI (http://archive.ics.uci.edu/ml/datasets.html), OpenML (https://www.openml.org), and the MOA library (https://moa.cms.waikato.ac.nz), as shown in Table 1. These datasets are popular in experiments with various classification systems. The datasets were chosen to be diverse in order to ensure an objective comparison. The number of observations is from

Different entropy formulations

In this study, three different entropy measures (4)–(6) were used to quantify the information content in the output of the base classifiers. Here we aimed to assess the influence of the entropy measure on the performance of the proposed method. The experimental results are shown in Fig. 2, with the detailed results provided in Table S1 of the Supplementary Material. Clearly, the entropy measure only slightly affects the classification error rates on the experimental datasets. The most significant
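Equations (4)–(6) are not reproduced in this excerpt, so the measures below, Shannon entropy together with its Rényi and Tsallis generalizations, are plausible stand-ins rather than the paper's exact formulations; all three are standard ways to quantify the information content of a probability vector.

```python
import numpy as np

def shannon(p, eps=1e-12):
    """Shannon entropy: -sum p_i log p_i."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return -np.sum(p * np.log(p))

def renyi(p, alpha=2.0, eps=1e-12):
    """Renyi entropy of order alpha (alpha != 1); tends to Shannon as alpha -> 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def tsallis(p, q=2.0, eps=1e-12):
    """Tsallis entropy of order q (q != 1); also recovers Shannon as q -> 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return (1.0 - np.sum(p ** q)) / (q - 1.0)
```

Any of these can replace the Shannon measure in the selection rule, with the thresholds re-optimized accordingly.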

Conclusions

In this study, we proposed a novel ES method by selecting the base classifiers with high confidence in their prediction, taking into account the level of expertise of each base classifier. We quantified the classifier's level of expertise for a problem by computing its credibility threshold based on minimizing the 0–1 loss function on all the labeled observations in a training set. This constitutes the static aspect of our ensemble selection approach. The dynamic aspect of our ensemble

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (40)

Cited by (55)

  • Breaking the structure of MaMaDroid

    2023, Expert Systems with Applications
  • A data-driven method for online transient stability monitoring with vision-transformer networks

    2023, International Journal of Electrical Power and Energy Systems
View all citing articles on Scopus

Tien Thanh Nguyen received his Ph.D. degree in computer science from the School of Information & Communication Technology, Griffith University, Australia, in 2017. He is currently a Research Fellow at the School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, Scotland, UK. His research interests are in machine learning, pattern recognition, and evolutionary computation. He has been a member of the IEEE since 2014.

Anh Vu Luong is currently a Ph.D. student at the School of Information & Communication Technology, Griffith University, Australia. His research interests are in machine learning and pattern recognition.

Manh Truong Dang is currently a Research Assistant at the School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, Scotland, UK. His research interests are in machine learning and pattern recognition.

Alan Wee-Chung Liew is currently an Associate Professor at the School of Information & Communication Technology, Griffith University, Australia. His research interests are in machine learning, pattern recognition, computer vision, medical imaging, and bioinformatics. He has served on the technical program committees of many international conferences and is on the editorial boards of several journals, including the IEEE Transactions on Fuzzy Systems. He has been a senior member of the IEEE since 2005.

John McCall is currently a Professor at the School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, Scotland, UK. His research interests are in nature-inspired computing and its application to real-world problems arising in complex engineering and medical/biological systems.
