Heterogeneous Ensemble Selection for Evolving Data Streams
Pattern Recognition (IF 7.5), Pub Date: 2021-04-01, DOI: 10.1016/j.patcog.2020.107743
Anh Vu Luong , Tien Thanh Nguyen , Alan Wee-Chung Liew , Shilin Wang

Abstract: Ensemble learning has been widely applied to both batch data classification and streaming data classification. In the latter setting, most existing ensemble systems are homogeneous, meaning they are generated from only one type of learning model. In contrast, by combining several different types of learning models, a heterogeneous ensemble system can achieve greater diversity among its members, which helps to improve its performance. Although heterogeneous ensemble systems have achieved many successes in the batch classification setting, extending them directly to the data stream setting is not trivial. In this study, we propose a novel HEterogeneous Ensemble Selection (HEES) method, which dynamically selects an appropriate subset of base classifiers to predict data under the stream setting. We are inspired by the observation that a well-chosen subset of good base classifiers may outperform the whole ensemble system. Here, we define a good candidate as one that exhibits not only high predictive performance but also high confidence in its predictions. Our selection process is thus divided into two sub-processes: accurate-candidate selection and confident-candidate selection. In the stream context, we define an accurate candidate as a base classifier with high accuracy over the current concept, and a confident candidate as one whose confidence score exceeds a certain threshold. In the first sub-process, we employ prequential accuracy to estimate the performance of a base classifier at a specific time; in the second sub-process, we propose a new measure to quantify predictive confidence and provide a method to learn the threshold incrementally. The final ensemble is formed by taking the intersection of the set of confident classifiers and the set of accurate classifiers.
Experiments on a wide range of data streams show that the proposed method achieves competitive performance with lower running time in comparison to the state-of-the-art online ensemble methods.
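
The selection step described in the abstract can be sketched in Python. This is a minimal, hypothetical sketch and not the HEES implementation: the abstract does not define the paper's confidence measure, the rule for deciding which accuracies count as "high", or how the threshold is learned, so the code substitutes stated assumptions. It uses a fading-factor prequential accuracy (decay `alpha`), the margin between the two largest class probabilities as a stand-in confidence score, a threshold updated as an exponential moving average of the members' mean confidence (rate `beta`), and "accuracy at least equal to the ensemble mean" as the accurate-candidate rule. All names (`SelectionSketch`, `MemberStats`, `margin_confidence`, `alpha`, `beta`) are illustrative. Base classifiers are represented only by their class-probability vectors, so any incremental learner could supply them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Set

# Class-probability vector produced by one base classifier for the current instance.
Proba = Sequence[float]


@dataclass
class MemberStats:
    """Fading-factor prequential accuracy for one base classifier.

    correct_t = alpha * correct_{t-1} + 1[prediction was right]
    count_t   = alpha * count_{t-1} + 1
    """
    correct: float = 0.0
    count: float = 0.0

    def update(self, was_correct: bool, alpha: float) -> None:
        self.correct = alpha * self.correct + (1.0 if was_correct else 0.0)
        self.count = alpha * self.count + 1.0

    @property
    def accuracy(self) -> float:
        return self.correct / self.count if self.count > 0 else 0.0


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda k: values[k])


def margin_confidence(proba: Proba) -> float:
    """Stand-in confidence score (ASSUMPTION, not the paper's measure):
    gap between the largest and second-largest class probabilities."""
    top = sorted(proba, reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


class SelectionSketch:
    """Hypothetical accurate-and-confident member selection on a data stream."""

    def __init__(self, names: Iterable[str], alpha: float = 0.99,
                 beta: float = 0.1, init_threshold: float = 0.0) -> None:
        self.stats: Dict[str, MemberStats] = {n: MemberStats() for n in names}
        self.alpha = alpha          # fading factor for prequential accuracy
        self.beta = beta            # EMA rate for the confidence threshold (assumption)
        self.threshold = init_threshold

    def accurate_set(self) -> Set[str]:
        # ASSUMPTION: "high accuracy" means at least the mean accuracy of all members.
        accs = {n: s.accuracy for n, s in self.stats.items()}
        mean_acc = sum(accs.values()) / len(accs)
        return {n for n, a in accs.items() if a >= mean_acc}

    def confident_set(self, probas: Dict[str, Proba]) -> Set[str]:
        return {n for n, p in probas.items() if margin_confidence(p) > self.threshold}

    def select(self, probas: Dict[str, Proba]) -> Set[str]:
        # Final ensemble = intersection of accurate and confident members.
        return self.accurate_set() & self.confident_set(probas)

    def predict(self, probas: Dict[str, Proba]) -> int:
        # ASSUMPTION: fall back to every member if the intersection is empty,
        # then average the selected members' probabilities.
        chosen = self.select(probas) or set(probas)
        n_classes = len(next(iter(probas.values())))
        avg = [sum(probas[n][k] for n in chosen) / len(chosen) for k in range(n_classes)]
        return argmax(avg)

    def update(self, probas: Dict[str, Proba], y_true: int) -> None:
        # Test-then-train: called after predict(), once the true label arrives.
        for n, p in probas.items():
            self.stats[n].update(argmax(p) == y_true, self.alpha)
        # ASSUMPTION: threshold tracks an EMA of the members' mean confidence.
        mean_conf = sum(margin_confidence(p) for p in probas.values()) / len(probas)
        self.threshold = (1.0 - self.beta) * self.threshold + self.beta * mean_conf


if __name__ == "__main__":
    sel = SelectionSketch(["nb", "tree", "knn"])
    step1 = {"nb": [0.9, 0.1], "tree": [0.6, 0.4], "knn": [0.3, 0.7]}
    print(sel.predict(step1))          # prints 0 (all three members selected)
    sel.update(step1, y_true=0)        # "knn" predicted class 1, so its accuracy drops
    step2 = {"nb": [0.9, 0.1], "tree": [0.51, 0.49], "knn": [0.3, 0.7]}
    # "knn" fails the accuracy test; "tree" falls below the learned threshold.
    print(sorted(sel.select(step2)))   # prints ['nb']
```

Each step follows the test-then-train order of prequential evaluation: `predict` runs before `update` reveals the true label, so accuracy estimates never use the label being predicted. The fading factor lets older outcomes decay, which is one common way to make an accuracy estimate track the current concept after drift.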

Updated: 2021-04-01