
Kappa Updated Ensemble for drifting data stream mining
Machine Learning (IF 4.3), Pub Date: 2019-10-02, DOI: 10.1007/s10994-019-05840-z
Alberto Cano, Bartosz Krawczyk

Learning from data streams in the presence of concept drift is among the biggest challenges of contemporary machine learning. Algorithms designed for such scenarios must take into account the potentially unbounded size of data, its constantly changing nature, and the requirement for real-time processing. Ensemble approaches for data stream mining have gained significant popularity due to their high predictive capabilities and effective mechanisms for alleviating concept drift. In this paper, we propose a new ensemble method named Kappa Updated Ensemble (KUE). It is a combination of online and block-based ensemble approaches that uses the Kappa statistic for dynamic weighting and selection of base classifiers. To achieve higher diversity among base learners, each of them is trained using a different subset of features and updated with new instances with a given probability following a Poisson distribution. Furthermore, we update the ensemble with new classifiers only when they contribute positively to the improvement of the quality of the ensemble. Finally, each base classifier in KUE is capable of abstaining from taking part in voting, thus increasing the overall robustness of KUE. An extensive experimental study shows that KUE is capable of outperforming state-of-the-art ensembles on standard and imbalanced drifting data streams while having a low computational complexity. Moreover, we analyze the use of Kappa versus accuracy to drive the criterion to select and update the classifiers, the contribution of the abstaining mechanism, the contribution of the diversification of classifiers, and the contribution of the hybrid architecture to update the classifiers in an online manner.
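The three mechanisms named in the abstract (Kappa-based weighting, Poisson-driven updates, and abstention) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the rate of 1 for the Poisson draw (borrowed from online bagging), and the rule that a member abstains when its Kappa is not positive are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (plain accuracy) and p_e is the
    agreement expected by chance from the label frequencies.
    """
    n = len(y_true)
    if n == 0:
        return 0.0
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class was seen and predicted: Kappa is undefined, report 0.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def weighted_vote(predictions, kappas):
    """Kappa-weighted majority vote over the members' predicted labels.

    Assumption: a member with Kappa <= 0 abstains. Returns None when
    every member abstains.
    """
    scores = {}
    for label, kappa in zip(predictions, kappas):
        if kappa <= 0:
            continue  # this member abstains from the vote
        scores[label] = scores.get(label, 0.0) + kappa
    if not scores:
        return None
    return max(scores, key=scores.get)
```

In an online update loop, each member would draw `k = poisson(1.0, rng)` for every incoming instance and train on that instance `k` times, so different members see differently weighted versions of the stream. The Kappa function also suggests why the abstract compares Kappa with accuracy on imbalanced streams: a classifier that always predicts the majority class on a 9:1 stream reaches 90% accuracy, yet its Kappa is 0.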


Updated: 2019-10-02
