Learning from evolving data streams through ensembles of random patches
Knowledge and Information Systems ( IF 2.7 ) Pub Date : 2021-06-09 , DOI: 10.1007/s10115-021-01579-z
Heitor Murilo Gomes , Jesse Read , Albert Bifet , Robert J. Durrant

Ensemble methods are an effective way to solve supervised learning problems, and they are prevalent in learning from evolving data streams. One of the main reasons for their popularity is that concept drift detection and recovery strategies can be incorporated into the ensemble algorithm. Moreover, successful ensemble strategies, such as bagging and random forest, can be easily adapted to a streaming setting. In this work, we analyse a novel ensemble method designed specifically to cope with evolving data streams, namely the streaming random patches (SRP) algorithm. SRP combines random subspaces and online bagging to achieve predictive performance competitive with other methods. We significantly extend previous theoretical insights and empirical results illustrating different aspects of SRP. In particular, we explain why the widely adopted incremental Hoeffding trees, unlike their batch counterparts, are not in fact unstable learners, and how this significantly influences the design and performance of ensemble methods. We compare SRP against state-of-the-art ensemble variants for streaming data on a multitude of datasets. The results show that SRP achieves high predictive performance on both real and synthetic datasets. We also show that, as the number of base learners increases, ensembles of random subspaces can be an efficient and accurate alternative to SRP and leveraging bagging. In addition, we analyse the diversity over time and the average tree depth, which provides insights into the differences between local subspace randomization (as in random forest) and global subspace randomization (as in random subspaces). Finally, we analyse the behaviour of SRP when using Naive Bayes as its base learner instead of Hoeffding trees.
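To make the core idea concrete, the following is a minimal sketch of how random subspaces and online bagging can be combined, under several stated assumptions. It is not the authors' implementation. Each ensemble member receives one global random feature subset (a "patch" of features) fixed at creation, and each incoming instance is given to each member with a weight drawn from Poisson(λ). λ = 6 and a subspace size of 60% are assumed defaults chosen for illustration. The base learner is an incremental Gaussian Naive Bayes (the abstract mentions Naive Bayes as an alternative base learner; Hoeffding trees are omitted for brevity), and the drift detectors and background learners used by SRP are left out entirely. The class and function names (`WeightedGaussianNB`, `StreamingRandomPatchesSketch`, `poisson`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for small lambda such as 6."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class WeightedGaussianNB:
    """Incremental Gaussian Naive Bayes restricted to a fixed feature subset."""

    def __init__(self, features):
        self.features = features  # indices of this member's random subspace
        self.class_weight = defaultdict(float)
        # per (class, feature): [total weight, mean, M2] for weighted Welford updates
        self.stats = defaultdict(lambda: [0.0, 0.0, 0.0])

    def learn_one(self, x, y, w=1.0):
        self.class_weight[y] += w
        for f in self.features:
            s = self.stats[(y, f)]
            s[0] += w
            delta = x[f] - s[1]
            s[1] += (w / s[0]) * delta
            s[2] += w * delta * (x[f] - s[1])

    def predict_one(self, x):
        total = sum(self.class_weight.values())
        best, best_score = None, -math.inf
        for c, cw in self.class_weight.items():
            score = math.log(cw / total)  # log prior
            for f in self.features:
                n, mean, m2 = self.stats[(c, f)]
                var = m2 / n + 1e-6  # small floor avoids zero variance
                score -= 0.5 * math.log(2 * math.pi * var) + (x[f] - mean) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = c, score
        return best


class StreamingRandomPatchesSketch:
    """Random subspace per member + Poisson(lam) online bagging; no drift handling."""

    def __init__(self, n_features, n_models=10, subspace_size=0.6, lam=6.0, seed=1):
        self.rng = random.Random(seed)
        self.lam = lam
        k = max(1, int(round(subspace_size * n_features)))
        self.models = [
            WeightedGaussianNB(sorted(self.rng.sample(range(n_features), k)))
            for _ in range(n_models)
        ]

    def learn_one(self, x, y):
        for m in self.models:
            w = poisson(self.lam, self.rng)  # online bagging weight for this member
            if w > 0:
                m.learn_one(x, y, w)

    def predict_one(self, x):
        votes = defaultdict(int)
        for m in self.models:
            p = m.predict_one(x)
            if p is not None:
                votes[p] += 1
        return max(votes, key=votes.get) if votes else None


if __name__ == "__main__":
    # Prequential (test-then-train) evaluation on a toy stream.
    data_rng = random.Random(42)
    model = StreamingRandomPatchesSketch(n_features=5)
    correct, n = 0, 3000
    for _ in range(n):
        x = [data_rng.gauss(0, 1) for _ in range(5)]
        y = 1 if x[0] + x[1] > 0 else 0
        correct += model.predict_one(x) == y
        model.learn_one(x, y)
    print(f"prequential accuracy: {correct / n:.3f}")
```

Because each member sees a different global subspace, members that happen to omit the informative features still vote, which is one reason diversity and ensemble size matter in this family of methods.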

Updated: 2021-06-09