Dynamically Adjusting Diversity in Ensembles for the Classification of Data Streams with Concept Drift


ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2021-07-21, DOI: 10.1145/3466616
Juan I. G. Hidalgo, Silas G. T. C. Santos, Roberto S. M. Barros

A data stream can be defined as a system that continually generates large volumes of data over time. Processing data streams poses new demands and challenging tasks for data mining and machine learning. Concept drift is commonly characterized as a change in the distribution of the data within a data stream. New methods for data streams in which concept drifts occur require algorithms that can adapt to several scenarios in order to improve their performance across the different experimental situations in which they are tested. This research proposes a strategy for dynamic parameter adjustment in the presence of concept drifts. The Parameter Estimation Procedure (PEP) is a general method for dynamically adjusting parameters; here it is applied to the diversity parameter (λ) of several classification ensembles commonly used in the area. Specifically, PEP was used to create Boosting-like Online Learning Ensemble with Parameter Estimation (BOLE-PE), Online AdaBoost-based M1 with Parameter Estimation (OABM1-PE), and Oza and Russell's Online Bagging with Parameter Estimation (OzaBag-PE), based on the existing ensembles BOLE, OABM1, and OzaBag, respectively. To validate them, experiments were performed on artificial and real-world datasets using Hoeffding Tree (HT) as the base classifier. The accuracy results were statistically evaluated using a variation of the Friedman test and the Nemenyi post-hoc test. The experimental results showed that dynamically estimating the diversity parameter (λ) produced good results in most scenarios, i.e., the modified methods improved accuracy in the experiments with both artificial and real-world datasets.
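
To make the role of the diversity parameter concrete, the following minimal sketch shows where λ typically enters an Oza and Russell-style online bagging ensemble: each member is trained on each incoming instance k times, with k drawn from a Poisson distribution whose mean is λ. This reading of λ is an assumption based on common practice in online ensembles; the abstract does not spell it out. The sketch does not implement PEP, whose update rule is not described here. The names `poisson`, `MajorityClass`, and `OnlineBagging` are hypothetical, and the toy majority-class learner merely stands in for a Hoeffding Tree.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class MajorityClass:
    """Toy base learner (stand-in for a Hoeffding Tree): predicts the most frequent label."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y, weight=1):
        self.counts[y] += weight

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    """Online bagging in which each member sees each instance k ~ Poisson(lam) times."""

    def __init__(self, n_members=10, lam=1.0, seed=0):
        self.members = [MajorityClass() for _ in range(n_members)]
        self.lam = lam  # assumed diversity parameter; can be reassigned between instances
        self.rng = random.Random(seed)

    def learn(self, x, y):
        for member in self.members:
            k = poisson(self.lam, self.rng)
            if k > 0:  # k == 0 means this member skips the instance
                member.learn(x, y, weight=k)

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        votes.pop(None, None)  # ignore members that have not been trained yet
        return votes.most_common(1)[0][0] if votes else None
```

Because `lam` is a plain attribute, any procedure that estimates a new value during the stream could assign it between calls to `learn`. Under this assumption, smaller values make members skip more instances and larger values make them weight each instance more heavily, which changes how much the members differ from one another.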


Updated: 2021-07-21
