Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms, Entropy

Entropy (IF 2.7), Pub Date: 2020-07-31, DOI: 10.3390/e22080849
Weronika Wegier, Pawel Ksieniewicz

As ever more tools and applications continuously produce massive amounts of data, processing and correctly classifying that data is becoming both harder and more important. The task is complicated by changes in the data distribution over time, known as concept drift, and by disproportion between classes, as in network attack detection or fraud detection. In this work, we propose modifications to two existing stream processing solutions, Accuracy Weighted Ensemble (AWE) and Accuracy Updated Ensemble (AUE), which have demonstrated their effectiveness in adapting to time-varying class distributions. The changes aim to improve their quality on binary classification of imbalanced data. The proposed modifications include aggregate metrics, such as F1-score, G-mean and balanced accuracy score, in the calculation of the member classifiers' weights, which affects the ensemble composition and the final prediction. The impact of data sampling on the algorithms' effectiveness was also examined. Comprehensive experiments were conducted to identify the most promising type of modification and to compare the proposed methods with existing solutions. The experimental evaluation shows an improvement in classification quality over the underlying algorithms and over other solutions for processing imbalanced data streams.
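The idea of weighting ensemble members by an imbalance-aware metric can be illustrated with a minimal sketch. This is not the authors' implementation and it does not reproduce AWE or AUE. It only assumes that each member has already produced predictions for the most recent labelled chunk and that a member's weight is set directly to its metric value on that chunk. All function names (`confusion_counts`, `weighted_vote`, and so on) and the toy data are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Assumed convention: an undefined ratio (zero denominator) counts as 0.
    return num / den if den else 0.0


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return _ratio(tp + tn, tp + fp + tn + fn)


def f1_score(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return _ratio(2 * tp, 2 * tp + fp + fn)


def g_mean(y_true, y_pred):
    # Geometric mean of recall (sensitivity) and specificity.
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return math.sqrt(_ratio(tp, tp + fn) * _ratio(tn, tn + fp))


def balanced_accuracy(y_true, y_pred):
    # Arithmetic mean of recall (sensitivity) and specificity.
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (_ratio(tp, tp + fn) + _ratio(tn, tn + fp)) / 2


def weighted_vote(member_preds, weights):
    """Weighted majority vote over binary (0/1) member predictions."""
    total = sum(weights)
    if total == 0:
        # Fall back to an unweighted vote when every member has weight 0.
        weights = [1.0] * len(member_preds)
        total = float(len(member_preds))
    ensemble = []
    for votes in zip(*member_preds):
        positive_mass = sum(w for w, v in zip(weights, votes) if v == 1)
        ensemble.append(1 if positive_mass > total / 2 else 0)
    return ensemble


# Toy chunk (made-up data): 2 minority (1) samples out of 10.
chunk_labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
always_negative = [0] * 10                          # ignores the minority class
minority_aware = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]     # one false alarm
members = [always_negative, minority_aware]

accuracy_weights = [accuracy(chunk_labels, m) for m in members]
f1_weights = [f1_score(chunk_labels, m) for m in members]
ensemble_prediction = weighted_vote(members, f1_weights)
```

On this toy chunk, plain accuracy still gives the always-negative member a weight of 0.8, whereas F1-score and G-mean give it 0, so it stops influencing the vote. Balanced accuracy gives it 0.5, the value expected from a classifier that carries no information about the minority class.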

Updated: 2020-07-31
