A weighted hybrid ensemble method for classifying imbalanced data
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-06-08, DOI: 10.1016/j.knosys.2020.106087
Jiakun Zhao, Ju Jin, Si Chen, Ruifeng Zhang, Bilin Yu, Qingfang Liu

Most real-world datasets are imbalanced. Data imbalance occurs when the number of instances in some classes greatly exceeds the number of instances in other classes, and it can degrade results in both data mining and machine learning. Current methods for handling data imbalance fall into three groups: data-level methods, algorithm-level methods, and hybrid methods. In this paper, we propose WHMBoost, a weighted hybrid ensemble method for classifying imbalanced data in binary classification tasks. Within the boosting framework, WHMBoost combines two data sampling methods and two base classifiers and assigns a weight to each sampling method and each base classifier, so that they complement one another more effectively. WHMBoost was evaluated on 40 benchmark imbalanced datasets against state-of-the-art ensemble methods such as AdaBoost, RUSBoost and SMOTEBoost, using AUC, F-Measure and Geometric Mean as the evaluation criteria. Experimental results show a significant improvement over the other methods, indicating that WHMBoost is a promising and effective algorithm for imbalanced datasets.
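The abstract names AUC, F-Measure and Geometric Mean as evaluation criteria without defining them. The sketch below shows their standard textbook definitions in plain Python. It assumes 0/1 labels with 1 as the minority (positive) class and takes F-Measure to mean F1 (β = 1); the paper's exact conventions may differ, and the function names are illustrative only, not taken from the paper.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for 0/1 labels, assuming 1 is the minority (positive) class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 1:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _safe_div(num: float, den: float) -> float:
    # Treat 0/0 as 0.0 so degenerate classifiers score 0 instead of raising.
    return num / den if den else 0.0


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1: harmonic mean of precision and recall on the positive (minority) class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return _safe_div(2 * precision * recall, precision + recall)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of recall (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = _safe_div(tp, tp + fn)
    tnr = _safe_div(tn, tn + fp)
    return sqrt(tpr * tnr)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _safe_div(wins, len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 minority vs 8 majority instances
    all_majority = [0] * 10                  # trivial "always predict majority" classifier
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.3, 0.05, 0.15, 0.35, 0.25]
    print(f_measure(y_true, all_majority), g_mean(y_true, all_majority))  # 0.0 0.0
    print(f_measure(y_true, y_pred), round(g_mean(y_true, y_pred), 4))    # 0.5 0.6614
    print(auc(y_true, scores))                                             # 0.9375
```

The all-majority baseline illustrates why accuracy is a poor criterion for imbalanced data: it is 80% accurate on this toy example, yet scores 0 on both F-Measure and G-mean because it never identifies a minority instance.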

Updated: 2020-06-08