Imbalanced Learning with Oversampling based on Classification Contribution Degree
Advanced Theory and Simulations ( IF 2.9 ) Pub Date : 2021-03-26 , DOI: 10.1002/adts.202100031
Zhenhao Jiang 1 , Jie Yang 2 , Yan Liu 3

Imbalanced datasets are common in the real world, and their skewed class distribution degrades the performance of general machine learning models. To address the data‐imbalance problem, a novel oversampling method based on classification contribution degree, called OS‐CCD, is presented. First, a new concept, the classification contribution degree, is established based on micro and macro information extracted from the raw dataset. Using the classification contribution degree, OS‐CCD makes positive samples that lie near the class boundary and in regions with a high density of positive samples generate more synthetic samples than others. Furthermore, the neighbor used for oversampling is no longer chosen purely at random but according to a selection probability. Experimental results on 12 benchmark datasets show that four commonly used classifiers combined with OS‐CCD outperform the same classifiers combined with six popular oversampling methods in terms of accuracy, F1‐score, and AUC.
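
The abstract does not give the formula for the classification contribution degree, so the following is only a minimal sketch of the general idea, not the authors' OS‐CCD. It assumes a placeholder weight (fraction of majority-class neighbors times inverse mean distance to minority neighbors) and a placeholder neighbor-selection probability (inverse distance). The function name `weighted_oversample` and all parameters are hypothetical.

```python
import numpy as np

def weighted_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """SMOTE-style interpolation with non-uniform seed and neighbor selection.

    Illustrative only: the weights below are assumed stand-ins, not the
    classification contribution degree defined in the paper.
    Requires at least 2 minority samples.
    """
    rng = np.random.default_rng(seed)
    n_min = len(X_min)
    X_all = np.vstack([X_min, X_maj])
    is_min = np.r_[np.ones(n_min, dtype=bool), np.zeros(len(X_maj), dtype=bool)]

    # Distances from every minority sample to every sample; mask self-distance.
    d_all = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    idx = np.arange(n_min)
    d_all[idx, idx] = np.inf

    # Assumed "boundary" term: share of majority samples among the k nearest.
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    boundary = (~is_min[nn_all]).mean(axis=1)

    # Assumed "density" term: inverse mean distance to nearest minority samples.
    k_min = min(k, n_min - 1)
    d_min = d_all[:, :n_min]
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]
    nn_dist = np.take_along_axis(d_min, nn_min, axis=1)
    density = 1.0 / (nn_dist.mean(axis=1) + 1e-12)

    # Small offset keeps interior samples selectable with low probability.
    weight = (boundary + 1e-3) * density
    p_seed = weight / weight.sum()

    synth = np.empty((n_new, X_min.shape[1]))
    seeds = rng.choice(n_min, size=n_new, p=p_seed)
    for j, i in enumerate(seeds):
        # Neighbor drawn by probability (assumed: closer neighbors more likely).
        inv = 1.0 / (nn_dist[i] + 1e-12)
        nb = nn_min[i, rng.choice(k_min, p=inv / inv.sum())]
        gap = rng.random()
        synth[j] = X_min[i] + gap * (X_min[nb] - X_min[i])
    return synth
```

Because each synthetic point is an interpolation between two minority samples, the output always stays within the minority class's bounding box; only the choice of which samples and neighbors are used is skewed by the weights.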

Updated: 2021-05-05