Oversampling technique based on fuzzy representativeness difference for classifying imbalanced data
Applied Intelligence (IF 5.3), Pub Date: 2020-03-10, DOI: 10.1007/s10489-020-01644-0
Ruonan Ren, Youlong Yang, Liqin Sun

The class imbalance problem poses a difficulty for learning algorithms in pattern classification. Oversampling is one of the most widely used techniques for addressing this problem, but most oversampling methods use the sample size ratio as the measure of imbalance. This paper proposes a fuzzy representativeness difference-based oversampling technique using affinity propagation and the chromosome theory of inheritance (FRDOAC). The fuzzy representativeness difference (FRD) is adopted as a new imbalance metric, which focuses on the importance of samples rather than their number. FRDOAC first finds the representative samples of each class by affinity propagation. Second, the fuzzy representativeness of every sample is calculated using the Mahalanobis distance. Finally, synthetic positive samples are generated according to the chromosome theory of inheritance until the fuzzy representativeness difference between the two classes is small. A thorough experimental study on 16 benchmark datasets was performed, and the results show that the proposed method outperforms other advanced imbalanced classification algorithms on various evaluation metrics.
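
The three steps in the abstract can be illustrated with a rough sketch. This is not the authors' implementation, and the abstract does not give the formulas, so every concrete choice here is an assumption: the class medoid stands in for the affinity propagation exemplar, a variance-scaled Euclidean distance (Mahalanobis distance with an assumed diagonal covariance) replaces the full Mahalanobis distance, the membership function `1 / (1 + d)` is a placeholder for fuzzy representativeness, and uniform feature-wise crossover is a loose reading of the chromosome theory of inheritance. The function names, `tol`, and `max_new` are hypothetical.

```python
import random
from statistics import pvariance


def feature_variances(samples):
    """Per-feature variance; a zero variance is replaced by 1.0 to avoid division by zero."""
    return [pvariance(column) or 1.0 for column in zip(*samples)]


def scaled_distance(a, b, variances):
    """Assumption: Mahalanobis distance with a diagonal covariance matrix."""
    return sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)) ** 0.5


def medoid(samples, variances):
    """Assumption: the medoid stands in for an affinity propagation exemplar."""
    return min(
        samples,
        key=lambda s: sum(scaled_distance(s, t, variances) for t in samples),
    )


def representativeness(sample, exemplar, variances):
    """Assumed fuzzy membership: 1 at the exemplar, decaying with distance."""
    return 1.0 / (1.0 + scaled_distance(sample, exemplar, variances))


def crossover(parent_a, parent_b, rng):
    """Each feature ("gene") of the child is copied from one parent chosen at random."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def oversample(majority, minority, tol=0.5, max_new=1000, seed=0):
    """Add synthetic minority samples until the representativeness gap is at most tol.

    The exemplars and variances are computed once from the original data and
    are not updated as synthetic samples are added (a simplification).
    """
    rng = random.Random(seed)
    var_maj = feature_variances(majority)
    var_min = feature_variances(minority)
    ex_maj = medoid(majority, var_maj)
    ex_min = medoid(minority, var_min)
    total_maj = sum(representativeness(s, ex_maj, var_maj) for s in majority)
    total_min = sum(representativeness(s, ex_min, var_min) for s in minority)

    synthetic = []
    while total_maj - total_min > tol and len(synthetic) < max_new:
        parent_a, parent_b = rng.sample(minority, 2)
        child = crossover(parent_a, parent_b, rng)
        synthetic.append(child)
        total_min += representativeness(child, ex_min, var_min)
    return synthetic, total_maj - total_min


if __name__ == "__main__":
    majority = [(float(i), float(j)) for i in range(5) for j in range(4)]
    minority = [(10.0, 10.0), (11.0, 12.0), (12.0, 11.0)]
    new_samples, gap = oversample(majority, minority)
    print(len(new_samples), "synthetic samples, remaining gap", round(gap, 3))
```

The point of the sketch is the stopping rule: the loop ends when the summed representativeness of the two classes is close, not when the class sizes are equal, which is the difference from ratio-based oversampling that the abstract emphasizes.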




Updated: 2020-03-10