An effective distance based feature selection approach for imbalanced data
Applied Intelligence (IF 5.3), Pub Date: 2019-08-27, DOI: 10.1007/s10489-019-01543-z
Shaukat Ali Shahee, Usha Ananthakumar

Abstract

Class imbalance is one of the critical areas in classification. The challenges become more severe when the data set has a large number of features. Traditional classifiers generally favour the majority class because of skewed class distributions. In recent years, feature selection has been used to select appropriate features for better classification of the minority class. However, these studies are limited to imbalance that arises between the classes. In addition to between-class imbalance, within-class imbalance, together with a large number of features, adds complexity and results in poor classifier performance. In the current study, we propose an effective distance based feature selection method (ED-Relief) that uses a sophisticated distance measure to tackle the simultaneous occurrence of between-class and within-class imbalance. This method has been tested on a variety of simulated experiments and real-life data sets, and the results are compared with the traditional Relief method and some well-known recent distance based feature selection methods. The results clearly show the superiority of the proposed effective distance based feature selection method.
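
For orientation, the traditional Relief baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of classic Relief only, not of ED-Relief: the abstract does not specify the effective distance measure, so none is attempted here. The function name `relief_weights`, its parameters, the Manhattan distance on range-scaled features, and the unsquared difference in the weight update are all assumptions chosen for the sketch.

```python
import numpy as np


def relief_weights(X, y, n_samples=None, seed=0):
    """Classic Relief feature weights (illustrative sketch, not ED-Relief).

    For each sampled instance, find its nearest neighbour of the same class
    (nearest hit) and of a different class (nearest miss). A feature gains
    weight when it differs on the miss and loses weight when it differs on
    the hit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Assumption of this sketch: at least two classes, each with >= 2 rows,
    # so that every instance has both a hit and a miss.
    _, counts = np.unique(y, return_counts=True)
    if len(counts) < 2 or counts.min() < 2:
        raise ValueError("need at least two classes with two instances each")

    # Scale every feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    rng = np.random.default_rng(seed)
    m = n if n_samples is None else n_samples
    sampled = rng.choice(n, size=m, replace=(m > n))

    w = np.zeros(d)
    for i in sampled:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all rows
        dist[i] = np.inf                       # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w
```

Features with larger weights are ranked as more relevant. Because every sampled instance contributes equally, a heavily skewed class distribution means the weights are driven mostly by majority-class instances, which is the kind of weakness the abstract attributes to methods that do not account for imbalance.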




Updated: 2020-02-19