Automatic ensemble feature selection using fast non-dominated sorting
Information Systems ( IF 3.0 ) Pub Date : 2021-03-12 , DOI: 10.1016/j.is.2021.101760
Sedighe Abasabadi , Hossein Nematzadeh , Homayun Motameni , Ebrahim Akbari

Feature selection is an effective data preprocessing strategy that selects an optimal feature subset from high-dimensional data for diverse pattern recognition problems. Its aims are to enhance accuracy, improve evaluation performance, and find the smallest effective feature subset. In this study, an ensemble feature selection method is adopted, based on the assumption that a combination of several feature selection methods yields more robust results than any individual method. Accordingly, ensemble feature selection needs a combination method that merges the feature rankings produced by diverse algorithms into a single rank for each feature. A threshold must also be set to obtain a functional subset of features. In this work, a three-step ensemble feature selection technique called Automatic Thresholding Feature Selection (ATFS) is proposed. The first step is diversity generation, in which multiple rankers are applied to each dataset to generate different feature rankings. Second, the output rankings of the individual selectors are combined using fast non-dominated sorting, a combination method that gives the proposed ensemble its automatic thresholding capability. Third, feature sets are generated to obtain the optimal feature set. Additionally, a new filter method called Sorted Label Interference (SLI) is proposed, based on interference between class labels. Both SLI and ATFS are applicable to binary datasets. The performance of SLI and ATFS is at least comparable to, and often better than, that of individual rankers and existing ensemble methods. The results also show that using the ATFS-generated threshold improves not only the performance of ATFS and SLI but also that of other filters and combination methods.
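
To make the second step more concrete, the following is a minimal sketch of how fast non-dominated sorting could group features into fronts from several rankings. It is not the authors' implementation: the function names, the toy `ranks` data, and the idea of reading the first front as the automatically thresholded subset are assumptions made for illustration only. Each feature is described by its rank under every ranker (lower is better), and a feature dominates another when it is ranked no worse by all rankers and strictly better by at least one.

```python
from typing import List, Sequence


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if rank vector a dominates b (lower rank = better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def fast_non_dominated_sort(ranks: List[Sequence[int]]) -> List[List[int]]:
    """Group feature indices into successive non-dominated fronts."""
    n = len(ranks)
    dominated = [[] for _ in range(n)]  # features that feature i dominates
    dom_count = [0] * n                 # how many features dominate feature i

    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(ranks[i], ranks[j]):
                dominated[i].append(j)
            elif dominates(ranks[j], ranks[i]):
                dom_count[i] += 1

    fronts = []
    current = [i for i in range(n) if dom_count[i] == 0]
    while current:
        fronts.append(current)
        next_front = []
        for i in current:
            for j in dominated[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    next_front.append(j)
        current = sorted(next_front)
    return fronts


# Hypothetical toy data: rows are features, columns are ranks from 3 rankers.
ranks = [
    (1, 2, 1),  # feature 0
    (2, 1, 3),  # feature 1
    (3, 3, 2),  # feature 2
    (4, 5, 4),  # feature 3
    (5, 4, 5),  # feature 4
]

fronts = fast_non_dominated_sort(ranks)
print(fronts)  # [[0, 1], [2], [3, 4]]
```

In this toy example no ranker-independent cutoff is supplied: the front boundaries fall out of the dominance relation itself, which is one way to read the "automatic thresholding" property described in the abstract. How ATFS actually turns fronts into the final feature set is not specified here.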




Updated: 2021-03-19