A novel community detection based genetic algorithm for feature selection
Journal of Big Data ( IF 8.6 ) Pub Date : 2021-01-04 , DOI: 10.1186/s40537-020-00398-3
Mehrdad Rostami , Kamal Berahmand , Saman Forouzandeh

Feature selection is an essential data preprocessing stage in data mining. Its core principle is to pick a subset of the available features by excluding those that carry almost no predictive information, as well as redundant features that are highly correlated with others. In the past several years, a variety of meta-heuristic methods have been introduced to eliminate redundant and irrelevant features as far as possible from high-dimensional datasets. One of the main disadvantages of existing meta-heuristic approaches is that they often neglect the correlation among the selected features. In this article, the authors propose a genetic algorithm for feature selection based on community detection, which works in three steps. In the first step, the feature similarities are calculated. In the second step, the features are grouped into clusters by community detection algorithms. In the third step, features are picked by a genetic algorithm with a new community-based repair operation. The performance of the presented approach was analyzed on nine benchmark classification problems. The authors also compared the efficiency of the proposed approach with the findings from four available feature selection algorithms. Comparing the proposed method with three new feature selection methods based on PSO, ACO, and ABC algorithms on three classifiers showed that its accuracy is on average 0.52% higher than PSO, 1.20% higher than ACO, and 1.57% higher than ABC.
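
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: absolute Pearson correlation stands in for the similarity measure, connected components of a thresholded similarity graph stand in for a real community detection algorithm, and the repair step simply forces each chromosome to keep a fixed number of features per community. The names `similarity_matrix`, `communities`, `repair`, `threshold`, and `per_group` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def similarity_matrix(columns):
    """Step 1 (assumed measure): absolute correlation between feature columns."""
    k = len(columns)
    return [[abs(pearson(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]


def communities(sim, threshold=0.7):
    """Step 2 (stand-in): connected components of the graph that links
    features whose similarity is at least `threshold`."""
    k = len(sim)
    seen = [False] * k
    groups = []
    for start in range(k):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(k):
                if not seen[j] and sim[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(members))
    return groups


def repair(chromosome, groups, per_group=1, rng=random):
    """Step 3 (assumed repair rule): return a copy of the 0/1 chromosome in
    which every community contributes exactly min(per_group, size) features."""
    fixed = list(chromosome)
    for members in groups:
        target = min(per_group, len(members))
        chosen = [i for i in members if fixed[i]]
        if len(chosen) > target:
            # Too many similar features selected: drop the surplus at random.
            for i in rng.sample(chosen, len(chosen) - target):
                fixed[i] = 0
        elif len(chosen) < target:
            # Community under-represented: switch on random members.
            unchosen = [i for i in members if not fixed[i]]
            for i in rng.sample(unchosen, target - len(chosen)):
                fixed[i] = 1
    return fixed


# Toy data: features 0 and 1 are perfectly correlated, as are 2 and 3.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1, -1, 0, -1, 1],
    [2, -2, 0, -2, 2],
]
groups = communities(similarity_matrix(columns))
child = repair([1, 1, 0, 0], groups, per_group=1, rng=random.Random(0))
print(groups, child)
```

In a full genetic algorithm, a step like `repair` would be applied to each offspring after crossover and mutation, so that the population never carries several near-duplicate features from the same community.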




Updated: 2021-01-04