Cooperative co-evolution for feature selection in Big Data with random feature grouping
Journal of Big Data (IF 8.1), Pub Date: 2020-12-04, DOI: 10.1186/s40537-020-00381-y
A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland

Modern technologies generate massive amounts of data. This high-throughput data generation results in Big Data, which consists of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique for selecting a subset of relevant features that represents the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. Existing solutions perform poorly because of limitations such as not considering feature interactions, handling only an even number of features, and decomposing the dataset statically. This paper introduces a novel random feature grouping (RFG) technique, with three variants, that dynamically decomposes Big Data datasets and increases the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes; the resulting approach is called Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis was performed, with and without FS, using six widely used ML classifiers on seven different datasets from the UCI ML repository and the Princeton University Genomics repository. The experimental results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)], the proposed CCFSRFG-1 outperforms an existing solution (a CC-based FS approach called CCEAFS), CCFSRFG-2, and the use of all features in terms of accuracy, sensitivity, and specificity.
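The idea of random feature grouping can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three RFG variants differ, so the function name `random_feature_groups`, its parameters, and the round-robin split are assumptions made purely for illustration. The sketch shuffles the feature indices and partitions them into subcomponents, which works for an odd number of features as well as an even one.

```python
import random


def random_feature_groups(n_features, n_groups, seed=None):
    """Randomly partition feature indices 0..n_features-1 into n_groups subcomponents.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not 1 <= n_groups <= n_features:
        raise ValueError("n_groups must be between 1 and n_features")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)  # random order, so any two features may share a group
    # Take every n_groups-th index starting at offset g: group sizes differ by at most one.
    return [sorted(indices[g::n_groups]) for g in range(n_groups)]


# 11 features (an odd count) split into 3 subcomponents.
groups = random_feature_groups(n_features=11, n_groups=3, seed=0)
for i, group in enumerate(groups):
    print(f"subcomponent {i}: {group}")
```

Calling the function again with a different seed produces a different decomposition, which is one simple way to regroup features dynamically rather than fixing the split once; whether this matches any of the paper's variants is an assumption.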




Updated: 2020-12-04