Two-stage improved Grey Wolf optimization algorithm for feature selection on high-dimensional classification
Complex & Intelligent Systems (IF 5.8), Pub Date: 2021-07-30, DOI: 10.1007/s40747-021-00452-4
Chaonan Shen, Kai Zhang

In recent years, evolutionary algorithms have shown great advantages in the field of feature selection because of their simplicity and potential global search capability. However, most existing feature selection algorithms based on evolutionary computation are wrapper methods, which are computationally expensive, especially for high-dimensional biomedical data. To significantly reduce this computational cost, an effective evaluation method is essential. In this paper, a two-stage improved grey wolf optimization (IGWO) algorithm for feature selection on high-dimensional data is proposed. In the first stage, a multilayer perceptron (MLP) network with group lasso regularization terms is trained, and the proposed algorithm is used to construct an integer optimization problem that pre-selects features and optimizes the hidden layer structure. The dataset is then compressed using the feature subset obtained in the first stage. In the second stage, an MLP network with group lasso regularization terms is retrained on the compressed dataset, and the proposed algorithm is employed to construct a discrete optimization problem for feature selection. Meanwhile, a rapid evaluation strategy is constructed to mitigate the evaluation cost and improve evaluation efficiency during feature selection. The effectiveness of the algorithm is analyzed on ten gene expression datasets. The experimental results show that the proposed algorithm not only removes more than 95.7% of the features in all datasets but also achieves better classification accuracy on the test set. In addition, its advantages in time consumption, classification accuracy, and feature subset size become increasingly prominent as the dimensionality of the feature selection problem increases. This indicates that the proposed algorithm is particularly suitable for high-dimensional feature selection problems.
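
To make the two-stage procedure more concrete, the sketch below illustrates the two core ingredients under simplifying assumptions: a group-lasso-style screening of input features based on the row norms of a trained MLP's first-layer weights (the stage-1 pre-selection idea), and a binary grey wolf optimizer that evolves 0/1 feature masks (the stage-2 discrete search). This is not the authors' implementation; the nearest-centroid fitness, the group_lasso_screen and binary_gwo helpers, and all parameter values are hypothetical stand-ins for the MLP-driven fitness and rapid evaluation strategy described in the paper.

```python
# Minimal sketch, NOT the authors' implementation: the classifier, fitness
# function, and every parameter value below are illustrative assumptions.
import numpy as np


def group_lasso_screen(W1, keep_ratio=0.2):
    """Stage-1 idea: rank input features by the L2 norm of their outgoing
    weights in a trained first layer W1 (shape: n_features x n_hidden) and
    keep the strongest groups."""
    group_norms = np.linalg.norm(W1, axis=1)
    n_keep = max(1, int(keep_ratio * W1.shape[0]))
    return np.argsort(group_norms)[::-1][:n_keep]


def binary_gwo(fitness, n_features, n_wolves=20, n_iter=50, seed=None):
    """Stage-2 idea: binary grey wolf optimization, where each wolf is a 0/1
    feature mask guided toward the three best wolves (alpha, beta, delta)."""
    rng = np.random.default_rng(seed)
    wolves = rng.integers(0, 2, size=(n_wolves, n_features))
    scores = np.array([fitness(w) for w in wolves])
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter                  # control parameter decays 2 -> 0
        leaders = wolves[np.argsort(scores)[:3]]    # alpha, beta, delta (minimization)
        for i in range(n_wolves):
            pos = np.zeros(n_features)
            for leader in leaders:
                r1, r2 = rng.random(n_features), rng.random(n_features)
                A, C = 2 * a * r1 - a, 2 * r2
                pos += leader - A * np.abs(C * leader - wolves[i])
            pos /= 3.0
            prob = 1.0 / (1.0 + np.exp(-np.clip(pos, -60, 60)))  # sigmoid transfer
            wolves[i] = (rng.random(n_features) < prob).astype(int)
            scores[i] = fitness(wolves[i])
    best = int(np.argmin(scores))
    return wolves[best], scores[best]


# Toy usage: 100 features, only the first 5 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y = (X[:, :5].sum(axis=1) > 0).astype(int)


def fitness(mask):
    """Illustrative wrapper fitness: nearest-centroid error plus a small
    penalty on the fraction of selected features."""
    if mask.sum() == 0:
        return 1.0
    Xs = X[:, mask.astype(bool)]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return np.mean(pred != y) + 0.01 * mask.mean()


mask, score = binary_gwo(fitness, n_features=100, n_iter=30, seed=1)
print("selected:", np.flatnonzero(mask)[:10], "fitness:", round(float(score), 3))
```

In the paper the fitness would instead be driven by the retrained group-lasso MLP together with the rapid evaluation strategy; the sigmoid transfer function used here is simply a common way to binarize the continuous GWO position update.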




Updated: 2021-07-30