New feature selection paradigm based on hyper-heuristic technique
Applied Mathematical Modelling ( IF 5 ) Pub Date : 2021-05-06 , DOI: 10.1016/j.apm.2021.04.018
Rehab Ali Ibrahim , Mohamed Abd Elaziz , Ahmed A. Ewees , Mohammed El-Abd , Songfeng Lu

Feature selection (FS) is a crucial step for effective data mining since it has the largest effect on improving the performance of classifiers. This is achieved by removing the irrelevant features and using only the relevant ones. Many metaheuristic approaches exist in the literature that attempt to address this problem. The performance of these approaches differs depending on the settings of a number of factors, including the use of chaotic maps, opposition-based learning (OBL) and the percentage of the population that OBL is applied to, the metaheuristic (MH) algorithm adopted, the classifier utilized, and the threshold value used to convert real solutions to binary ones. However, it is not an easy task to identify the best settings for these different components in order to determine the relevant features for a specific dataset. Moreover, running extensive experiments to fine-tune these settings for each and every dataset consumes considerable time. To mitigate this issue, a hyper-heuristic based FS paradigm is proposed. In the proposed model, a two-stage approach is adopted to identify the best combination of these components. In the first stage, referred to as the training stage, the Differential Evolution (DE) algorithm is used as a controller that selects the best combination of components to be used by the second stage. In the second stage, referred to as the testing stage, the received combination is evaluated using a testing set. Empirical evaluation of the proposed framework is based on numerous experiments performed on 18 of the most popular datasets from the UCI machine learning repository. Experimental results show that the generated generic configuration performs better than eight other metaheuristic algorithms over all performance measures when applied to the UCI datasets. Moreover, the overall paradigm ranks first when compared against state-of-the-art algorithms. Finally, the generic configuration provides very competitive performance on high-dimensional datasets.
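
To make the component list in the abstract concrete, the following is a minimal Python sketch of how a controller vector could be decoded into one configuration, and of the two operations the abstract names explicitly: thresholding a real-valued solution into a binary feature mask, and taking the OBL opposite of a solution. The option menus, the `Config` fields, the five-gene encoding, and the strict `>` comparison are all illustrative assumptions; the abstract does not specify them, and this is not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical option menus: the abstract names these component
# categories but does not list the concrete choices.
CHAOTIC_MAPS = ["none", "logistic", "tent"]
METAHEURISTICS = ["PSO", "GWO", "SCA"]
CLASSIFIERS = ["kNN", "SVM"]


@dataclass(frozen=True)
class Config:
    """One combination of components (field names are assumptions)."""
    chaotic_map: str
    obl_fraction: float   # share of the population OBL is applied to
    metaheuristic: str
    classifier: str
    threshold: float      # cut-off for real-to-binary conversion


def pick(options, gene):
    """Map a gene in [0, 1] onto one entry of a discrete menu."""
    index = min(int(gene * len(options)), len(options) - 1)
    return options[index]


def decode(genes):
    """Turn a 5-gene controller vector (each gene in [0, 1]) into a Config."""
    g_map, g_obl, g_mh, g_clf, g_thr = genes
    return Config(
        chaotic_map=pick(CHAOTIC_MAPS, g_map),
        obl_fraction=g_obl,
        metaheuristic=pick(METAHEURISTICS, g_mh),
        classifier=pick(CLASSIFIERS, g_clf),
        threshold=g_thr,
    )


def binarize(solution, threshold):
    """Convert a real-valued solution to a 0/1 feature mask.

    A feature is kept (1) when its value exceeds the threshold;
    using strict '>' here is an assumption.
    """
    return [1 if x > threshold else 0 for x in solution]


def opposite(solution, lower=0.0, upper=1.0):
    """Opposition-based learning: reflect each coordinate within [lower, upper]."""
    return [lower + upper - x for x in solution]


config = decode([0.5, 0.3, 0.99, 0.0, 0.5])
mask = binarize([0.2, 0.7, 0.5], config.threshold)
```

In a setup like this, a controller such as DE would search over the five-gene vector during the training stage, scoring each decoded `Config` by the classification performance of the feature subset it yields; the best vector would then be evaluated on the testing set.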




Updated: 2021-05-25