Unsupervised feature selection based on bio-inspired approaches
Swarm and Evolutionary Computation (IF 8.2), Pub Date: 2019-11-11, DOI: 10.1016/j.swevo.2019.100618
Nádia Junqueira Martarelli, Marcelo Seido Nagano

In recent years, the scientific community has witnessed an explosion in the use of pattern recognition algorithms. However, little attention has been paid to the tasks that precede the execution of these algorithms, namely the preprocessing activities. One of these tasks is dimensionality reduction, which seeks a subset of features that improves the performance of the mining algorithm while also reducing its runtime. Although many methods address these problems in pattern recognition algorithms, effective solutions still need to be researched and explored. This paper therefore addresses three of the issues surrounding these algorithms. First, we propose adapting a promising meta-heuristic, the biased random-key genetic algorithm, which builds its initial population at random; we call this algorithm unsupervised feature selection by biased random-key genetic algorithm I. Next, we propose an approach that builds the initial population partly in a deterministic way, and we apply this idea in two algorithms, named unsupervised feature selection by particle swarm optimization and unsupervised feature selection by biased random-key genetic algorithm II. Finally, we simulate different datasets to study how relevant and irrelevant attributes, as well as noisy and missing data, affect the performance of the algorithms. Based on the Wilcoxon rank-sum test, we can state that the proposed algorithms outperform all the other methods compared across the different datasets. We also observed that building the initial population in a partially deterministic way contributed to the better performance. Some methods are more sensitive than others to noisy and missing data, as well as to relevant and irrelevant attributes.
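The abstract does not specify the decoder, the unsupervised fitness function, or how the deterministic part of the initial population is built, so the following is only a minimal sketch of the generic biased random-key genetic algorithm loop applied to feature selection, not the authors' method. It shows random-key chromosomes, the elite/mutant/offspring split, and biased uniform crossover. Assumptions (all hypothetical): a key above 0.5 selects a feature (`decode`), a variance-minus-redundancy score stands in for the unsupervised criterion (`make_fitness`), and `variance_ranked_seeds` stands in for the partially deterministic initialization. The parameter values are illustrative only, and the particle swarm variant is not shown.

```python
import random
import statistics
from typing import Callable, List, Sequence

Chromosome = List[float]


def decode(keys: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Assumed decoder: feature j is selected when its random key exceeds threshold."""
    return [j for j, key in enumerate(keys) if key > threshold]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; a constant feature is treated as uncorrelated (0.0)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def make_fitness(columns: Sequence[Sequence[float]],
                 redundancy_weight: float = 10.0, size_penalty: float = 0.1):
    """Hypothetical unsupervised score: total variance of the selected features,
    minus penalties for pairwise redundancy (|correlation|) and subset size."""
    variances = [statistics.pvariance(col) for col in columns]

    def fitness(selected: List[int]) -> float:
        score = sum(variances[j] for j in selected) - size_penalty * len(selected)
        for i, a in enumerate(selected):
            for b in selected[i + 1:]:
                score -= redundancy_weight * abs(pearson(columns[a], columns[b]))
        return score

    return fitness


def variance_ranked_seeds(columns: Sequence[Sequence[float]],
                          n_seeds: int) -> List[Chromosome]:
    """Hypothetical deterministic seeds: seed k selects the k highest-variance features."""
    order = sorted(range(len(columns)),
                   key=lambda j: statistics.pvariance(columns[j]), reverse=True)
    seeds = []
    for k in range(1, n_seeds + 1):
        chosen = set(order[:k])
        seeds.append([0.75 if j in chosen else 0.25 for j in range(len(columns))])
    return seeds


def biased_crossover(elite: Chromosome, other: Chromosome,
                     rho_e: float, rng: random.Random) -> Chromosome:
    """Each gene is inherited from the elite parent with probability rho_e (> 0.5)."""
    return [e if rng.random() < rho_e else o for e, o in zip(elite, other)]


def brkga(fitness: Callable[[List[int]], float], n_genes: int, pop_size: int = 30,
          elite_frac: float = 0.2, mutant_frac: float = 0.1, rho_e: float = 0.7,
          generations: int = 50, seeds: Sequence[Chromosome] = (),
          rng_seed: int = 0) -> List[int]:
    rng = random.Random(rng_seed)

    def score(c: Chromosome) -> float:
        return fitness(decode(c))

    # Initial population: deterministic seeds first, the rest uniformly random keys.
    pop = [list(s) for s in seeds][:pop_size]
    while len(pop) < pop_size:
        pop.append([rng.random() for _ in range(n_genes)])
    n_elite = max(1, int(elite_frac * pop_size))
    n_mutant = int(mutant_frac * pop_size)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)  # maximize the fitness
        elite, non_elite = pop[:n_elite], pop[n_elite:]
        mutants = [[rng.random() for _ in range(n_genes)] for _ in range(n_mutant)]
        offspring = [biased_crossover(rng.choice(elite), rng.choice(non_elite), rho_e, rng)
                     for _ in range(pop_size - n_elite - n_mutant)]
        pop = elite + mutants + offspring  # elite chromosomes survive unchanged
    return decode(max(pop, key=score))


if __name__ == "__main__":
    f0 = [1, 2, 3, 4, 5, 6, 7, 8]
    # f1 is uncorrelated with f0, f2 duplicates f0, f3 is constant (uninformative).
    cols = [f0, [2, -2, 2, -2, -2, 2, -2, 2], list(f0), [3] * 8]
    print(brkga(make_fitness(cols), n_genes=4, seeds=variance_ranked_seeds(cols, 3)))
```

On this toy data the hypothetical fitness is maximized by keeping feature 1 and exactly one of the two identical features 0 and 2, while dropping the constant feature 3. Note that the variance-ranked seeds deliberately include the redundant pair {0, 2}; a deterministic seed only gives the search a starting point, and the evolutionary operators still have to correct it.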



Updated: 2019-11-11