Ensemble feature selection in medical datasets: Combining filter, wrapper, and embedded feature selection results
Expert Systems (IF 3.0), Pub Date: 2020-04-03, DOI: 10.1111/exsy.12553
Chih‐Wen Chen, Yi‐Hong Tsai, Fang‐Rong Chang, Wei‐Chao Lin

Feature selection is a process aimed at filtering out unrepresentative features from a given dataset, usually allowing the later data mining and analysis steps to produce better results. However, different feature selection algorithms use different criteria to select representative features, making it difficult to find the best algorithm for different domain datasets. The limitations of single feature selection methods can be overcome by the application of ensemble methods, combining multiple feature selection results. In the literature, feature selection algorithms are classified as filter, wrapper, or embedded techniques. However, to the best of our knowledge, there has been no study focusing on combining these three types of techniques to produce ensemble feature selection. Therefore, the aim here is to answer the question as to which combination of different types of feature selection algorithms offers the best performance for different types of medical data including categorical, numerical, and mixed data types. The experimental results show that a combination of filter (i.e., principal component analysis) and wrapper (i.e., genetic algorithms) techniques by the union method is a better choice, providing relatively high classification accuracy and a reasonably good feature reduction rate.
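As a rough illustration of what combining feature selection results by union means, the sketch below merges the feature subsets returned by two selectors and reports the resulting reduction rate. It is not the authors' implementation: the function names `combine_selections` and `reduction_rate`, the example feature indices, and the definition of reduction rate as the fraction of original features removed are all assumptions made for this example.

```python
from typing import Iterable, List, Set


def combine_selections(selections: Iterable[Iterable[int]],
                       method: str = "union") -> List[int]:
    """Combine several selected-feature index lists into one sorted list.

    Hypothetical helper: 'union' keeps a feature chosen by any selector,
    'intersection' keeps only features chosen by every selector.
    """
    sets: List[Set[int]] = [set(s) for s in selections]
    if not sets:
        return []
    if method == "union":
        combined = set().union(*sets)
    elif method == "intersection":
        combined = set.intersection(*sets)
    else:
        raise ValueError(f"unknown combination method: {method!r}")
    return sorted(combined)


def reduction_rate(n_selected: int, n_total: int) -> float:
    """Fraction of the original features removed (assumed definition)."""
    return 1.0 - n_selected / n_total


# Made-up outputs of a filter-style and a wrapper-style selector
# on a dataset with 10 features (indices 0..9).
filter_selected = [0, 2, 5]
wrapper_selected = [2, 3, 5, 7]

union_features = combine_selections([filter_selected, wrapper_selected], "union")
inter_features = combine_selections([filter_selected, wrapper_selected], "intersection")

print(union_features, reduction_rate(len(union_features), 10))
print(inter_features, reduction_rate(len(inter_features), 10))
```

In this toy setting the union keeps more features than the intersection, so it gives a lower reduction rate in exchange for retaining every feature that any selector considered useful.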

Updated: 2020-04-03