Artificial Intelligence in Medicine (IF 7.5), Pub Date: 2021-03-06, DOI: 10.1016/j.artmed.2021.102049
Authors: Rubul Kumar Bania, Anindya Halder
Feature selection is a reliable dimensionality reduction technique for selecting a subset of relevant and non-redundant features from large datasets. Ensemble feature selection (EFS) is a recent approach that aims to accumulate diversity in the subsets of selected features. It improves the performance of learning algorithms and yields more stable and robust results. This paper proposes a novel rough set theory (RST) based heterogeneous EFS method (R-HEFS). While aggregating diverse feature subsets, R-HEFS selects less redundant and highly relevant features by applying feature-class and feature-feature rough dependency measures together with a feature-significance measure. R-HEFS uses five state-of-the-art RST-based filter methods as base feature selectors. Experiments are carried out on 10 benchmark medical datasets collected from the UCI repository. Missing values are imputed with the nearest neighbor (NN) imputation method, and continuous features are discretized with RST-based discretization techniques. The effectiveness of R-HEFS is evaluated and analyzed using four benchmark classifiers: Naïve Bayes (NB), random forest (RF), support vector machine (SVM), and AdaBoost. R-HEFS proves effective at removing irrelevant and redundant features while aggregating the outputs of the base feature selectors, which helps increase classification accuracy. R-HEFS achieves better average classification accuracy on 7 of the 10 medical datasets. Overall, the results strongly suggest that R-HEFS can reduce the dimensionality of large medical datasets and may help physicians and medical experts diagnose (classify) different diseases with lower computational complexity.
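The abstract names three rough-set measures (feature-class dependency, feature-feature dependency, and feature significance) but does not define them. The Python sketch below uses the textbook RST definitions: the dependency degree γ_B(D) = |POS_B(D)| / |U|, where POS_B(D) collects the objects whose B-equivalence class lies entirely within one decision class, and, as one common choice, significance as the gain γ_{B∪{a}}(D) − γ_B(D). The `aggregate` function, its `redundancy_threshold` parameter, the greedy ordering, and the toy flu data are hypothetical illustrations. They are not the R-HEFS aggregation procedure, which this abstract does not specify. The sketch also assumes the features are already discretized, matching the preprocessing described above.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes (lists of row indices) of the indiscernibility relation IND(attrs)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond, dec):
    """Rough dependency degree gamma_cond(dec) = |POS_cond(dec)| / |U|.

    A row lies in the positive region when every row that shares its values
    on `cond` also shares its values on `dec`.
    """
    if not rows:
        return 0.0
    dec_key = [tuple(row[d] for d in dec) for row in rows]
    pos = sum(len(block) for block in partition(rows, cond)
              if len({dec_key[i] for i in block}) == 1)
    return pos / len(rows)


def significance(rows, feature, selected, cls):
    """Gain in class dependency obtained by adding `feature` to `selected`."""
    return (dependency(rows, selected + [feature], [cls])
            - dependency(rows, selected, [cls]))


def aggregate(rows, subsets, cls, redundancy_threshold=0.9):
    """HYPOTHETICAL aggregation of base-selector outputs (not the paper's exact method).

    Pools all selected features, ranks them by feature-class dependency,
    drops any feature that an already kept feature (almost) determines,
    and keeps later features only if they add significance.
    """
    pool = sorted(set().union(*subsets),
                  key=lambda a: (-dependency(rows, [a], [cls]), a))
    selected = []
    for a in pool:
        # Feature-feature dependency: is `a` (nearly) a function of a kept feature?
        if any(dependency(rows, [s], [a]) >= redundancy_threshold for s in selected):
            continue
        # The top-ranked feature is always kept; later ones must add significance.
        if not selected or significance(rows, a, selected, cls) > 0:
            selected.append(a)
    return selected


# Toy, already-discretized data (hypothetical): flu = yes only when fever and cough are both yes.
rows = [
    {"fever": "yes", "temp": "high",   "cough": "yes", "age": "young", "flu": "yes"},
    {"fever": "yes", "temp": "high",   "cough": "no",  "age": "old",   "flu": "no"},
    {"fever": "no",  "temp": "normal", "cough": "yes", "age": "young", "flu": "no"},
    {"fever": "no",  "temp": "normal", "cough": "no",  "age": "old",   "flu": "no"},
    {"fever": "yes", "temp": "high",   "cough": "yes", "age": "old",   "flu": "yes"},
    {"fever": "no",  "temp": "normal", "cough": "yes", "age": "old",   "flu": "no"},
]
# Subsets returned by three hypothetical base selectors.
subsets = [["fever", "cough"], ["temp", "age"], ["fever", "temp"]]
print(aggregate(rows, subsets, "flu"))  # ['fever', 'cough']
```

In this toy run, `temp` duplicates `fever`, so the feature-feature check discards it; `cough` is kept because it raises the class dependency from 0.5 to 1.0; and `age` is dropped because it adds no significance once `fever` and `cough` are selected.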
Title: R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification