Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner
Computational and Mathematical Methods in Medicine (IF 2.809), Pub Date: 2021-05-18, DOI: 10.1155/2021/6662420
S Murugesan 1 , R S Bhuvaneswaran 1 , H Khanna Nehemiah 1 , S Keerthana Sankari 2 , Y Nancy Jane 3

A computer-aided diagnosis (CAD) system that employs a super learner to diagnose the presence or absence of a disease has been developed. Each clinical dataset is preprocessed and split into a training set (60%) and a testing set (40%). Feature selection follows a wrapper approach built on three bioinspired algorithms, namely, cat swarm optimization (CSO), krill herd (KH), and bacterial foraging optimization (BFO), with the classification accuracy of a support vector machine (SVM) as the fitness function. The features selected by each bioinspired algorithm are stored in three separate databases and are used to train three back propagation neural networks (BPNN) independently using the conjugate gradient algorithm (CGA). Each trained classifier is then evaluated on the testing set, and the diagnostic results obtained are used to assess its performance. For each instance of the testing set, the classification results of the three classifiers together with the class label of that instance form a candidate instance for training and testing the super learner. Of these candidate instances, 80% make up the super learner's training set and 20% its testing set. Experimentation has been carried out using seven clinical datasets from the University of California Irvine (UCI) machine learning repository. The super learner has achieved a classification accuracy of 96.83% for the Wisconsin diagnostic breast cancer dataset (WDBC), 86.36% for the Statlog heart disease dataset (SHD), 94.74% for the hepatocellular carcinoma dataset (HCC), 90.48% for the hepatitis dataset (HD), 81.82% for the vertebral column dataset (VCD), 84% for the Cleveland heart disease dataset (CHD), and 70% for the Indian liver patient dataset (ILP).
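
The data flow described in the abstract (a 60/40 split, three base classifiers scored on the testing set, and their outputs plus the class label reused as instances for the super learner with an 80/20 split) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation. The synthetic data, the three threshold rules standing in for the trained BPNNs, and the lookup-table meta-learner are all assumptions made here; the abstract does not state which model the super learner uses.

```python
import random


def split(rows, train_fraction, rng):
    """Shuffle a copy of rows and split it into (train, test) by fraction."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_meta_instances(base_classifiers, test_rows):
    """Map each test instance to (list of base predictions, true label)."""
    return [
        ([clf(features) for clf in base_classifiers], label)
        for features, label in test_rows
    ]


def train_lookup_meta_learner(meta_train):
    """Hypothetical meta-learner: for each pattern of base predictions,
    remember which true label was seen most often in meta_train."""
    counts_by_pattern = {}
    for preds, label in meta_train:
        counts = counts_by_pattern.setdefault(tuple(preds), [0, 0])
        counts[label] += 1

    def predict(preds):
        counts = counts_by_pattern.get(tuple(preds))
        if counts is None or counts[0] == counts[1]:
            # Unseen or tied pattern: fall back to a majority vote.
            return int(2 * sum(preds) > len(preds))
        return int(counts[1] > counts[0])

    return predict


rng = random.Random(0)

# Synthetic binary dataset (assumption): 3 features, label 1 if their sum > 1.5.
data = []
for _ in range(200):
    x = [rng.random() for _ in range(3)]
    data.append((x, int(sum(x) > 1.5)))

# 60% training / 40% testing, as in the abstract.
train_rows, test_rows = split(data, 0.60, rng)

# Stand-ins for the three BPNNs, each seeing a different feature subset.
# In the paper these would be trained on train_rows; the fixed rules
# below need no training and only illustrate the interface.
base_classifiers = [
    lambda x: int(x[0] + x[1] > 1.0),
    lambda x: int(x[1] + x[2] > 1.0),
    lambda x: int(x[0] + x[2] > 1.0),
]

# Base predictions on the testing set become the super learner's data,
# which is split again 80% / 20%.
meta_instances = build_meta_instances(base_classifiers, test_rows)
meta_train, meta_test = split(meta_instances, 0.80, rng)

super_learner = train_lookup_meta_learner(meta_train)
accuracy = sum(super_learner(p) == y for p, y in meta_test) / len(meta_test)
print(f"super learner accuracy on {len(meta_test)} held-out instances: {accuracy:.2%}")
```

One consequence of this design is that the super learner is fitted and scored on a small fraction of the original data: only 40% of the instances reach it, and only a fifth of those are held out for its final evaluation.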

Updated: 2021-05-18