Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments
PeerJ Computer Science (IF 3.5), Pub Date: 2021-06-01, DOI: 10.7717/peerj-cs.562
Muhammad Hamraz 1, Naz Gul 1, Mushtaq Raza 2, Dost Muhammad Khan 1, Umair Khalil 1, Seema Zubair 3, Zardad Khan 1

This paper proposes a novel feature selection method for microarray gene expression datasets, the Robust Proportional Overlapping Score (RPOS), which relies on a robust measure of dispersion, the Median Absolute Deviation (MAD). For binary classification problems, the method identifies the most discriminative genes by scoring how much each gene's expression values overlap between the two classes. Genes with a high degree of overlap between classes are discarded, and genes that discriminate between the classes are selected. The proposed method is compared with five state-of-the-art gene selection methods on eleven gene expression datasets, using classification error, Brier score, and sensitivity as performance measures. The sets of genes selected by the proposed method are evaluated with three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box-plots and stability scores of the results are also reported. The results show that the proposed method outperforms the other methods in most cases.
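
To make the idea of a MAD-based overlap score concrete, the following is a minimal sketch and not the authors' RPOS formula, which the abstract does not specify. It assumes each class is summarized by a "core interval" of median ± k·MAD, and that a gene's score is the length of the intersection of the two class intervals divided by the length of their combined span. The function names `mad`, `overlap_score`, and `rank_genes`, and the parameter `k`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def mad(x):
    """Median absolute deviation (unscaled) of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


def overlap_score(values, labels, k=1.0):
    """Illustrative overlap score for one gene in a two-class problem.

    Assumption: each class is summarized by the interval
    [median - k*MAD, median + k*MAD]. The score is the intersection
    length divided by the total span of both intervals, so 0 means
    no overlap and 1 means identical intervals.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")

    intervals = []
    for c in classes:
        v = values[labels == c]
        center = np.median(v)
        half_width = k * mad(v)
        intervals.append((center - half_width, center + half_width))

    (lo0, hi0), (lo1, hi1) = intervals
    overlap = max(0.0, min(hi0, hi1) - max(lo0, lo1))
    span = max(hi0, hi1) - min(lo0, lo1)
    # A zero span means both intervals collapse to the same point.
    return overlap / span if span > 0 else 1.0


def rank_genes(X, y, n_keep, k=1.0):
    """Return indices of the n_keep genes (columns of X) with the least overlap."""
    X = np.asarray(X, dtype=float)
    scores = np.array([overlap_score(X[:, j], y, k) for j in range(X.shape[1])])
    # A stable sort keeps the original column order among tied scores.
    return np.argsort(scores, kind="stable")[:n_keep]


# Toy data: gene 0 separates the classes, gene 1 does not.
X_toy = np.array([[1, 1], [2, 2], [3, 3], [10, 1], [11, 2], [12, 3]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
selected = rank_genes(X_toy, y_toy, n_keep=1)
```

On the toy data, the first column has disjoint class intervals and is therefore ranked ahead of the second column, whose class intervals coincide. In a setup like the one the abstract describes, the selected columns would then be passed to a classifier such as random forest, k-NN, or SVM.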

Updated: 2021-06-01