Phase diagram and ridge logistic regression in stable gene selection
Biocybernetics and Biomedical Engineering ( IF 5.3 ) Pub Date : 2020-05-06 , DOI: 10.1016/j.bbe.2020.04.003
Elahe Khani , Hamid Mahmoodian

Microarray analysis is widely used for cancer diagnosis and classification. However, of the large number of genes in microarray data, only a small fraction is useful for building a highly reliable model. There are two major challenges in this regard. The first is how to identify, from the thousands of genes in a dataset, the significant ones that can improve the generated model. The second is how to select a subset of genes that depends as little as possible on the particular samples in the dataset, a property termed the stability of the selected sets. Different approaches have been presented in previous works. In this study, we propose a new gene selection algorithm based on the previously proposed phase diagram method. Ridge logistic regression is used to estimate the probability that a gene belongs to a set of stable genes with high classification capability. To address the stability issue, a method is proposed for the final selection among the selected sets. The B632+ error estimation method is applied to evaluate the performance of the model. The proposed method was applied to four cancer datasets, and the results were compared with other validation methods. The results show that the selected genes are superior in terms of the number of genes, degree of stability, and classification accuracy.
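The abstract names the B632+ error estimate without spelling it out. As a rough illustration only, the sketch below computes the standard .632+ bootstrap estimate (Efron and Tibshirani) from three quantities: the resubstitution error, the out-of-bag bootstrap error, and the no-information error rate. The function names `no_information_rate` and `b632_plus` are hypothetical, and nothing here is taken from the authors' implementation; how the paper obtains the input error rates is not stated in the abstract.

```python
def no_information_rate(y_true, y_pred):
    """No-information error rate: the error expected if predictions
    were independent of the true labels (illustrative helper)."""
    n = len(y_true)
    gamma = 0.0
    for c in set(y_true) | set(y_pred):
        p = sum(1 for y in y_true if y == c) / n  # observed share of class c
        q = sum(1 for y in y_pred if y == c) / n  # predicted share of class c
        gamma += p * (1.0 - q)
    return gamma


def b632_plus(err_resub, err_oob, gamma):
    """Standard .632+ estimate from resubstitution error, out-of-bag
    bootstrap error, and no-information rate (assumed inputs)."""
    # Cap the out-of-bag error at the no-information rate.
    err_oob_capped = min(err_oob, gamma)
    # Relative overfitting rate, kept in [0, 1].
    if err_oob_capped > err_resub and gamma > err_resub:
        r = (err_oob_capped - err_resub) / (gamma - err_resub)
    else:
        r = 0.0
    # Weight moves from 0.632 (no overfitting) to 1 (full overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_resub + w * err_oob_capped


if __name__ == "__main__":
    gamma = no_information_rate([0, 0, 1, 1], [0, 1, 0, 1])
    print(b632_plus(err_resub=0.1, err_oob=0.3, gamma=gamma))
```

With no overfitting the weight is 0.632 and the estimate reduces to the plain .632 bootstrap; as the out-of-bag error approaches the no-information rate, the weight approaches 1 and the estimate leans entirely on the out-of-bag error.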




Updated: 2020-05-06