Detecting biomarkers from microarray data using distributed correlation based gene selection.
Genes & Genomics (IF 2.1), Pub Date: 2020-02-10, DOI: 10.1007/s13258-020-00916-w
Alok Kumar Shukla, Diwakar Tripathi

BACKGROUND: Over the past few decades, DNA microarray technology has emerged as a prevailing approach for early identification of cancer subtypes. Several feature selection (FS) techniques have been widely applied to identify cancer from microarray gene data, but very few studies have examined distributing the feature selection process for detecting cancer subtypes.

OBJECTIVE: Not all gene expressions are needed for prediction. The objective of this article is to select discriminative biomarkers using a distributed FS method that helps in accurately diagnosing cancer subtypes. Traditional feature selection techniques have several drawbacks; for example, features that appear unrelated on their own, but that could yield good classification accuracy in combination with a suitable subset of genes, are left out of the selection.

METHOD: To overcome this issue, this paper introduces a new filter-based gene selection method that selects the genes most relevant for distinguishing tissues in a gene expression dataset. The method computes gene-gene and gene-class relations and simultaneously identifies a subset of essential genes. It is tested on the Diffuse Large B-cell Lymphoma (DLBCL) dataset using well-known classification techniques: support vector machine, naïve Bayes, k-nearest neighbor, and decision tree.

RESULTS: Results on the DLBCL dataset show that the proposed method is a promising tool for predicting cancer type, with a prediction accuracy of 97.62%, precision of 94.23%, sensitivity of 94.12%, F-measure of 90.12%, and ROC value of 99.75%.

CONCLUSION: The experimental results show that the proposed method significantly improves classification accuracy and execution time compared with existing standard algorithms applied to the non-partitioned dataset. Furthermore, the extracted genes are biologically sound and agree with the outcomes of relevant biomedical studies.
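The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a distributed, correlation-based filter could look like, not the authors' method. It assumes that genes (columns) are split into random partitions, that absolute Pearson correlation measures both gene-class relevance and gene-gene redundancy, and that a final pass over the merged survivors removes redundancy across partitions. The function names, the two thresholds, and the toy data are all hypothetical.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation of two 1-D arrays (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else abs(float(a @ b) / denom)

def select_in_partition(X, y, gene_ids, relevance_min=0.3, redundancy_max=0.8):
    """Greedy filter: keep class-relevant genes that are not redundant with kept ones."""
    relevance = {g: abs_corr(X[:, g], y) for g in gene_ids}
    # Visit candidates from most to least correlated with the class label.
    candidates = sorted(
        (g for g in gene_ids if relevance[g] >= relevance_min),
        key=lambda g: relevance[g],
        reverse=True,
    )
    kept = []
    for g in candidates:
        if all(abs_corr(X[:, g], X[:, k]) < redundancy_max for k in kept):
            kept.append(g)
    return kept

def distributed_selection(X, y, n_partitions=2, seed=0, **thresholds):
    """Run the filter on each gene partition, then once more on the merged result."""
    rng = np.random.default_rng(seed)
    gene_ids = rng.permutation(X.shape[1])
    merged = []
    for part in np.array_split(gene_ids, n_partitions):
        merged.extend(select_in_partition(X, y, part.tolist(), **thresholds))
    # Final pass removes redundancy between genes kept in different partitions.
    return sorted(select_in_partition(X, y, merged, **thresholds))

# Toy data (hypothetical): 20 samples, 5 genes, binary class label.
y = np.array([0.0] * 10 + [1.0] * 10)
alternating = np.tile([0.0, 1.0], 10)
g0 = 2.0 * y + 0.1 * alternating           # strongly class-related
g1 = 3.0 * g0 + 1.0                        # redundant copy of g0
g2 = alternating                           # uncorrelated with the class
g3 = np.full(20, 5.0)                      # constant, carries no information
g4 = np.tile([0.0, 0.0, 1.0, 1.0], 5)      # only weakly class-related
X = np.column_stack([g0, g1, g2, g3, g4])

selected = distributed_selection(X, y, n_partitions=2)
print("selected gene indices:", selected)
```

On this toy data a single gene should survive, either g0 or g1, because each is an affine copy of the other and the rest fall below the relevance threshold. In practice the selected columns would then be passed to a classifier such as those named in the abstract, and the thresholds would need tuning per dataset.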

Updated: 2020-02-10