Utilizing Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2018-09-03, DOI: 10.1109/tcbb.2018.2868667
Paul Fergus, Aday Montanez, Basma Abdulaimma, Paulo Lisboa, Carl Chalmers, Beth Pineles

Genome-wide association studies (GWAS) are used to identify statistically significant genetic variants in case-control studies. The main objective is to find single nucleotide polymorphisms (SNPs) that influence a particular phenotype. GWAS typically use a genome-wide significance threshold of $p < 5 \times 10^{-8}$ to identify highly ranked SNPs. While this approach has proven useful for detecting disease-susceptible SNPs, evidence has shown that many of these are, in fact, false positives. Consequently, there is some ambiguity about the most suitable threshold for claiming genome-wide significance. Many believe that using lower p-values will allow us to investigate the joint epistatic interactions between SNPs and provide better insights into phenotype expression. In this paper, we propose a novel framework that uses stacked autoencoders to apply nonlinear transformations to combinatorially large SNP data and thereby identify higher-order SNP interactions. We focus on the challenging problem of classifying preterm births. Latent representations learned from the original SNP sequences are used to initialize a deep learning classifier, which is then fine-tuned for the classification task. The findings show that important information pertaining to epistasis can be extracted from 4666 raw SNPs selected using logistic regression ($p = 5 \times 10^{-8}$) and used to fit a deep learning model with 500 hidden nodes, which obtains sensitivity = 0.9289, specificity = 0.9591, Gini = 0.9651, log loss = 0.3080, AUC = 0.9825, and MSE = 0.0942.
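
The abstract says the 4666 SNPs were selected by logistic regression against a p-value threshold, but it does not give the model details. A minimal NumPy sketch of that kind of per-SNP filter is shown below. It assumes an additive genotype coding (allele counts 0/1/2), no covariates, and a one-degree-of-freedom likelihood-ratio test fitted by Newton-Raphson. The function names are hypothetical, and this is an illustration of the general technique, not the authors' pipeline.

```python
import math

import numpy as np

# Conventional genome-wide significance level quoted in the abstract.
GENOME_WIDE_ALPHA = 5e-8


def _loglik(y, p):
    """Bernoulli log-likelihood with probabilities clipped away from 0 and 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def _fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # observed information X^T W X
        # Tiny ridge term keeps the solve stable for monomorphic SNPs.
        beta += np.linalg.solve(hess + 1e-9 * np.eye(X.shape[1]), grad)
    return _loglik(y, 1.0 / (1.0 + np.exp(-X @ beta)))


def snp_pvalues(G, y):
    """Per-SNP likelihood-ratio test for the model logit P(case) = b0 + b1 * g."""
    n = len(y)
    ll_null = _loglik(y, np.full(n, y.mean()))  # intercept-only model
    pvals = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        X = np.column_stack([np.ones(n), G[:, j]])
        stat = max(2.0 * (_fit_logistic(X, y) - ll_null), 0.0)
        # Chi-square (1 df) survival function: P(X > s) = erfc(sqrt(s / 2)).
        pvals[j] = math.erfc(math.sqrt(stat / 2.0))
    return pvals


def select_snps(G, y, alpha=GENOME_WIDE_ALPHA):
    """Indices of SNP columns whose p-value falls below alpha."""
    return np.flatnonzero(snp_pvalues(G, y) < alpha)
```

Here `G` would be an individuals-by-SNPs matrix of allele counts and `y` the 0/1 preterm/term labels. Real GWAS analyses commonly add covariates such as ancestry principal components, which this sketch omits.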
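
The abstract describes the stacked-autoencoder framework only at a high level: latent representations are learned from the SNP data, used to initialize a classifier, and then fine-tuned. The NumPy sketch below illustrates that general technique with greedy layer-wise pretraining followed by supervised fine-tuning. Sigmoid units, squared-error reconstruction, full-batch gradient descent, genotypes scaled to [0, 1] (allele count / 2), and every function name and hyperparameter are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one sigmoid autoencoder layer by full-batch gradient descent.

    Minimizes 0.5 * (mean over rows of the squared reconstruction error).
    Returns encoder weights, encoder bias, and the per-epoch losses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)  # encode
        R = sigmoid(H @ W2 + b2)  # decode (reconstruction)
        losses.append(0.5 * np.mean(np.sum((R - X) ** 2, axis=1)))
        dR = (R - X) * R * (1 - R) / n  # gradient w.r.t. decoder pre-activation
        dH = (dR @ W2.T) * H * (1 - H)  # backpropagated to encoder pre-activation
        W2 -= lr * H.T @ dR
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, losses


def pretrain_stack(X, layer_sizes, **kw):
    """Greedy layer-wise pretraining: each layer reconstructs the codes below it."""
    weights, inp = [], X
    for k, n_hidden in enumerate(layer_sizes):
        W, b, _ = train_autoencoder(inp, n_hidden, seed=k, **kw)
        weights.append((W, b))
        inp = sigmoid(inp @ W + b)  # latent representation fed to the next layer
    return weights


def predict_proba(X, layers, w_out, b_out):
    """Forward pass through the stacked encoders and the logistic output unit."""
    a = X
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return sigmoid(a @ w_out + b_out)


def fine_tune(X, y, weights, epochs=300, lr=0.5, seed=0):
    """Initialize a classifier from the pretrained encoders, add a logistic
    output unit, and fine-tune every layer on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    layers = [(W.copy(), b.copy()) for W, b in weights]
    w_out, b_out = rng.normal(0.0, 0.1, layers[-1][0].shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        acts = [X]
        for W, b in layers:
            acts.append(sigmoid(acts[-1] @ W + b))
        p = sigmoid(acts[-1] @ w_out + b_out)
        d_out = (p - y) / n  # gradient of mean cross-entropy w.r.t. output logit
        delta = np.outer(d_out, w_out) * acts[-1] * (1 - acts[-1])
        w_out = w_out - lr * acts[-1].T @ d_out
        b_out -= lr * d_out.sum()
        for k in reversed(range(len(layers))):
            W, b = layers[k]
            gW, gb = acts[k].T @ delta, delta.sum(axis=0)
            # Backpropagate with the pre-update weights (unused when k == 0).
            delta = (delta @ W.T) * acts[k] * (1 - acts[k])
            layers[k] = (W - lr * gW, b - lr * gb)
    return layers, w_out, b_out
```

For example, `weights = pretrain_stack(G / 2.0, [500])` would pretrain a single 500-unit layer, echoing the 500 hidden nodes mentioned in the abstract (the exact layer arrangement is not specified there), and `fine_tune(G / 2.0, y, weights)` would then adapt it for case-control classification. A real analysis would also need a held-out test set, mini-batches, and regularization.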
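
The abstract reports six evaluation metrics. A minimal sketch of how these are commonly computed from predicted case probabilities follows. It assumes a 0.5 decision threshold, Gini defined as 2·AUC − 1 (the normalized Gini used by several machine-learning toolkits), and MSE taken between predicted probabilities and 0/1 labels; the function name is hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, Gini, log loss, AUC and MSE for binary labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(float)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    # AUC as the Mann-Whitney probability that a random case outscores a
    # random control (ties count one half).
    diff = y_prob[y_true == 1][:, None] - y_prob[y_true == 0][None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    p = np.clip(y_prob, 1e-15, 1 - 1e-15)
    return {
        "sen": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "gini": 2.0 * auc - 1.0,
        "logloss": -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)),
        "auc": auc,
        "mse": np.mean((y_prob - y_true) ** 2),
    }


m = classification_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(m["sen"], m["spec"], m["auc"], m["gini"])  # → 0.5 1.0 0.75 0.5
```

Under the assumed Gini definition, the reported AUC of 0.9825 gives 2 × 0.9825 − 1 = 0.965, which agrees with the reported Gini of 0.9651 up to rounding.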

Updated: 2020-04-22