A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data
Statistics and Its Interface (IF 0.3), Pub Date: 2015-01-01, DOI: 10.4310/sii.2015.v8.n2.a2
Francesco C Stingo, Michael D Swartz, Marina Vannucci

Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analyzing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple-SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the definition of gene-level scores is to weight the SNPs within a gene according to their rarity, based on the genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores that give more importance to unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
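
The abstract does not state the exact formula for the gene-level scores, so the following Python sketch is only an illustration of the general idea of rarity weighting under Hardy-Weinberg equilibrium, not the authors' definition. It assumes genotypes are coded as counts (0, 1, 2) of one allele per SNP, estimates the allele frequency q from the sample, and uses the negative log of the expected genotype frequency as the weight; the function names `hwe_genotype_freqs` and `gene_scores`, the log weighting, and the `eps` guard are all assumptions made for this example.

```python
import numpy as np

def hwe_genotype_freqs(genotypes):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    genotypes: (n_subjects, n_snps) array of allele counts coded 0, 1, 2.
    Returns a (3, n_snps) array whose rows are the expected frequencies
    of genotypes 0, 1 and 2, i.e. p^2, 2pq and q^2.
    """
    g = np.asarray(genotypes, dtype=float)
    q = g.mean(axis=0) / 2.0      # sample frequency of the counted allele
    p = 1.0 - q
    return np.stack([p ** 2, 2.0 * p * q, q ** 2])

def gene_scores(genotypes, eps=1e-12):
    """Illustrative gene-level score: one value per subject.

    Each observed genotype is weighted by -log(expected HWE frequency),
    so rarer genotypes contribute more. This weighting is an assumption
    made for illustration only. eps avoids log(0) for monomorphic SNPs.
    """
    g = np.asarray(genotypes, dtype=int)
    weights = -np.log(hwe_genotype_freqs(g) + eps)        # (3, n_snps)
    # Look up, for every subject and SNP, the weight of the observed genotype.
    per_snp = weights[g, np.arange(g.shape[1])]           # (n_subjects, n_snps)
    return per_snp.sum(axis=1)

# Toy data: 4 subjects, 2 SNPs belonging to one hypothetical gene.
toy = np.array([[0, 0],
                [0, 1],
                [1, 0],
                [2, 0]])
scores = gene_scores(toy)
```

In this toy example the fourth subject carries the rarest genotype at the first SNP and therefore receives a larger score than the first subject. Such scores could then enter a linear model as predictors of the phenotype, which is the role the abstract describes for them.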

Updated: 2015-01-01