Disease association with frequented regions of genotype graphs
medRxiv - Genetic and Genomic Medicine Pub Date : 2020-09-27 , DOI: 10.1101/2020.09.25.20201640
Samuel Hokin , Alan Cleary , Joann Mudge

Complex diseases, with many associated genetic and environmental factors, are a challenging target for genomic risk assessment. Genome-wide association studies (GWAS) associate disease status with, and compute risk from, individual common variants, which can be problematic for diseases with many interacting or rare variants. In addition, GWAS typically employ a reference genome which is not built from the subjects of the study, whose genetic background may differ from the reference and whose genetic characterization may be limited. We present a complementary method based on disease association with collections of genotypes, called frequented regions, on a pangenomic graph built from subjects' genomes. We introduce the pangenomic genotype graph, which is better suited than sequence graphs to human disease studies. Our method draws out collections of features, across multiple genomic segments, which are associated with disease status. We show that the frequented regions method consistently improves machine-learning classification of disease status over GWAS classification, allowing incorporation of rare or interacting variants. Notably, genomic segments that have few or no variants of genome-wide significance (p < 5×10⁻⁸) provide much-improved classification with frequented regions, encouraging their application across the entire genome. Frequented regions may also be utilized for purposes such as choice of treatment in addition to prediction of disease risk.

Updated: 2020-09-28