A framework for pathway knowledge driven prioritization in genome-wide association studies.
Genetic Epidemiology (IF 1.7), Pub Date: 2020-08-10, DOI: 10.1002/gepi.22345
Shrayashi Biswas, Soumen Pal, Partha P Majumder, Samsiddhi Bhattacharjee

Many variants with low frequencies or with low to modest effects likely remain unidentified in genome‐wide association studies (GWAS) because of stringent genome‐wide thresholds for detection. To improve the power of detection, variant prioritization based on functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method of prioritizing a GWAS by exploiting gene‐level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease‐associated variants are found near genes that are coinvolved in specific biological pathways relevant to the disease process. Using this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest: Genomic Knowledge‐guided Multiple Testing) to enable a prioritized pathway‐guided GWAS scan with a very large number of gene‐level annotations. We demonstrate that the proposed strategy improves overall power and maintains the Type 1 error globally. Our method works on genome‐wide summary‐level data and a user‐specified list of pathways (e.g., those extracted from large pathway databases without reference to the biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments as “adaptively learned” from the data, using a cross‐validation technique to avoid overfitting. We used whole‐genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user‐friendly open‐source R package.
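
The reweighting idea in the abstract can be illustrated with a generic weighted Bonferroni test. The following is a minimal sketch in Python, not the GKnowMTest API or the authors' procedure. The function names, the toy p values, and the prior scores are all hypothetical. In particular, the weights here are supplied by hand, whereas the paper learns them from the data via penalized logistic regression with cross‐validation. The sketch relies only on the standard result that rejecting when p_i <= alpha * w_i / m controls the family‐wise error rate if the weights are non‐negative, average to 1, and are chosen independently of the p values being tested.

```python
import numpy as np


def normalize_weights(scores):
    """Scale non-negative scores so that the resulting weights average to 1."""
    scores = np.asarray(scores, dtype=float)
    if np.any(scores < 0) or scores.sum() == 0:
        raise ValueError("scores must be non-negative and not all zero")
    return scores / scores.mean()


def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha * w_i / m (weights average to 1)."""
    pvals = np.asarray(pvals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = pvals.size
    return pvals <= alpha * weights / m


# Toy data (hypothetical): four variants, the first mapped to a gene in an
# enriched pathway and therefore given a larger prior score.
pvals = np.array([0.02, 0.02, 0.04, 0.5])
prior = np.array([0.7, 0.1, 0.1, 0.1])

weights = normalize_weights(prior)
rejected = weighted_bonferroni(pvals, weights, alpha=0.05)

# Unweighted Bonferroni for comparison (all weights equal to 1).
unweighted = weighted_bonferroni(pvals, np.ones_like(pvals), alpha=0.05)
```

In this toy case the up‐weighted first variant passes its relaxed threshold while the unweighted test rejects nothing, at the cost of stricter thresholds for the down‐weighted variants.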

Updated: 2020-08-10