A data harmonization pipeline to leverage external controls and boost power in GWAS
Human Molecular Genetics (IF 3.1), Pub Date: 2021-09-08, DOI: 10.1093/hmg/ddab261
Danfeng Chen 1 , Katherine Tashman 2, 3 , Duncan S Palmer 2, 3 , Benjamin Neale 2, 3, 4 , Kathryn Roeder 5 , Alex Bloemendal 2, 3 , Claire Churchhouse 2, 3, 4 , Zheng Tracy Ke 6

The use of external controls in genome-wide association studies (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, aggregating controls from multiple sources is challenging because of batch effects, difficulty in identifying genotyping errors and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27 517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn's disease. We demonstrate a boost in power over using the cohort samples alone and show that our procedure yields summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual-level genotypes, rather than summary statistics, are required.
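
The idea of running quality control both before and after merging can be illustrated with a toy sketch. This is not the authors' pipeline: the imputation step is omitted entirely, and every function name, the call-rate threshold, and the allele-frequency check used as a crude stand-in for batch-effect detection are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set

# Hypothetical in-memory layout: sample ID -> variant ID -> genotype (0/1/2, None if missing).
Genotypes = Dict[str, Dict[str, Optional[int]]]


def call_rate(cohort: Genotypes, variant: str) -> float:
    """Fraction of samples in the cohort with a non-missing call at this variant."""
    calls = [g.get(variant) for g in cohort.values()]
    return sum(c is not None for c in calls) / len(calls)


def allele_freq(cohort: Genotypes, variant: str) -> Optional[float]:
    """Alternate-allele frequency among called genotypes, or None if nothing is called."""
    calls = [g[variant] for g in cohort.values() if g.get(variant) is not None]
    return sum(calls) / (2 * len(calls)) if calls else None


def qc_variants(cohort: Genotypes, min_call_rate: float = 0.98) -> Genotypes:
    """Drop variants whose call rate falls below the (assumed) threshold."""
    variants = {v for g in cohort.values() for v in g}
    keep = {v for v in variants if call_rate(cohort, v) >= min_call_rate}
    return {s: {v: c for v, c in g.items() if v in keep} for s, g in cohort.items()}


def merge_cohorts(cohorts: Dict[str, Genotypes]) -> Genotypes:
    """Merge cohorts on the variants they all share; sample IDs get a cohort prefix."""
    shared: Optional[Set[str]] = None
    for cohort in cohorts.values():
        variants = {v for g in cohort.values() for v in g}
        shared = variants if shared is None else shared & variants
    merged: Genotypes = {}
    for name, cohort in cohorts.items():
        for sample, g in cohort.items():
            merged[f"{name}:{sample}"] = {v: g.get(v) for v in (shared or set())}
    return merged


def batch_outliers(cohorts: Dict[str, Genotypes], variants: Set[str],
                   max_diff: float = 0.2) -> Set[str]:
    """Flag variants whose allele frequency differs across cohorts by more than max_diff."""
    flagged = set()
    for v in variants:
        freqs = [f for f in (allele_freq(c, v) for c in cohorts.values()) if f is not None]
        if freqs and max(freqs) - min(freqs) > max_diff:
            flagged.add(v)
    return flagged


def harmonize(cohorts: Dict[str, Genotypes], min_call_rate: float = 0.98,
              max_diff: float = 0.2) -> Genotypes:
    """Pre-merge QC per cohort, merge, then post-merge QC on the combined data."""
    cleaned = {n: qc_variants(c, min_call_rate) for n, c in cohorts.items()}
    merged = merge_cohorts(cleaned)
    shared = set(next(iter(merged.values()))) if merged else set()
    bad = batch_outliers(cleaned, shared, max_diff)
    merged = {s: {v: c for v, c in g.items() if v not in bad} for s, g in merged.items()}
    return qc_variants(merged, min_call_rate)
```

In this sketch a variant can be lost at three points: poor call rate within one cohort, absence from some cohort at merge time, or a cross-cohort frequency discrepancy after merging. A real workflow would more likely rely on established genotype tooling and add imputation between the QC rounds.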

Updated: 2021-09-08