Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results.
Genetic Epidemiology ( IF 2.1 ) Pub Date : 2019-09-14 , DOI: 10.1002/gepi.22261
Zhongsheng Chen 1 , Michael Boehnke 1 , Christian Fuchsberger 1, 2, 3

Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single-study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep-coverage (~82×) exome and low-coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta-analysis has similar power to joint analysis in deep-coverage sequence data but can be less powerful in low-coverage sequence data. Given similar data processing and quality control steps, we recommend single-study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep-coverage data.
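The abstract reports nonreference concordance without defining it. As a minimal sketch, assuming the commonly used definition (the share of matching genotypes among sample-site pairs where at least one of the two call sets is non-reference), the metric could be computed as below. The function name, the 0/1/2 alternate-allele coding, the use of None for missing calls, and the toy genotypes are all illustrative assumptions and are not taken from the paper.

```python
def nonreference_concordance(calls, truth):
    """Share of matching genotypes among pairs with at least one non-reference call.

    Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2);
    None marks a missing genotype. This coding is an assumption of the sketch.
    """
    compared = 0
    matched = 0
    for called, true in zip(calls, truth):
        # Skip pairs where either genotype is missing.
        if called is None or true is None:
            continue
        # Skip pairs where both genotypes are homozygous reference.
        if called == 0 and true == 0:
            continue
        compared += 1
        if called == true:
            matched += 1
    # With no informative pairs the metric is undefined.
    if compared == 0:
        return float("nan")
    return matched / compared


# Toy data (hypothetical): one individual's genotypes at seven sites.
calls = [0, 1, 2, 0, 1, None, 0]
truth = [0, 1, 2, 1, 1, 1, 0]
print(nonreference_concordance(calls, truth))
```

Excluding pairs that are homozygous reference in both call sets keeps the metric from being inflated at rare variants, where almost every genotype is reference.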

Updated: 2019-11-01