Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results.
Genetic Epidemiology (IF 1.7), Pub Date: 2019-09-14, DOI: 10.1002/gepi.22261
Zhongsheng Chen, Michael Boehnke, Christian Fuchsberger

Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single-study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep-coverage (~82×) exome and low-coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta-analysis has similar power to joint analysis in deep-coverage sequence data but can be less powerful in low-coverage sequence data. Given similar data processing and quality control steps, we recommend single-study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep-coverage data.
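
The abstract reports nonreference concordance without defining it. As a rough illustration only, the sketch below assumes one common definition: among genotype pairs where at least one of the two call sets carries a non-reference allele, the fraction that agree exactly. The function name, the 0/1/2 genotype coding, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def nonreference_concordance(calls: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of matching genotypes among pairs where either set is non-reference.

    Genotypes are coded as alternate-allele counts:
    0 (homozygous reference), 1 (heterozygous), 2 (homozygous alternate).
    This is an assumed definition, not necessarily the one used in the paper.
    """
    if len(calls) != len(truth):
        raise ValueError("call sets must have the same length")
    # Keep only pairs where at least one genotype carries a non-reference allele.
    informative = [(c, t) for c, t in zip(calls, truth) if c != 0 or t != 0]
    if not informative:
        raise ValueError("no non-reference genotypes to compare")
    matches = sum(1 for c, t in informative if c == t)
    return matches / len(informative)


# Hypothetical toy genotypes for one variant across eight samples.
called = [0, 0, 1, 1, 2, 0, 1, 0]
truth = [0, 0, 1, 1, 2, 1, 1, 0]
print(nonreference_concordance(called, truth))
```

Excluding pairs that are homozygous reference in both sets matters for rare variants: almost all genotypes are reference, so overall concordance would be close to 100% even for a caller that missed every carrier.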

Updated: 2019-11-01