Population-wide copy number variation calling using variant call format files from 6,898 individuals.
Genetic Epidemiology (IF 1.7), Pub Date: 2019-09-14, DOI: 10.1002/gepi.22260
Grace Png, Daniel Suveges, Young-Chan Park, Klaudia Walter, Kousik Kundu, Ioanna Ntalla, Emmanouil Tsafantakis, Maria Karaleftheri, George Dedoussis, Eleftheria Zeggini, Arthur Gilly

Copy number variants (CNVs) play an important role in a number of human diseases, but calling CNVs accurately remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. We use a regression tree-based approach to call germline CNVs from whole-genome sequencing (WGS, >18×) variant call sets in 6,898 samples across four European cohorts, and describe a rich landscape of large variation comprising 1,320 CNVs. Eighty-one percent of detected events have been previously reported in the Database of Genomic Variants. Twenty-three percent of high-quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. To assess the potential clinical impact of the detected CNVs, we test for association between the detected deletions and the levels of 275 proteins in 1,457 individuals. We describe complex CNV patterns underlying an association with levels of the CCL3 protein at the CCL3L3 locus (MAF = 0.15, p = 3.6 × 10⁻¹²), and a novel cis-association between a low-frequency NOMO1 deletion and NOMO1 protein levels (MAF = 0.02, p = 2.2 × 10⁻⁷). This study demonstrates that existing population-wide WGS call sets can be mined for germline CNVs with minimal computational overhead, delivering insight into a less well-studied, yet potentially impactful, class of genetic variant.
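The abstract names a regression tree-based approach without giving its details. Purely as an illustration of the general idea, the sketch below assumes per-site read depth along a region (for example, values taken from a VCF DP field) as input and applies CART-style recursive binary splitting that minimizes squared error, cutting the depth profile into piecewise-constant segments. Segments whose mean depth falls well below the profile-wide mean are flagged as candidate deletions. The function names and the `min_size`, `min_gain`, and `ratio` thresholds are hypothetical; this is not the authors' pipeline.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (the regression-tree split criterion)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def segment(depths, start=0, min_size=3, min_gain=1.0):
    """Recursively split a depth profile into piecewise-constant segments.

    Hypothetical parameters: min_size is the smallest allowed segment length,
    min_gain the smallest SSE reduction that justifies a split.
    Returns (start, end, mean_depth) tuples with end exclusive.
    """
    n = len(depths)
    total = sse(depths)
    best_gain, best_k = 0.0, None
    # Try every split point that leaves at least min_size sites on each side.
    for k in range(min_size, n - min_size + 1):
        gain = total - sse(depths[:k]) - sse(depths[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    # Stop when no split reduces the squared error enough: emit one leaf.
    if best_k is None or best_gain < min_gain:
        return [(start, start + n, mean(depths))]
    return (segment(depths[:best_k], start, min_size, min_gain)
            + segment(depths[best_k:], start + best_k, min_size, min_gain))


def call_deletions(depths, ratio=0.75, **kwargs):
    """Flag segments whose mean depth is below `ratio` times the profile-wide mean."""
    baseline = mean(depths)
    return [(s, e) for s, e, m in segment(depths, **kwargs) if m < ratio * baseline]
```

For a toy profile at 20× depth with a six-site drop to 10×, `call_deletions([20] * 10 + [10] * 6 + [20] * 10)` returns `[(10, 16)]`. A production caller would additionally need depth normalization (for example, for GC content) and quality filtering, both of which this sketch omits.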

Updated: 2019-11-01