Origins and characterization of variants shared between databases of somatic and germline human mutations.
BMC Bioinformatics (IF 3), Pub Date: 2020-06-04, DOI: 10.1186/s12859-020-3508-8
William Meyerson 1, 2 , John Leisman 3 , Fabio C P Navarro 1, 4 , Mark Gerstein 1, 2, 4, 5

Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which affect the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, for example between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap. After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the differing mutation rates of the basic mutation types explains another 50%; and a model that includes extended nucleotide contexts (e.g., the 3 surrounding bases on either side) explains an additional 4%. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. Nine percent of the shared variants are not explained by any model, and neither sequencing errors nor convergent evolution accounts for this residual. We also surveyed other factors: cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants are more likely than ancient ones to be shared somatically. Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic settings, and that they arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting. Moreover, we find mixed evidence that somatic call sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations.
In future studies, the similar chemical vulnerability of DNA between the somatic and germline settings might be used to help identify disease-related genes by guiding the development of background-mutation models that are informed by both somatic and germline patterns of variation.
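The three nested models in the abstract (uniform, mutation type, extended context) can be illustrated with a toy expected-overlap calculation. This is a minimal sketch, not the authors' code. It assumes that within each stratum t, somatic and germline variants fall independently and uniformly on the N_t possible SNVs of that stratum, so the expected number of shared variants is the hypergeometric expectation summed over strata, Σ_t |S_t|·|G_t| / N_t. The functions `mutation_type` and `expected_overlap`, the pyrimidine-collapsing convention, and the sequence and variant lists are all hypothetical illustrations.

```python
from collections import Counter
from functools import partial

# Complement table for A/C/G/T; 'N' padding is left unchanged by translate().
COMP = str.maketrans("ACGT", "TGCA")


def mutation_type(seq, pos, alt, flank=0):
    """Hypothetical stratum key for the SNV seq[pos] -> alt.

    The substitution is reported on the pyrimidine (C/T) strand, with
    `flank` bases of context on each side ('N' beyond the sequence ends).
    flank=0 gives the six basic types; flank=3 would give 7-mers.
    """
    padded = "N" * flank + seq + "N" * flank
    context = padded[pos : pos + 2 * flank + 1]  # centered on seq[pos]
    if seq[pos] in "AG":  # purine reference: switch to the opposite strand
        context = context.translate(COMP)[::-1]
        alt = alt.translate(COMP)
    return f"{context}>{alt}"


def uniform(seq, pos, alt):
    """Single stratum: every possible SNV is equally likely."""
    return "all"


by_type = partial(mutation_type, flank=0)
by_context = partial(mutation_type, flank=1)  # trinucleotide context


def expected_overlap(seq, somatic, germline, key):
    """Sum over strata t of |S_t| * |G_t| / N_t.

    Assumes every variant is a valid SNV given as (pos, alt) with
    alt != seq[pos], and that `seq` is the callable territory.
    """
    # N_t: number of possible SNVs (position x alternate base) per stratum.
    slots = Counter(
        key(seq, i, alt) for i, ref in enumerate(seq) for alt in "ACGT" if alt != ref
    )
    s = Counter(key(seq, pos, alt) for pos, alt in somatic)
    g = Counter(key(seq, pos, alt) for pos, alt in germline)
    return sum(s[t] * g[t] / slots[t] for t in s if t in g)


# Hypothetical toy data, not taken from gnomAD or TCGA.
seq = "ACGTTCGA"
somatic = [(1, "T"), (2, "A"), (5, "T")]
germline = [(1, "T"), (6, "A"), (3, "C")]

observed = len(set(somatic) & set(germline))
print("observed shared:", observed)
for name, key in [("uniform", uniform), ("mutation type", by_type),
                  ("trinucleotide context", by_context)]:
    print(f"{name}: expected {expected_overlap(seq, somatic, germline, key):.3f}")
```

Stratifying changes only which possible SNVs compete for the same variants: concentrating both somatic and germline variants into a few high-rate strata (here C>T) raises the expected overlap relative to the uniform model. One natural reading of "a model explains X% of shared variants" is the ratio of expected to observed shared counts, but that reading is an assumption here. A real analysis would also count N_t only over filtered, well-covered, mappable territory.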

Updated: 2020-06-04