Detection of simple and complex de novo mutations with multiple reference sequences
Genome Research (IF 7), Pub Date: 2020-08-01, DOI: 10.1101/gr.255505.119
Kiran V Garimella, Zamin Iqbal, Michael A Krause, Susana Campino, Mihir Kekre, Eleanor Drury, Dominic Kwiatkowski, Juliana M Sá, Thomas E Wellems, Gil McVean

The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read–derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.
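As a rough illustration of the multisample, colored de Bruijn graph idea described above, the Python sketch below builds a toy graph in which each k-mer records the set of samples ("colors") whose reads contain it. It then flags k-mers that appear in a child but in neither parent as candidate de novo evidence, and performs a simple greedy path walk through one color. It is not Corticall's code. The novel-k-mer criterion, the single-stranded k-mers, the absence of error or coverage filtering, and all function names (`build_colored_dbg`, `novel_kmers`, `successors`, `extend_right`) are illustrative assumptions.

```python
from collections import defaultdict

# Toy colored de Bruijn graph (hypothetical sketch): each k-mer maps to the set of
# samples ("colors") whose reads contain it. Edges are implicit: k-mer X links to
# k-mer Y when X[1:] == Y[:-1]. K-mers are single-stranded here for simplicity;
# canonical k-mers, coverage thresholds and error filtering are deliberately omitted.


def build_colored_dbg(samples: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Record, for every k-mer in every read, which sample(s) it came from."""
    colors: dict[str, set[str]] = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colors[read[i:i + k]].add(name)
    return dict(colors)


def novel_kmers(colors: dict[str, set[str]], child: str, parents: list[str]) -> set[str]:
    """K-mers present in the child's color but absent from every parent's color."""
    parent_set = set(parents)
    return {km for km, c in colors.items() if child in c and not (c & parent_set)}


def successors(colors: dict[str, set[str]], kmer: str, color: str) -> list[str]:
    """Outgoing neighbours of `kmer` that are present in the given color."""
    return [kmer[1:] + b for b in "ACGT" if color in colors.get(kmer[1:] + b, ())]


def extend_right(colors: dict[str, set[str]], seed: str, color: str,
                 max_steps: int = 1000) -> str:
    """Walk forward from `seed` while exactly one successor exists in `color`."""
    contig, current, seen = seed, seed, {seed}
    for _ in range(max_steps):
        nxt = successors(colors, current, color)
        if len(nxt) != 1 or nxt[0] in seen:  # stop at branches, dead ends, or cycles
            break
        current = nxt[0]
        seen.add(current)
        contig += current[-1]
    return contig


if __name__ == "__main__":
    # Hypothetical toy cross: the child carries a T->A change at 0-based position 5.
    samples = {"p1": ["AAGCTTGCAT"], "p2": ["AAGCTTGCAT"], "child": ["AAGCTAGCAT"]}
    graph = build_colored_dbg(samples, k=5)
    print(sorted(novel_kmers(graph, "child", ["p1", "p2"])))
    # ['AGCAT', 'AGCTA', 'CTAGC', 'GCTAG', 'TAGCA']
    print(extend_right(graph, "AAGCT", "child"))  # AAGCTAGCAT
    print(extend_right(graph, "AAGCT", "p1"))     # AAGCTTGCAT
```

With k = 5, the single substitution in this toy example yields exactly five child-only k-mers. Walking the child's color from a k-mer shared with the parents recovers the child's sequence, and the same walk in a parent's color recovers the parental allele. The sketch does not attempt the long-read haplotype links, multi-reference alignment, or alignment-with-recombination model that the abstract describes. Those are needed to resolve the complex, repetitive regions where a greedy walk like this one would stop at branches.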

Updated: 2020-08-27