Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
Genomics, Proteomics & Bioinformatics (IF 9.5). Pub Date: 2021-07-03. DOI: 10.1016/j.gpb.2021.03.007
Jiadong Lin 1, Xiaofei Yang 2, Walter Kosters 3, Tun Xu 4, Yanyan Jia 4, Songbo Wang 4, Qihui Zhu 5, Mallory Ryan 5, Li Guo 6, Chengsheng Zhang 7, Charles Lee 7, Scott E Devine 8, Evan E Eichler 9, Kai Ye 10

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered to be the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, progress in CSV discovery has been limited compared with that for simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up, guided, model-free strategy to detect CSVs from paired-end short-read sequencing data. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspections, are around 70%, with median breakpoint shifts of 13 bp and 26 bp for experimental and computational validation, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
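
The idea that a CSV corresponds to more than two connected breakpoints can be illustrated with a small sketch. This is not Mako's algorithm or API: the pattern growth step described in the abstract is not reproduced here, and the function name `candidate_csv_subgraphs`, the `(chromosome, position)` breakpoint encoding, and the example coordinates are all hypothetical. The sketch only groups breakpoints into connected subgraphs and keeps those with more than two breakpoints.

```python
from collections import defaultdict


def candidate_csv_subgraphs(edges, min_breakpoints=3):
    """Group breakpoints into connected subgraphs and keep those with
    more than two breakpoints (the abstract's definition of a CSV).

    `edges` is an iterable of (breakpoint_a, breakpoint_b) pairs, where
    each breakpoint is a hypothetical (chromosome, position) tuple.
    """
    # Build an undirected adjacency map of breakpoint connections.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    subgraphs = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_breakpoints:
            subgraphs.append(sorted(component))
    return subgraphs


# Made-up connections: one simple two-breakpoint event on chr1 and one
# four-breakpoint chain on chr2.
edges = [
    (("chr1", 1000), ("chr1", 1500)),
    (("chr2", 200), ("chr2", 800)),
    (("chr2", 800), ("chr2", 1200)),
    (("chr2", 1200), ("chr2", 2000)),
]
result = candidate_csv_subgraphs(edges)
```

Here only the chr2 chain is retained, because the chr1 event has exactly two breakpoints and therefore looks like a simple structural variant under this simplified rule.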




Updated: 2021-07-03