Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
Algorithms for Molecular Biology (IF 1.5), Pub Date: 2021-05-25, DOI: 10.1186/s13015-021-00182-9
Kingshuk Mukherjee, Massimiliano Rossi, Leena Salmela, Christina Boucher

Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary software package that is available as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome, rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.

Updated: 2021-05-25