LazyB: fast and cheap genome assembly
Algorithms for Molecular Biology (IF 1.5), Pub Date: 2021-06-01, DOI: 10.1186/s13015-021-00186-5
Thomas Gatter ₁, Sarah von Löhneysen ₁, Jörg Fallmann ₁, Polina Drozdova ₂, Tom Hartmann ₁, Peter F Stadler ₁,₃,₄,₅,₆


Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and more accurate methods, in particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, “hybrid” methods that integrate short and long read data have been devised to address this need.

LazyB is such a hybrid genome assembler. It has been designed specifically with an emphasis on utilizing low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts, step by step, subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These paths are translated into genomic sequences only in the final step. A prototype implementation of LazyB, written entirely in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort.

LazyB is a new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements. The LazyB prototype is available at https://github.com/TGatter/LazyB .
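Two of the steps named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the LazyB code base: the function names `project_overlaps` and `max_weight_path`, the input layout, and the use of the shared-unitig count as an edge weight are all assumptions made for illustration. The first function projects a bipartite long-read/unitig relation onto a long-read overlap graph; the second extracts one maximum-weight path from a graph that is already a weighted directed acyclic graph, using dynamic programming over a topological order. Orientation, cycle removal, and the interval-graph reasoning described above are not covered.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def project_overlaps(anchors):
    """Project a bipartite read/unitig relation onto read-read edges.

    anchors: dict mapping a long-read id to the set of unitig ids found on it.
    Returns {(read_a, read_b): shared_unitig_count} with read_a < read_b.
    The shared-unitig count as weight is an illustrative assumption.
    """
    reads_by_unitig = defaultdict(set)
    for read, unitigs in anchors.items():
        for unitig in unitigs:
            reads_by_unitig[unitig].add(read)

    weights = defaultdict(int)
    for reads in reads_by_unitig.values():
        ordered = sorted(reads)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                weights[(a, b)] += 1
    return dict(weights)


def max_weight_path(dag_edges):
    """Return (weight, path) of a maximum-weight path in a weighted DAG.

    dag_edges: non-empty dict {(u, v): weight} of directed edges u -> v
    with positive weights. Raises graphlib.CycleError if a cycle exists.
    """
    preds = defaultdict(dict)
    nodes = set()
    for (u, v), w in dag_edges.items():
        preds[v][u] = w
        nodes.update((u, v))

    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = TopologicalSorter({n: set(preds[n]) for n in nodes}).static_order()

    best, back = {}, {}
    for n in order:
        best[n], back[n] = 0, None
        for p, w in preds[n].items():
            if best[p] + w > best[n]:
                best[n], back[n] = best[p] + w, p

    # Walk back from the best end node to recover the path.
    end = max(best, key=best.get)
    path = [end]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    return best[end], path[::-1]


if __name__ == "__main__":
    toy_anchors = {"r1": {"u1", "u2"}, "r2": {"u2", "u3"}, "r3": {"u3"}}
    print(project_overlaps(toy_anchors))

    toy_dag = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 4, ("c", "d"): 1}
    print(max_weight_path(toy_dag))
```

In the toy graph the direct edge from `a` to `c` is lighter than the detour through `b`, so the path that visits more reads is chosen. Repeating the extraction after removing the chosen reads would give one simple way to obtain several contigs, although the abstract does not say whether LazyB proceeds this way.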


Updated: 2021-06-02
