A safe and complete algorithm for metagenomic assembly.
Algorithms for Molecular Biology (IF 1.5), Pub Date: 2018-02-07, DOI: 10.1186/s13015-018-0122-7
Nidia Obscura Acosta, Veli Mäkinen, Alexandru I Tomescu

BACKGROUND: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem that asks to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or all edges, of G.

APPROACH: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in Computational Molecular Biology - 20th Annual Conference, RECOMB 9649:152-163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G.

RESULTS: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time [Formula: see text], and in the edge-covering case it runs in time [Formula: see text]; n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.
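To make the problem formulation concrete, the following is a minimal Python sketch of what it means for a collection of circular walks to cover all nodes or all edges of G, and for a walk to appear as a subwalk of a circular walk. It only checks candidate solutions; it is not the paper's algorithm. The graph encoding (a dict of out-neighbour sets) and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: each node maps to the set of its out-neighbours.
Graph = Dict[str, Set[str]]


def is_circular_walk(graph: Graph, walk: List[str]) -> bool:
    """A walk is listed as v0..v(k-1); the closing edge v(k-1) -> v0 is implied."""
    if not walk:
        return False
    k = len(walk)
    return all(walk[(i + 1) % k] in graph.get(walk[i], set()) for i in range(k))


def edges_of(walk: List[str]) -> Set[Tuple[str, str]]:
    """Edges traversed by a circular walk, including the closing edge."""
    k = len(walk)
    return {(walk[i], walk[(i + 1) % k]) for i in range(k)}


def covers_nodes(graph: Graph, walks: List[List[str]]) -> bool:
    """True if every walk is circular in G and together they visit all nodes."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    visited = {v for w in walks for v in w}
    return all(is_circular_walk(graph, w) for w in walks) and nodes <= visited


def covers_edges(graph: Graph, walks: List[List[str]]) -> bool:
    """True if every walk is circular in G and together they use all edges."""
    edges = {(u, v) for u, outs in graph.items() for v in outs}
    used: Set[Tuple[str, str]] = set()
    for w in walks:
        used |= edges_of(w)
    return all(is_circular_walk(graph, w) for w in walks) and edges <= used


def is_subwalk(walk: List[str], circular: List[str]) -> bool:
    """True if `walk` occurs contiguously in `circular`, allowing wrap-around."""
    k = len(circular)
    return any(
        all(circular[(s + j) % k] == walk[j] for j in range(len(walk)))
        for s in range(k)
    )


# Toy graph with two 2-cycles sharing node b: a <-> b <-> c.
G: Graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
solution_1 = [["a", "b", "c", "b"]]   # one circular walk
solution_2 = [["a", "b"], ["b", "c"]]  # two circular walks
```

In this toy graph both `solution_1` and `solution_2` cover all nodes and all edges. The walk a, b, c is a subwalk of the single walk in `solution_1` but of no walk in `solution_2`, so under the definition above it would not be safe: a safe walk must appear in every solution.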

Updated: 2019-11-01