Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT
Genome Research (IF 6.2), Pub Date: 2023-07-01, DOI: 10.1101/gr.277615.122
Andrea Cracco, Alexandru I Tomescu

Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences that associates each k-mer with the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach that merges the k-mer counting step with the unitig construction step, as well as on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3× to 21× compared with the state-of-the-art tool Cuttlefish 2. When constructing the colored variant, GGCAT achieves speed-ups of 5× to 39× compared with the state-of-the-art tool Bifrost. Additionally, GGCAT is up to 480× faster than Bifrost for batch sequence queries on colored graphs.
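
To make the notion of "colors" concrete, the following is a minimal, naive sketch of a colored k-mer index and a batch-style query. It is not GGCAT's algorithm and does not compact k-mers into unitigs; the function names (`canonical`, `build_colored_kmers`, `query_colors`) are hypothetical, and the use of canonical k-mers (treating a k-mer and its reverse complement as the same node) is a simplifying assumption made here, not something stated in the abstract.

```python
from collections import defaultdict

# Assumption: sequences contain only the uppercase letters A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_colored_kmers(sequences: list[str], k: int) -> dict[str, set[int]]:
    """Map each canonical k-mer to the set of colors (input sequence
    indices) in which it appears."""
    colors: dict[str, set[int]] = defaultdict(set)
    for color, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            colors[canonical(seq[i:i + k])].add(color)
    return dict(colors)


def query_colors(index: dict[str, set[int]], query: str, k: int) -> dict[int, int]:
    """Count, for each color, how many k-mers of the query occur in it."""
    hits: dict[int, int] = defaultdict(int)
    for i in range(len(query) - k + 1):
        for color in index.get(canonical(query[i:i + k]), ()):
            hits[color] += 1
    return dict(hits)
```

For example, with the input sequences `ACGTT` (color 0) and `CGTTA` (color 1) and k = 3, the query `GTTA` has two k-mers: `GTT` occurs in both sequences and `TTA` only in the second, so the query shares one k-mer with color 0 and two with color 1. A hash map of this kind stores every k-mer separately; compacted graphs instead merge non-branching paths of k-mers into unitigs, which this sketch leaves out.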

Updated: 2023-07-01