A graph extension of the positional Burrows-Wheeler transform and its applications.
Algorithms for Molecular Biology (IF 1.5), Pub Date: 2017-07-11, DOI: 10.1186/s13015-017-0109-9
Adam M Novak, Erik Garrison, Benedict Paten

We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that, with the gPBWT, on the order of 100,000 diploid genomes, including all forms of structural variation, could be stored and made searchable for haplotype queries using a single large compute node.
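
To make the subhaplotype match query concrete, here is a minimal sketch of what such a query computes. It is a naive linear scan, not the gPBWT and not the authors' algorithm: the gPBWT answers the same kind of question from a compressed index. The representation of a haplotype as a plain list of node IDs, the function name `count_consistent`, the choice to count each haplotype at most once, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: a haplotype is modelled as the ordered list of node IDs it visits.
# Node orientation and the paper's exact walk restrictions are ignored here.
Walk = Sequence[int]


def count_consistent(haplotypes: List[Walk], query: Walk) -> int:
    """Count haplotypes that contain `query` as a contiguous subwalk.

    Naive baseline: every haplotype is scanned in full, and each haplotype
    is counted at most once even if the query occurs in it several times.
    """
    q = list(query)
    k = len(q)
    if k == 0:
        # An empty query is trivially consistent with every haplotype.
        return len(haplotypes)
    count = 0
    for hap in haplotypes:
        h = list(hap)
        # Slide a window of length k over the haplotype.
        if any(h[i:i + k] == q for i in range(len(h) - k + 1)):
            count += 1
    return count


# Hypothetical toy graph: nodes 2 and 3 are alternative alleles, as are 5 and 6.
haplotypes = [
    [1, 2, 4, 5],
    [1, 3, 4, 5],
    [1, 2, 4, 6],
]

print(count_consistent(haplotypes, [2, 4]))     # walk through allele 2
print(count_consistent(haplotypes, [3, 4, 6]))  # combination seen in no haplotype
```

A read mapper could use a count like this to favor alignments whose path through the graph is supported by at least one known haplotype. The scan above takes time proportional to the total length of all haplotypes for each query, which is the cost an index such as the gPBWT is meant to avoid.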

Updated: 2019-11-01