Space-efficient construction of compressed suffix trees
Theoretical Computer Science (IF 1.1), Pub Date: 2020-11-19, DOI: 10.1016/j.tcs.2020.11.024
Nicola Prezza, Giovanna Rosone

We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let $n$ be the text length and $\sigma$ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in $O(n\log\sigma)$ time using just $o(n\log\sigma)$ bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter $0<\epsilon\le 1$ we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in $O(n(\log\sigma+\epsilon^{-1}\log\log n))$ time using at most $n\log\sigma\,(\epsilon+o(1))$ bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e., $o(n\log\sigma)$ bits) and $\Theta(n\log\sigma+n(\log\log n)^{1+\delta})$ time, for any constant $\delta>0$. This improves on the previous most space-efficient algorithms, which worked in $O(n)$ bits and $O(n\log n)$ time. We also consider the problem of merging BWTs of string collections, and provide a solution running in $O(n\log\sigma)$ time and using just $o(n\log\sigma)$ bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as $n$ bits on top of a packed representation of the input/output and processes data as fast as 2.92 megabases per second.
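To make the objects named in the abstract concrete, here is a deliberately naive Python sketch on a toy input. It is not the authors' algorithm. It inverts a BWT by LF-mapping, then builds the suffix array, the LCP array and a PLCP bitvector directly. Every array here is stored explicitly and suffixes are sorted by brute force, so it uses far more than the $o(n\log\sigma)$ bits of working space discussed above. The function names, the `$` sentinel convention (unique and smaller than every other symbol) and the Sadakane-style $2n$-bit PLCP encoding (bit $2j+\mathrm{PLCP}[j]$ set for each $j$) are assumptions of this sketch. The paper's exact conventions may differ.

```python
from collections import Counter

SENTINEL = "$"  # assumption: unique end marker, smaller than every other symbol


def invert_bwt(bwt: str) -> str:
    """Recover the text (ending in SENTINEL) from its BWT by LF-mapping."""
    counts = Counter(bwt)
    C, total = {}, 0  # C[c] = number of symbols in bwt strictly smaller than c
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    seen, rank = Counter(), []  # rank[i] = occurrences of bwt[i] in bwt[:i]
    for c in bwt:
        rank.append(seen[c])
        seen[c] += 1
    out, row = [], 0  # row 0 is the rotation that starts with SENTINEL
    for _ in range(len(bwt) - 1):
        out.append(bwt[row])  # symbol preceding the current suffix in the text
        row = C[bwt[row]] + rank[row]  # LF step: move one position left in the text
    return "".join(reversed(out)) + SENTINEL


def suffix_array(text: str) -> list[int]:
    """Naive suffix sorting: fine for illustration, far from space-efficient."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """LCP[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i]."""
    def common(a: int, b: int) -> int:
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def plcp_bitvector(text: str, sa: list[int], lcp: list[int]) -> list[int]:
    """Set bit 2*j + PLCP[j] for each text position j, where PLCP[j] = LCP[ISA[j]]."""
    n = len(text)
    plcp = [0] * n
    for r, pos in enumerate(sa):  # permute LCP from suffix order to text order
        plcp[pos] = lcp[r]
    bits = [0] * (2 * n)  # PLCP[j] + j < n, so 2n bits suffice
    for j, v in enumerate(plcp):
        bits[2 * j + v] = 1  # positions are strictly increasing in j
    return bits


def decode_plcp(bits: list[int]) -> list[int]:
    """PLCP[j] = (position of the j-th set bit, 0-based) - 2*j."""
    ones = [p for p, b in enumerate(bits) if b]
    return [p - 2 * j for j, p in enumerate(ones)]


if __name__ == "__main__":
    text = invert_bwt("annb$aa")
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    bits = plcp_bitvector(text, sa, lcp)
    print(text)               # banana$
    print(sa)                 # [6, 5, 3, 1, 0, 4, 2]
    print(lcp)                # [0, 0, 1, 3, 0, 0, 2]
    print(decode_plcp(bits))  # [0, 3, 2, 1, 0, 0, 0]
```

Each explicit integer array above costs $\Theta(n\log n)$ bits, which is the kind of overhead the algorithms in the abstract are designed to avoid by working directly on the BWT.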

Updated: 2020-12-13