Theoretical Computer Science (IF 1.1), Pub Date: 2020-11-19, DOI: 10.1016/j.tcs.2020.11.024
Nicola Prezza, Giovanna Rosone
We show how to build several data structures of central importance to string processing, taking the Burrows-Wheeler transform (BWT) as input and using small extra working space. Let n be the text length and σ the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in […] time using just […] bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter […] we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in […] time using at most […] bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. […] bits) and […] time, for any constant […]. This improves on the previous most space-efficient algorithms, which worked in […] bits and […] time. We also consider the problem of merging BWTs of string collections, and provide a solution running in […] time and using just […] bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as n bits on top of a packed representation of the input/output and processes data as fast as 2.92 megabases per second.
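To make "enumerating LCP values from the BWT" concrete, here is a minimal Python sketch of the classical queue-based interval enumeration usually attributed to Beller et al. It is not the algorithm of this paper and makes no attempt at its space bounds: it stores a full occurrence table and a plain LCP array, which is exactly the overhead the paper aims to avoid. The function names `bwt_from_text` and `lcp_from_bwt` are illustrative, and the text is assumed to end with a unique terminator `$` that is smaller than every other character.

```python
from collections import deque


def bwt_from_text(text):
    """Naive BWT by sorting all suffixes (illustration only).

    Assumes text ends with a unique '$' smaller than all other characters.
    Returns the BWT string and the suffix array.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # text[i - 1] wraps around to the terminator when i == 0.
    return "".join(text[i - 1] for i in sa), sa


def lcp_from_bwt(bwt):
    """Enumerate all LCP values using only the BWT.

    Returns a list of length n + 1 where entries 0 and n are -1 sentinels
    and entry k (0 < k < n) is the LCP of the suffixes of rank k - 1 and k.
    """
    n = len(bwt)
    alphabet = sorted(set(bwt))

    # C[c] = number of characters in the BWT strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i] (naive rank table).
    occ = {c: [0] * (n + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    lcp = [None] * (n + 1)
    lcp[0] = lcp[n] = -1

    # Each queue entry is (lb, rb, length): the suffix-array interval of some
    # string w with |w| = length. FIFO order processes shorter strings first.
    queue = deque([(0, n - 1, 0)])
    while queue:
        lb, rb, length = queue.popleft()
        for c in alphabet:
            # Backward-search step: interval of the string c + w.
            lo = C[c] + occ[c][lb]
            hi = C[c] + occ[c][rb + 1] - 1
            if lo > hi:
                continue  # c does not precede w anywhere in the text
            if lcp[hi + 1] is None:
                lcp[hi + 1] = length
                queue.append((lo, hi, length + 1))
    return lcp


if __name__ == "__main__":
    bwt, _ = bwt_from_text("banana$")
    print(bwt)                # annb$aa
    print(lcp_from_bwt(bwt))  # [-1, 0, 1, 3, 0, 0, 2, -1]
```

The breadth-first order is what makes this work: when an interval's right boundary is reached for the first time, every smaller LCP value has already been assigned, so the current string length is the correct value for that boundary. A space-efficient variant would replace the occurrence table with a compact rank structure over the BWT and avoid holding the whole queue and output array in plain form.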
Title: Space-efficient construction of compressed suffix trees