Succinct suffix sorting in external memory
Information Processing & Management (IF 7.4), Pub Date: 2020-09-09, DOI: 10.1016/j.ipm.2020.102378
Ling Bo Han, Yi Wu, Ge Nong

Given a size-N input string X, a number of algorithms have been proposed to sort the suffixes of X into the output suffix array using inducing methods. While the existing algorithms eSAIS, DSAIS, and fSAIS achieve remarkable time and space results for suffix sorting in external memory, there is still room for improvement. We propose a new algorithm, nSAIS, which reinvents the core inducing procedure of DSAIS with a new set of data structures so that it runs faster and uses less space. The suffix array is computed recursively, and the inducing procedure at each recursion level is performed block by block to facilitate sequential I/O. If X has a byte alphabet and N = O(M^2/B), where M and B are the sizes of internal memory and the I/O block, respectively, nSAIS guarantees a workspace of less than N bytes besides input and output while keeping a linear I/O volume of O(N), the best known so far for external-memory inducing methods. Our experiments on typical settings show that our nSAIS program with 40-bit integers not only runs faster than the existing representative external-memory algorithms as N keeps growing, but also always uses the least disk space, around 6.1 bytes on average. The techniques proposed in this study can be used to develop fast and succinct suffix sorters in external memory.
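To make the output of suffix sorting concrete, here is a minimal in-memory sketch. It is not nSAIS and uses none of the paper's data structures: it builds the suffix array by plain comparison sorting, and it labels each suffix as L-type or S-type, the classification that induced-sorting methods in the SA-IS family start from. The function names `naive_suffix_array` and `suffix_types` are illustrative assumptions, and a virtual end-of-string sentinel smaller than every byte is assumed.

```python
def naive_suffix_array(x: bytes) -> list[int]:
    """Return the start positions of all suffixes of x in lexicographic order.

    Illustration only: plain comparison sorting held entirely in memory,
    with none of the block-wise external-memory machinery of nSAIS.
    """
    return sorted(range(len(x)), key=lambda i: x[i:])


def suffix_types(x: bytes) -> str:
    """Label suffix i as 'S' if it is smaller than suffix i+1, else 'L'.

    Assumption: a virtual sentinel smaller than every byte follows x,
    so the last suffix is always L-type.
    """
    n = len(x)
    if n == 0:
        return ""
    types = ["L"] * n
    # Scan right to left: equal neighbouring characters inherit the type
    # of the suffix to their right.
    for i in range(n - 2, -1, -1):
        if x[i] < x[i + 1] or (x[i] == x[i + 1] and types[i + 1] == "S"):
            types[i] = "S"
    return "".join(types)


if __name__ == "__main__":
    print(naive_suffix_array(b"banana"))  # [5, 3, 1, 0, 4, 2]
    print(suffix_types(b"banana"))        # LSLSLL
```

For `banana`, the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, which gives the positions printed above. An external-memory sorter must produce the same array without holding X and the array in memory together, which is why workspace and I/O volume are the figures the abstract reports.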




Updated: 2020-09-10