Theoretical Computer Science (IF 0.9), Pub Date: 2020-12-10, DOI: 10.1016/j.tcs.2020.12.006
Sahar Hooshmand, Paniz Abedin, M. Oğuzhan Külekci, Sharma V. Thankachan
The non-overlapping indexing problem is defined as follows: pre-process a given text $T[1,n]$ of length $n$ into a data structure such that, whenever a pattern $P[1,p]$ comes as an input, we can efficiently report the largest set of non-overlapping occurrences of $P$ in $T$. The best-known solution is by Cohen and Porat [ISAAC 2009]. The size of their structure is $O(n)$ words and the query time is optimal $O(p + \mathrm{nocc})$, where $\mathrm{nocc}$ is the output size. Later, Ganguly et al. [CPM 2015 and Algorithmica 2020] proposed a compressed-space solution. We study this problem in the cache-oblivious model and present a new data structure of size $O(n \log n)$ words. It can answer queries in optimal $O(p/B + \log_B n + \mathrm{nocc}/B)$ I/O operations, where $B$ is the block size. The space can be improved to $O(n \log_{M/B} n)$ in the cache-aware model, where $M$ is the size of main memory. Additionally, we study a generalization of this problem with an additional range constraint. Here the task is to report the largest set of non-overlapping occurrences of $P$ in $T$ that are within the range $[s,e]$. We present an $O(n \log^2 n)$ space data structure in the cache-aware model that can answer queries in optimal $O(p/B + \log_B n + \mathrm{nocc}_{[s,e]}/B)$ I/O operations, where $\mathrm{nocc}_{[s,e]}$ is the output size.
Title: I/O-efficient data structures for non-overlapping indexing
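To make the problem statement concrete, the following is a minimal Python sketch of what a query must return. It is not the I/O-efficient structure from the paper: it scans the text naively and then selects occurrences greedily from left to right, which yields a largest non-overlapping set because all occurrences have the same length. The function names `find_occurrences` and `non_overlapping_occurrences` are hypothetical. Positions are 0-based here, and the range variant is assumed to require each reported occurrence to lie entirely inside `[s, e]`.

```python
from typing import List, Optional


def find_occurrences(text: str, pattern: str) -> List[int]:
    """Return all (possibly overlapping) 0-based start positions of pattern."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)  # step by 1 so overlaps are kept
    return positions


def non_overlapping_occurrences(
    text: str,
    pattern: str,
    s: Optional[int] = None,
    e: Optional[int] = None,
) -> List[int]:
    """Greedy leftmost selection of non-overlapping occurrences.

    If s and e are given, only occurrences fully inside text[s..e]
    (inclusive, 0-based) are considered. This reading of the range
    constraint is an assumption made for illustration.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    p = len(pattern)
    lo = 0 if s is None else s
    hi = len(text) - 1 if e is None else e

    chosen = []
    next_free = lo  # earliest position where the next occurrence may start
    for pos in find_occurrences(text, pattern):
        if pos >= next_free and pos + p - 1 <= hi:
            chosen.append(pos)
            next_free = pos + p
    return chosen


if __name__ == "__main__":
    print(non_overlapping_occurrences("aaaaa", "aa"))        # [0, 2]
    print(non_overlapping_occurrences("aaaaa", "aa", 1, 4))  # [1, 3]
```

The sketch reads the whole text on every query, so it shows only the expected output, not the query bounds. The data structures described in the abstract avoid that scan by indexing the text in advance.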