I/O-efficient data structures for non-overlapping indexing
Theoretical Computer Science ( IF 0.9 ) Pub Date : 2020-12-10 , DOI: 10.1016/j.tcs.2020.12.006
Sahar Hooshmand , Paniz Abedin , M. Oğuzhan Külekci , Sharma V. Thankachan

The non-overlapping indexing problem is defined as follows: pre-process a given text $T[1,n]$ of length $n$ into a data structure such that, whenever a pattern $P[1,m]$ arrives as input, we can efficiently report the largest set of non-overlapping occurrences of $P$ in $T$. The best-known solution is by Cohen and Porat [ISAAC 2009]. The size of their structure is $O(n)$ words and the query time is the optimal $O(m + nocc)$, where $nocc$ is the output size. Later, Ganguly et al. [CPM 2015 and Algorithmica 2020] proposed a compressed-space solution. We study this problem in the cache-oblivious model and present a new data structure of size $O(n \log n)$ words. It can answer queries in the optimal $O(m/B + \log_B n + nocc/B)$ I/O operations, where $B$ is the block size. The space can be improved to $O(n \log_{M/B} n)$ in the cache-aware model, where $M$ is the size of main memory. Additionally, we study a generalization of this problem with an additional range $[s,e]$ constraint. Here the task is to report the largest set of non-overlapping occurrences of $P$ in $T$ that lie within the range $[s,e]$. We present an $O(n \log^2 n)$ space data structure in the cache-aware model that can answer queries in the optimal $O(m/B + \log_B n + nocc_{[s,e]}/B)$ I/O operations, where $nocc_{[s,e]}$ is the output size.
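
To make the problem statement concrete, the following is a minimal sketch of what a query returns, not the indexed data structure described above. It scans the text directly and greedily keeps the leftmost occurrence, then the next occurrence that starts at or after the end of the previous one; for equal-length occurrences this greedy choice yields a largest non-overlapping set. The function name `non_overlapping_occurrences` and its parameters are hypothetical, and positions are 0-based with a half-open range `[start, end)`, whereas the abstract uses 1-based inclusive ranges.

```python
def non_overlapping_occurrences(text, pattern, start=None, end=None):
    """Return start positions of a largest set of non-overlapping
    occurrences of `pattern` in text[start:end] (0-based, half-open).

    Naive baseline for illustration only: it scans the text on every
    query instead of using a pre-built index.
    """
    m = len(pattern)
    if m == 0:
        return []
    lo = 0 if start is None else start
    hi = len(text) if end is None else end
    result = []
    # str.find only reports occurrences lying entirely inside [lo, hi).
    pos = text.find(pattern, lo, hi)
    while pos != -1:
        result.append(pos)
        # Skip past the chosen occurrence so the next one cannot overlap it.
        pos = text.find(pattern, pos + m, hi)
    return result


if __name__ == "__main__":
    print(non_overlapping_occurrences("abababa", "aba"))
    print(non_overlapping_occurrences("aaaaa", "aa", start=1, end=5))
```

The optional `start` and `end` arguments correspond to the range-constrained variant: only occurrences that fit entirely inside the range are considered.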




Updated: 2021-01-22