Internal Dictionary Matching
Algorithmica ( IF 1.1 ) Pub Date : 2021-04-17 , DOI: 10.1007/s00453-021-00821-y
Panagiotis Charalampopoulos , Tomasz Kociumaka , Manal Mohamed , Jakub Radoszewski , Wojciech Rytter , Tomasz Waleń

We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary \(\mathsf {D}\) in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in \(\mathsf {D}\) is given as a fragment of T. This way, \(\mathsf {D}\) takes space proportional to the number of patterns \(d=|\mathsf {D}|\) rather than their total length, which could be \(\varTheta (n\cdot d)\). In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from \(\mathsf {D}\) in a fragment \(T[i \mathinner {.\,.}j]\) and reporting distinct patterns from \(\mathsf {D}\) that occur in \(T[i \mathinner {.\,.}j]\). We show how to construct, in \(O((n+d) \log ^{O(1)} n)\) time, a data structure that answers each of these queries in time \(O(\log ^{O(1)} n+| output |)\). The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight—up to subpolynomial factors—upper and lower bounds for the case of a dynamic dictionary.
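To make the three query types concrete, here is a brute-force reference sketch. It is not the data structure from the paper: it rescans the fragment for every pattern and so takes time polynomial in n and d per query, whereas the paper achieves polylogarithmic time plus output size. The function names, the 0-indexed inclusive fragment convention, and the sample string are assumptions made for illustration only.

```python
from typing import List, Tuple

# A fragment of T, stored as (start, end): 0-indexed, both ends inclusive.
# (Illustrative convention; storing patterns this way is what makes the
# dictionary "internal": O(1) space per pattern instead of its length.)
Fragment = Tuple[int, int]


def report(T: str, D: List[Fragment], i: int, j: int) -> List[Tuple[int, int]]:
    """Return every (pattern_index, start) such that pattern D[pattern_index]
    occurs in T at position start and lies entirely inside T[i..j]."""
    occurrences = []
    for k, (a, b) in enumerate(D):
        pattern = T[a:b + 1]
        m = len(pattern)
        # An occurrence starting at s must end at s + m - 1 <= j.
        for s in range(i, j - m + 2):
            if T[s:s + m] == pattern:
                occurrences.append((k, s))
    return occurrences


def count(T: str, D: List[Fragment], i: int, j: int) -> int:
    """Number of occurrences of dictionary patterns inside T[i..j]."""
    return len(report(T, D, i, j))


def report_distinct(T: str, D: List[Fragment], i: int, j: int) -> List[str]:
    """Distinct dictionary patterns (as strings) that occur inside T[i..j]."""
    return sorted({T[D[k][0]:D[k][1] + 1] for k, _ in report(T, D, i, j)})


T = "abaabab"
D = [(0, 1), (2, 2), (1, 3)]  # the patterns "ab", "a", "baa"

print(report(T, D, 2, 6))
print(count(T, D, 2, 6))
print(report_distinct(T, D, 2, 6))
```

A baseline like this is mainly useful as a correctness oracle when testing a faster implementation on small inputs.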




Updated: 2021-04-18