Theoretical Computer Science (IF 0.747), Pub Date: 2020-11-20, DOI: 10.1016/j.tcs.2020.11.036
Arnab Ganguly; Wing-Kai Hon; Kunihiko Sadakane; Rahul Shah; Sharma V. Thankachan; Yilin Yang

Let $\mathcal{P}$ be a collection of d patterns $\left\{{\mathsf{P}}_{1},{\mathsf{P}}_{2},\dots ,{\mathsf{P}}_{d}\right\}$ of total length n characters, drawn from an alphabet Σ of size σ. Given a text T (over Σ), the dictionary indexing problem is to create a data structure with which we can report all positions j (called occurrences) where at least one of the patterns ${\mathsf{P}}_{i}\in \mathcal{P}$ matches the same-length substring of T that starts at j. We consider this problem under the following definitions of matching.

Parameterized Matching: The characters of Σ are partitioned into static characters and parameterized characters. Two equal length strings S and ${S}^{\prime }$ are a parameterized match iff the static characters match exactly, and there exists a one-to-one function which renames the parameterized characters in S to those in ${S}^{\prime }$.
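
To make the definition concrete, the following is a minimal sketch of a direct pairwise check, not the index described in this abstract. The function name `is_parameterized_match` and the `static_chars` argument are hypothetical names chosen for illustration, and the renaming is assumed to be a bijection between the parameterized characters that actually occur in the two strings.

```python
def is_parameterized_match(s, t, static_chars):
    """Return True if s and t are a parameterized match.

    static_chars is the set of static characters; every other
    character is treated as parameterized (an assumption of this sketch).
    """
    if len(s) != len(t):
        return False
    forward = {}   # renaming from parameterized characters of s to those of t
    backward = {}  # inverse renaming, used to enforce one-to-one
    for a, b in zip(s, t):
        if a in static_chars or b in static_chars:
            # Static characters must match exactly.
            if a != b:
                return False
            continue
        # Both characters are parameterized: the renaming must be consistent
        # in both directions.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

For example, with static characters `{"A", "B"}`, the strings `AxByx` and `AzBwz` match under the renaming x to z and y to w, whereas `AxByx` and `AzBzz` do not, because x and y would both have to be renamed to z.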

Order-Preserving Matching: The alphabet Σ is ordered. Two equal length strings S and ${S}^{\prime }$ are an order-preserving match iff for any two integers $i,j\in \left[1,|S|\right]$, $S\left[i\right]\prec S\left[j\right]\Leftrightarrow {S}^{\prime }\left[i\right]\prec {S}^{\prime }\left[j\right]$, where ≺ denotes the precedence order in Σ.
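
The definition can be checked literally by comparing every ordered pair of positions. The sketch below does exactly that in quadratic time; it is an illustration of the definition only, not the index described in this abstract. The function name `is_order_preserving_match` is hypothetical, and Python's `<` on the sequence elements is assumed to stand in for the order ≺ on Σ.

```python
def is_order_preserving_match(s, t):
    """Return True if s and t are an order-preserving match.

    Checks S[i] < S[j] <=> S'[i] < S'[j] for all ordered pairs (i, j).
    Because both (i, j) and (j, i) are tested, equal characters in s
    must also correspond to equal characters in t.
    """
    if len(s) != len(t):
        return False
    m = len(s)
    return all(
        (s[i] < s[j]) == (t[i] < t[j])
        for i in range(m)
        for j in range(m)
    )
```

For example, `[10, 20, 15]` and `[1, 9, 4]` match because both have the relative order smallest, largest, middle, while `[10, 20, 15]` and `[1, 4, 9]` do not.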

Let $\epsilon >0$ be an arbitrarily small constant. For parameterized matching, we first present a compact $O\left(n\log \sigma +d\log n\right)$-bit index that reports all occ occurrences in $O\left(|T|\left(\log \sigma +{\log }_{\sigma }n\right)+\mathit{occ}\right)$ time, and then a succinct $n\log \sigma +o\left(n\log \sigma \right)+O\left(d\log n\right)$-bit index that reports all occ occurrences in $O\left(|T|\left(\log \sigma +{\log }^{\epsilon }n\,{\log }_{\sigma }n\right)+\mathit{occ}\right)$ time. For order-preserving matching, we present indexes of the same sizes, but with slightly increased query time.
