Small space and streaming pattern matching with k edits
arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-06-10, DOI: arxiv-2106.06037
Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^5)$ space and $\tilde{O}(k^8)$ amortised time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k\log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no $\mathrm{poly}(k\log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers $n\ge k$. For any string of length at most $n$, the sketch is of size $\tilde{O}(k^2)$ and it can be computed with an $\tilde{O}(k^2)$-space streaming algorithm. Given the sketches of two strings, in $\tilde{O}(k^3)$ time we can compute their edit distance or certify that it is larger than $k$. This result improves upon $\tilde{O}(k^8)$-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent $\tilde{O}(k^3)$-size sketches of Jin, Nelson, and Wu [STACS 2021].
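
To make the problem statement concrete, the sketch below reports every text position at which some substring within edit distance $k$ of $P$ ends. It uses the textbook column-by-column dynamic program (commonly attributed to Sellers), which reads $T$ one character at a time but keeps $O(m)$ words of state. It is only a baseline for comparison, not the $\tilde{O}(k^5)$-space algorithm of the paper. The function name `k_edit_occurrences` and its interface are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple


def k_edit_occurrences(
    pattern: str, text: Iterable[str], k: int
) -> Iterator[Tuple[int, int]]:
    """Yield (end_index, distance) for each text position where some
    substring ending there is within edit distance k of the pattern.

    Baseline only: O(m) space and O(m) time per text character,
    where m = len(pattern). Hypothetical helper, not from the paper.
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    for j, c in enumerate(text):
        prev_diag = col[0]  # col[i-1] as it was in the previous column
        col[0] = 0          # an occurrence may start at any position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                old + 1,                            # extra text character
                col[i - 1] + 1,                     # unmatched pattern character
            )
            prev_diag = old
        if col[m] <= k:
            yield j, col[m]


if __name__ == "__main__":
    for end, dist in k_edit_occurrences("abc", "xxabcxx", 1):
        print(end, dist)
```

Because the text is consumed as an iterator, the sketch already works in a streaming fashion. The gap the paper addresses is the state it carries: $O(m)$ here, against $\mathrm{poly}(k\log n)$ in the streaming algorithm described above.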

Updated: 2021-06-14