On pattern matching with k mismatches and few don't cares, Information Processing Letters


On pattern matching with k mismatches and few don't cares.
Information Processing Letters ( IF 0.7 ) Pub Date : 2016-10-27 , DOI: 10.1016/j.ipl.2016.10.003
Marius Nicolae, Sanguthevar Rajasekaran

We consider the problem of pattern matching with k mismatches, where the pattern may contain don't care (wildcard) characters. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have no more than k mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorithm for pattern matching with k mismatches has a runtime of $O(n \sqrt{k \log k})$. With don't cares in the pattern, the best deterministic algorithm has a runtime of $O(n k \,\mathrm{polylog}\, m)$. Therefore, there is an important gap between the versions with and without don't cares.
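To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is a naive $O(nm)$ baseline, not the algorithm of the paper or any of the faster algorithms cited above. The function name `k_mismatch_occurrences` and the choice of `?` as the don't care symbol are assumptions made for illustration.

```python
def k_mismatch_occurrences(text, pattern, k, wildcard="?"):
    """Return all start positions i where pattern matches text[i:i+m]
    with at most k mismatches. A wildcard in the pattern matches any
    character and never counts as a mismatch.

    Naive O(n*m) baseline for illustration only (hypothetical helper).
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            p = pattern[j]
            # Don't care positions are skipped; others are compared directly.
            if p != wildcard and p != text[i + j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(k_mismatch_occurrences("abcabd", "a?d", 0))
    print(k_mismatch_occurrences("abcabd", "a?d", 1))
```

With k = 0 only the alignment at position 3 (`abd`) is reported. With k = 1 the alignment at position 0 (`abc`, one mismatch in the last character) is reported as well.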

In this paper we give an algorithm whose runtime increases with the number of don't cares. We define an island to be a maximal length substring of P that does not contain don't cares. Let q be the number of islands in P. We present an algorithm that runs in $O(n \sqrt{k \log m} + n \min\{\sqrt[3]{q k \log^{2} m}, \sqrt{q \log m}\})$ time. If the number of islands q is $O(k)$, this runtime becomes $O(n \sqrt{k \log m})$, which essentially matches the best known runtime for pattern matching with k mismatches without don't cares. If the number of islands q is $O(k^{2})$, this algorithm is asymptotically faster than the previous best algorithm for pattern matching with k mismatches with don't cares in the pattern.
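The island count q defined above can be computed in a single pass over the pattern. The sketch below only illustrates the definition and is not part of the paper's algorithm. The name `count_islands` and the `?` don't care symbol are assumptions.

```python
def count_islands(pattern, wildcard="?"):
    """Count islands: maximal substrings of the pattern that contain
    no don't care characters (hypothetical helper for illustration)."""
    q = 0
    in_island = False
    for ch in pattern:
        if ch == wildcard:
            in_island = False  # a don't care ends the current island
        elif not in_island:
            q += 1             # first solid character starts a new island
            in_island = True
    return q


if __name__ == "__main__":
    # Islands in "ab?c??de" are "ab", "c", and "de".
    print(count_islands("ab?c??de"))
```

A pattern with no don't cares has exactly one island, so q = 1 and the second term of the bound is at most $n \sqrt{\log m}$, which is dominated by the first term $n \sqrt{k \log m}$ whenever k ≥ 1.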

Updated: 2016-10-27
