On pattern matching with k mismatches and few don't cares
Information Processing Letters (IF 0.7), Pub Date: 2016-10-27, DOI: 10.1016/j.ipl.2016.10.003
Marius Nicolae, Sanguthevar Rajasekaran
We consider the problem of pattern matching with k mismatches, where the pattern may contain don't care (wild card) characters. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have no more than k mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorithm for pattern matching with k mismatches has a runtime of $O(n\sqrt{k \log k})$. With don't cares in the pattern, the best deterministic algorithm has a runtime of $O(nk\,\mathrm{polylog}\,m)$. Therefore, there is an important gap between the versions with and without don't cares.
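To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is only a naive O(nm) baseline for checking small inputs, not the algorithm of the paper or any of the faster algorithms mentioned above. The function name `k_mismatch_occurrences` and the choice of `?` as the don't care symbol are assumptions made for illustration.

```python
def k_mismatch_occurrences(text, pattern, k, wildcard="?"):
    """Return every start index i such that pattern aligned at text[i:]
    has at most k mismatches. The wildcard symbol (assumed to be '?')
    matches any text character and never counts as a mismatch."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if pattern[j] != wildcard and pattern[j] != text[i + j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        else:
            # inner loop finished without exceeding k mismatches
            hits.append(i)
    return hits
```

For example, with text `abcabd` and pattern `a?d`, the alignment at position 3 is an exact match, and the alignment at position 0 has a single mismatch (`c` against `d`), so it is reported only when k is at least 1.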
In this paper we give an algorithm whose runtime increases with the number of don't cares. We define an island to be a maximal length substring of P that does not contain don't cares. Let q be the number of islands in P. We present an algorithm that runs in $O(n\sqrt{k \log m} + n \min\{\sqrt[3]{qk \log^2 m}, \sqrt{q \log m}\})$ time. If the number of islands q is $O(k)$, this runtime becomes $O(n\sqrt{k \log m})$, which essentially matches the best known runtime for pattern matching with k mismatches without don't cares. If the number of islands q is $O(k^2)$, this algorithm is asymptotically faster than the previous best algorithm for pattern matching with k mismatches with don't cares in the pattern.
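The island definition can be illustrated with a short Python sketch that splits a pattern into its maximal runs free of don't cares, so that q is simply the number of runs returned. The helper name `islands` and the `?` don't care symbol are hypothetical choices for this example. The sketch shows only the definition, not how the paper's algorithm uses islands.

```python
def islands(pattern, wildcard="?"):
    """Return a list of (start_index, substring) pairs, one per island,
    where an island is a maximal run of pattern characters that contains
    no wildcard (the wildcard is assumed to be '?')."""
    result = []
    start = None  # start index of the island currently being scanned
    for i, ch in enumerate(pattern):
        if ch != wildcard:
            if start is None:
                start = i  # a new island begins here
        elif start is not None:
            result.append((start, pattern[start:i]))  # wildcard ends the island
            start = None
    if start is not None:
        result.append((start, pattern[start:]))  # island running to the end
    return result


def count_islands(pattern, wildcard="?"):
    """Return q, the number of islands in the pattern."""
    return len(islands(pattern, wildcard))
```

For the pattern `ab??c?de` the islands are `ab`, `c` and `de`, so q = 3. A pattern with no don't cares has q = 1, whatever its length.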