k-Approximate Quasiperiodicity Under Hamming and Edit Distance
Algorithmica (IF 0.9), Pub Date: 2021-06-22, DOI: 10.1007/s00453-021-00842-7
Aleksander Kędzierski, Jakub Radoszewski

Quasiperiodicity in strings was introduced almost 30 years ago as an extension of string periodicity. The basic notions of quasiperiodicity are cover and seed. A cover of a text T is a string whose occurrences in T cover all positions of T. A seed of text T is a cover of a superstring of T. In various applications exact quasiperiodicity is still not sufficient due to the presence of errors. We consider approximate notions of quasiperiodicity, for which we allow approximate occurrences in T with a small Hamming, Levenshtein or weighted edit distance. In previous work Sim et al. (J Korea Inf Sci Soc 29(1):16–21, 2002) and Christodoulakis et al. (J Autom Lang Comb 10(5/6), 609–626, 2005) showed that computing approximate covers and seeds, respectively, under weighted edit distance is NP-hard. They, therefore, considered restricted approximate covers and seeds which need to be factors of the original string T and presented polynomial-time algorithms for computing them. Further algorithms, considering approximate occurrences with Hamming distance bounded by k, were given in several contributions by Guth et al. They also studied relaxed approximate quasiperiods. We present more efficient algorithms for computing restricted approximate covers and seeds. In particular, we improve upon the complexities of many of the aforementioned algorithms, also for relaxed quasiperiods. Our solutions are especially efficient if the number (or total cost) of allowed errors is small. We also show conditional lower bounds for computing restricted approximate covers and prove NP-hardness of computing non-restricted approximate covers and seeds under the Hamming distance.
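To make the definitions concrete, here is a naive brute-force sketch of restricted k-approximate covers under Hamming distance. It is not the authors' algorithm, which is far more efficient. It assumes the usual Hamming-distance definition: an approximate occurrence of C is any length-|C| factor of T within Hamming distance k of C, and a restricted cover must itself be a factor of T. With k = 0 it reduces to an exact cover. The function names are hypothetical.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


def is_k_approx_cover(c: str, t: str, k: int) -> bool:
    """True if every position of t lies in some length-|c| window of t whose
    Hamming distance to c is at most k (k = 0 gives an exact cover)."""
    m, n = len(c), len(t)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # t[0:covered_up_to] is covered by occurrences seen so far
    for i in range(n - m + 1):
        if hamming(c, t[i:i + m]) <= k:
            if i > covered_up_to:
                # Position covered_up_to lies before this occurrence and after
                # all earlier ones, and later windows start even further right.
                return False
            covered_up_to = i + m
    return covered_up_to == n


def shortest_restricted_k_cover(t: str, k: int) -> Optional[str]:
    """Shortest factor c of t with |c| > k that is a k-approximate cover of t.
    Requiring |c| > k is an assumption of this sketch: if |c| <= k, every
    window is trivially within distance k, so any such string qualifies."""
    n = len(t)
    for m in range(k + 1, n + 1):
        for i in range(n - m + 1):
            c = t[i:i + m]
            if is_k_approx_cover(c, t, k):
                return c
    return None  # only when t is empty or k >= len(t)
```

For example, `shortest_restricted_k_cover("abababa", 0)` returns `"aba"`. For `t = "abcabdabc"`, `"abc"` is a 1-approximate cover: the middle window `"abd"` is one mismatch away. It is not an exact cover. Trying all O(n²) candidates with an O(n·|c|) check costs O(n⁴) in the worst case. This is the kind of baseline the paper's algorithms improve upon, especially when k is small.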




Updated: 2021-06-22