Approximate pattern matching on elastic-degenerate text
Theoretical Computer Science (IF 1.1), Pub Date: 2019-08-08, DOI: 10.1016/j.tcs.2019.08.012
Giulia Bernardini, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone

An elastic-degenerate string is a sequence of $n$ sets of strings of total length $N$. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length $m$ in an elastic-degenerate text. There exists a non-combinatorial $O(nm^{1.381}+N)$-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an $O(k^2 mG+kN)$-time and $O(m)$-space algorithm, where $G$ is the total number of strings in the elastic-degenerate text and $k$ is the maximum edit distance allowed. We also present a simple $O(kmG+kN)$-time and $O(m)$-space algorithm for solving the problem under Hamming distance.
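
To make the definitions concrete, the following is a minimal Python sketch of an elastic-degenerate text and of matching under Hamming distance. It is not the algorithm from the paper: it enumerates every sequence the text encodes, which takes exponential time, and serves only to illustrate the problem statement. The list-of-lists representation, the function names ed_sizes, hamming and naive_hamming_matches, and the choice to count an empty variant as length 0 are all assumptions made for this illustration.

```python
from itertools import product


def ed_sizes(ed_text):
    """Return (n, N, G) for an elastic-degenerate text.

    ed_text is a list of segments; each segment is a list of variant strings.
    n = number of segments, N = total number of characters,
    G = total number of strings. An empty variant counts as length 0 here,
    which is an assumption of this sketch.
    """
    n = len(ed_text)
    N = sum(len(s) for segment in ed_text for s in segment)
    G = sum(len(segment) for segment in ed_text)
    return n, N, G


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_hamming_matches(ed_text, pattern, k):
    """Brute force: return the distinct length-m windows, over all sequences
    encoded by ed_text, that are within Hamming distance k of pattern.

    Exponential in the number of segments; for illustration only.
    """
    m = len(pattern)
    found = set()
    # Pick one variant per segment and concatenate the picks.
    for choice in product(*ed_text):
        sequence = "".join(choice)
        for i in range(len(sequence) - m + 1):
            window = sequence[i:i + m]
            if hamming(window, pattern) <= k:
                found.add(window)
    return sorted(found)


# Encodes the three sequences "ACGT", "ATGGT" and "AGT".
ed_text = [["A"], ["C", "TG", ""], ["GT"]]
print(ed_sizes(ed_text))                         # (3, 6, 5)
print(naive_hamming_matches(ed_text, "TGG", 0))  # ['TGG']
print(naive_hamming_matches(ed_text, "AGT", 1))  # ['AGT', 'CGT', 'GGT']
```

In this toy text, the middle segment lists a substitution ("C"), an insertion ("TG") and a deletion (the empty string), while the shared flanks "A" and "GT" are stored once. The time bounds stated in the abstract avoid the enumeration performed above by working on the compact representation directly.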




Updated: 2019-08-08