Streaming k-mismatch with error correcting and applications
Information and Computation (IF 1), Pub Date: 2020-01-03, DOI: 10.1016/j.ic.2019.104513
Jakub Radoszewski , Tatiana Starikovskaya

We present a new streaming algorithm for the k-Mismatch problem, one of the most basic problems in pattern matching. Given a pattern and a text, the task is to find all substrings of the text that are at Hamming distance at most k from the pattern. Our algorithm is enhanced with an important new feature called Error Correcting, and its complexities for k=1 and for general k are comparable to those of the solutions for the k-Mismatch problem by Porat and Porat (FOCS 2009) and Clifford et al. (SODA 2016). In parallel with our research, an even more efficient algorithm for the k-Mismatch problem with the Error Correcting feature was developed by Clifford et al. (SODA 2019). Using the new feature and recent work on streaming Multiple Pattern Matching, we develop a series of streaming algorithms for pattern matching on weighted strings, which are a commonly used representation of uncertain sequences in molecular biology. We also show that these algorithms are space-optimal up to polylogarithmic factors.
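To make the problem statement concrete, here is a minimal sketch of the k-Mismatch problem itself, solved by brute force. It is not the streaming algorithm of the paper: it keeps the whole text in memory and takes time proportional to the product of the pattern and text lengths. The function name `k_mismatch_occurrences` and its signature are assumptions made for illustration only.

```python
def k_mismatch_occurrences(pattern: str, text: str, k: int) -> list[int]:
    """Return every start index i such that text[i:i+len(pattern)] is at
    Hamming distance at most k from pattern.

    Naive offline baseline (hypothetical helper, not the paper's method).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for p_ch, t_ch in zip(pattern, text[start:start + m]):
            if p_ch != t_ch:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(k_mismatch_occurrences("abc", "abcabdxbc", 1))
```

With k=0 the function reduces to exact pattern matching. A streaming solution has to report the same positions while reading the text one character at a time and storing far less than the full pattern and text.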

A preliminary version of this work was published at the DCC 2017 conference [24].




Updated: 2020-01-03