Hidden Words Statistics for Large Patterns
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-03-21, DOI: arxiv-2003.09584
Svante Janson and Wojciech Szpankowski

We study the so-called subsequence pattern matching problem, also known as hidden pattern matching, in which one searches for a given pattern $w$ of length $m$ as a subsequence in a random text of length $n$. The quantity of interest is the number of occurrences of $w$ as a subsequence (i.e., occurring at not necessarily consecutive text locations). This problem has many applications, ranging from intrusion detection to trace reconstruction, deletion channels, and DNA-based storage systems. In all of these applications, the pattern $w$ is of variable length. To the best of our knowledge, this problem has been tackled only for fixed length $m=O(1)$ [Flajolet, Szpankowski and Vallée, 2006]. In our main result, we prove that for $m=o(n^{1/3})$ the number of subsequence occurrences is asymptotically normally distributed. In addition, we show that under some constraints on the structure of $w$, the asymptotic normality can be extended to $m=o(\sqrt{n})$. For the special pattern $w$ consisting of a single repeated symbol, we indicate that for $m=o(n)$ the distribution of the number of subsequence occurrences is either asymptotically normal or asymptotically log-normal. We conjecture that this dichotomy holds for all patterns. We use Hoeffding's projection method for $U$-statistics to prove our findings.
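To make the counted quantity concrete, here is a minimal Python sketch (not taken from the paper). It counts the occurrences of $w$ as a subsequence of a text with the standard $O(nm)$ dynamic program, and computes the mean $\binom{n}{m}\prod_{j=1}^{m} p_{w_j}$ under the assumption of a memoryless source in which each text symbol is drawn independently with probabilities `probs`. For a pattern made of one repeated symbol $a$, the count reduces to $\binom{N_a}{m}$, where $N_a$ is the number of $a$'s in the text. The function names and the memoryless-source assumption are illustrative choices, not the authors' notation.

```python
from math import comb, prod


def count_subsequence_occurrences(text: str, pattern: str) -> int:
    """Number of index tuples i_1 < ... < i_m with text[i_j] == pattern[j]."""
    m = len(pattern)
    # dp[j] = occurrences of pattern[:j] as a subsequence of the text read so far
    dp = [1] + [0] * m  # the empty pattern occurs exactly once
    for ch in text:
        # Go from right to left so one text symbol is not reused within a tuple
        for j in range(m, 0, -1):
            if pattern[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[m]


def same_symbol_count(text: str, symbol: str, m: int) -> int:
    """For w = symbol * m, every m-subset of the symbol's positions is an occurrence."""
    return comb(text.count(symbol), m)


def expected_occurrences(n: int, pattern: str, probs: dict) -> float:
    """Mean count for a memoryless text (assumption): C(n, m) * prod_j p(w_j)."""
    return comb(n, len(pattern)) * prod(probs[c] for c in pattern)


print(count_subsequence_occurrences("banana", "ana"))  # 4
print(same_symbol_count("aaaaa", "a", 3))  # 10
```

The dynamic program visits every (text position, pattern position) pair once. Because the count can grow like $\binom{n}{m}$, Python's arbitrary-precision integers avoid overflow for large $m$.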

Updated: 2020-03-24