Efficient Online String Matching Based on Characters Distance Text Sampling
Algorithmica (IF 0.9), Pub Date: 2020-06-20, DOI: 10.1007/s00453-020-00732-4
Simone Faro, Francesco Pio Marino, Arianna Pavone

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science, with applications in many other fields such as natural language processing, information retrieval and computational biology. Sampled string matching is an efficient approach, recently introduced to overcome the prohibitive space requirements of index construction on the one hand, and to drastically reduce the searching time of online solutions on the other. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character, and then to search the sampled data online for any occurrence of the sampled pattern before verifying each candidate against the original text. From a theoretical point of view, we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-time complexity. From a practical point of view, our solution shows sub-linear behaviour and speeds up online searching by a factor of up to 9, using limited additional space that ranges from 11% down to 2.8% of the text size, a gain of up to 50% compared with previous solutions.
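
The sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pivot_positions`, `distances`, `find_all`, `sampled_search`) are hypothetical, the pivot is chosen by the caller, and the sketch keeps the full list of pivot positions in memory, so it does not reproduce the space bounds or running times reported in the paper. It only shows the filter-then-verify structure: compare distance sequences first, then check each candidate against the original text.

```python
def pivot_positions(s, pivot):
    """Indices at which the pivot character occurs in s."""
    return [i for i, ch in enumerate(s) if ch == pivot]


def distances(positions):
    """Gaps between consecutive pivot occurrences (the 'sampled' sequence)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def find_all(text, pattern):
    """Plain search for all (possibly overlapping) occurrences; used as fallback."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def sampled_search(text, pattern, pivot):
    """Filter candidates on the distance sequences, then verify in the text.

    Assumption of this sketch: if the pattern contains fewer than two pivot
    occurrences, its sampled version is empty, so we fall back to find_all.
    """
    pat_pos = pivot_positions(pattern, pivot)
    if len(pat_pos) < 2:
        return find_all(text, pattern)

    text_pos = pivot_positions(text, pivot)
    text_d = distances(text_pos)
    pat_d = distances(pat_pos)
    k = len(pat_d)

    hits = []
    for i in range(len(text_d) - k + 1):
        # Filtering step: the sampled pattern must match the sampled text.
        if text_d[i:i + k] == pat_d:
            # Align the first pivot of the pattern with the i-th pivot of the text.
            start = text_pos[i] - pat_pos[0]
            # Verification step against the original text.
            if start >= 0 and text[start:start + len(pattern)] == pattern:
                hits.append(start)
    return hits


if __name__ == "__main__":
    print(sampled_search("abracadabra abracadabra", "abracadabra", "a"))
```

In this sketch the filter can never miss an occurrence: the pivot characters inside a true occurrence are consecutive pivot occurrences of the text, so their gaps appear as a contiguous run in the sampled text. False candidates are discarded by the verification step.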

Updated: 2020-06-20