Suffix array for multi-pattern matching with variable length wildcards
Intelligent Data Analysis (IF 0.9), Pub Date: 2021-03-04, DOI: 10.3233/ida-205087
Na Liu 1, 2, 3 , Fei Xie 4 , Xindong Wu 1, 2, 5

Approximate multi-pattern matching, in which the patterns contain variable-length wildcards, is an important and widely used problem. This paper proposes two suffix array-based algorithms to solve it. The suffix array is an efficient data structure that existing studies have used for exact string matching as well as for approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles the short exact characters in a pattern by dynamic programming, while the second, MMSA-L, handles the long exact characters by the edit distance method. Experimental results on the Pizza & Chili corpus show that, in most cases, the two proposed algorithms are more time-efficient than the state-of-the-art algorithms they are compared against.
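
For readers unfamiliar with the data structure, the following is a minimal Python sketch of how a suffix array can support exact matching of several pattern segments separated by variable-length wildcards. It is not MMSA-S or MMSA-L, whose details the abstract does not give. The function names, the naive construction, and the representation of a pattern as a list of exact segments plus `(min, max)` gap bounds are all assumptions made for illustration.

```python
def build_suffix_array(text):
    # Naive construction (assumption: fine for short texts only):
    # sort suffix start positions by the suffix they begin.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # Binary search over the sorted suffixes for those starting with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # upper bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def match_with_gaps(text, sa, segments, gaps):
    # Hypothetical pattern format: segments[k] and segments[k + 1] are
    # separated by between gaps[k][0] and gaps[k][1] wildcard characters.
    # Returns sorted (start, end) spans, with `end` exclusive.
    first = segments[0]
    spans = {(p, p + len(first)) for p in find_occurrences(text, sa, first)}
    for seg, (gmin, gmax) in zip(segments[1:], gaps):
        starts = set(find_occurrences(text, sa, seg))
        spans = {
            (s, q + len(seg))
            for (s, e) in spans
            for q in range(e + gmin, e + gmax + 1)
            if q in starts
        }
    return sorted(spans)


def match_many(text, patterns):
    # Multi-pattern use: build the suffix array once, query it per pattern.
    sa = build_suffix_array(text)
    return [match_with_gaps(text, sa, segs, gaps) for segs, gaps in patterns]
```

Building the index once and reusing it for every pattern is what makes a suffix array attractive for multi-pattern matching; a practical implementation would replace the naive sort with a faster construction.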

Updated: 2021-03-09