q-frame hash comparison based exact string matching algorithms for DNA sequences
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-07-26, DOI: 10.1002/cpe.6505
Abdullah Ammar Karcioglu, Hasan Bulut

String matching is important because of its applications in many fields, such as medicine and bioinformatics. Various string matching algorithms have been developed to speed up the search. In particular, hash-based exact string matching algorithms are among the most time-efficient. The efficiency of hash-based approaches depends on the hash function, so perfect hashing plays an essential role in hash-based string matching. In this study, two q-frame hash comparison-based exact string matching algorithms, Hq-QF and HqBM-QF, are proposed. The proposed algorithms use a collision-free perfect hash function for DNA sequences. In the first approach, after the hash values match for the last q characters, the character comparisons of the Hash-q algorithm are replaced with q-frame hash comparisons. The second approach improves on the first by using the shift size indicated at the [formula missing]-th entry of the good suffix shift table. Since the number of character comparisons is minimized, the worst-case time complexity of the proposed algorithms is [formula missing]. In both approaches, q-frame hash comparisons replace most character comparisons as a trade-off. The results show that the proposed approaches are more efficient than the Hash-q algorithm in terms of both runtime and the number of character comparisons.
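The idea of comparing q-character frames by hash instead of character by character can be sketched as follows. This is a minimal illustration, not the authors' Hq-QF or HqBM-QF implementation: the 2-bit encoding, the frame layout, and the names `CODE`, `qgram_hash`, and `search_qframes` are all assumptions made for the example, and the good suffix shift used by the second approach is not modeled. Packing each base into 2 bits gives every q-gram over {A, C, G, T} a distinct integer, so equal hashes imply equal q-grams.

```python
# Assumed 2-bit code per base; any injective mapping would do.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def qgram_hash(s, start, q):
    """Pack q bases into an integer in [0, 4**q); collision-free for A/C/G/T."""
    h = 0
    for ch in s[start:start + q]:
        h = (h << 2) | CODE[ch]
    return h


def search_qframes(text, pattern, q=4):
    """Return all start positions of pattern in text (illustrative sketch)."""
    n, m = len(text), len(pattern)
    if m < q:
        raise ValueError("pattern must contain at least q characters")

    # Frame start offsets, taken from the right end of the pattern; a final
    # frame at offset 0 may overlap its neighbour when q does not divide m.
    frames = list(range(m - q, -1, -q))
    if frames[-1] != 0:
        frames.append(0)
    frame_hash = [qgram_hash(pattern, f, q) for f in frames]

    # Hash-q style shift table: how far the window may move so that the
    # q-gram at its end lines up with its rightmost occurrence in the pattern.
    default = m - q + 1
    last = qgram_hash(pattern, m - q, q)
    shift = {}
    for i in range(m - q):
        shift[qgram_hash(pattern, i, q)] = m - q - i
    skip_after_check = shift.get(last, default)
    shift[last] = 0

    hits = []
    pos = 0  # start of the current window in text
    while pos <= n - m:
        sh = shift.get(qgram_hash(text, pos + m - q, q), default)
        if sh:
            pos += sh
            continue
        # Last q-gram matched: verify the window frame by frame using hashes.
        if all(qgram_hash(text, pos + f, q) == h
               for f, h in zip(frames, frame_hash)):
            hits.append(pos)
        pos += skip_after_check
    return hits
```

For example, `search_qframes("ACGTACGTACG", "ACGTACG", q=4)` returns `[0, 4]`. The sketch raises `KeyError` on characters outside A, C, G, T, and it recomputes text hashes on every probe; a tuned implementation would avoid that cost.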

Updated: 2021-07-26