A Fast Randomized Algorithm for Finding the Maximal Common Subsequences
arXiv - CS - Computational Complexity. Pub Date: 2020-09-07, DOI: arxiv-2009.03352
Jin Cao and Dewei Zhong

Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of $L$ strings is NP-hard, i.e., the computational complexity of known exact methods is exponential in $L$. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into it no longer yields a common subsequence. An LCS is a special case of an MCS: it is a maximal common subsequence of the greatest possible length. We show that the complexity of our algorithm is linear in $L$, which makes it suitable for large $L$. Furthermore, we study the occurrence probability of a single instance of MCS and demonstrate, via both theoretical and experimental studies, that the longest subsequence from multiple runs of Random-MCS often yields a solution to LCS.
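To make the definition of maximality concrete, the following Python sketch checks whether a given string is a maximal common subsequence of a set of strings by brute force. It is only an illustration of the definition stated in the abstract, not the Random-MCS algorithm, which the abstract does not describe. The function names (`is_subsequence`, `is_common_subsequence`, `is_maximal_common_subsequence`) and the example strings are hypothetical, and the sketch assumes a non-empty list of input strings.

```python
from typing import Sequence


def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # `ch in it` advances the iterator, so characters must match in order.
    return all(ch in it for ch in s)


def is_common_subsequence(s: str, strings: Sequence[str]) -> bool:
    """Return True if s is a subsequence of every string in `strings`."""
    return all(is_subsequence(s, t) for t in strings)


def is_maximal_common_subsequence(s: str, strings: Sequence[str]) -> bool:
    """Brute-force check of the maximality definition (illustrative only).

    s is maximal if it is a common subsequence and inserting any single
    character at any position no longer yields a common subsequence.
    Assumes `strings` is non-empty.
    """
    if not is_common_subsequence(s, strings):
        return False
    # Any inserted character must occur in every string, so the
    # characters of the first string are a sufficient candidate set.
    alphabet = set(strings[0])
    for i in range(len(s) + 1):
        for c in alphabet:
            if is_common_subsequence(s[:i] + c + s[i:], strings):
                return False
    return True


# Hypothetical example: "ab" is the LCS of these two strings, while "x"
# is a shorter common subsequence that is nevertheless maximal.
example = ["xab", "abx"]
print(is_maximal_common_subsequence("ab", example))
print(is_maximal_common_subsequence("x", example))
print(is_maximal_common_subsequence("a", example))
```

In this example, both "ab" and "x" satisfy the definition, which shows that an MCS need not be an LCS, whereas "a" is not maximal because it can be extended to "ab". This is why the abstract considers the longest result over multiple runs when an LCS is the goal.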

Updated: 2020-09-09
