Finding Longest Common Subsequences: New anytime A∗ search results
Applied Soft Computing (IF 8.7), Pub Date: 2020-06-26, DOI: 10.1016/j.asoc.2020.106499
Marko Djukanovic , Günther R. Raidl , Christian Blum

The Longest Common Subsequence (LCS) problem aims at finding a longest string that is a subsequence of each string from a given set of input strings. This problem has applications, in particular, in the context of bioinformatics, where strings represent DNA or protein sequences. Existing approaches include numerous heuristics, but only a few exact approaches, limited to rather small problem instances. Adopting various aspects from leading heuristics for the LCS, we first propose an exact A∗ search approach, which performs well in comparison to earlier exact approaches in the context of small instances. On the basis of A∗ search we then develop two hybrid A∗-based algorithms in which classical A∗ iterations are alternated with beam search and anytime column search, respectively. A key feature to guide the heuristic search in these approaches is the usage of an approximate expected length calculation for the LCS of uniform random strings. Even for large problem instances these anytime A∗ variants yield reasonable solutions early during the search and improve on them over time. Moreover, they terminate with proven optimality if enough time and memory is given. Furthermore, they yield upper bounds and, thus, quality guarantees when terminated early. We comprehensively evaluate the proposed methods using most of the available benchmark sets from the literature and compare to the current state-of-the-art methods. In particular, our algorithms are able to obtain new best results for 82 out of 117 instance groups. Moreover, in most cases they also provide significantly smaller optimality gaps than other anytime algorithms.
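To make the problem definition concrete, here is a minimal Python sketch. It is not the A∗ approach from the paper. It only shows (a) how to check that a candidate string is a common subsequence of a set of input strings and (b) the textbook dynamic program for the special case of two strings. The function names `is_subsequence`, `is_common_subsequence`, and `lcs_two` are chosen here for illustration and do not come from the paper.

```python
from typing import Sequence


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if `candidate` can be obtained from `text` by deleting characters."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator, so matches must occur in order.
    return all(ch in remaining for ch in candidate)


def is_common_subsequence(candidate: str, strings: Sequence[str]) -> bool:
    """Return True if `candidate` is a subsequence of every input string."""
    return all(is_subsequence(candidate, s) for s in strings)


def lcs_two(a: str, b: str) -> str:
    """Textbook O(len(a) * len(b)) dynamic program for the two-string LCS."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one optimal subsequence.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(result)
```

The two-string dynamic program does not scale to an arbitrary number of input strings, which is the setting the abstract addresses with heuristic and A∗-based search.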




Updated: 2020-06-26