Relative Suffix Trees, The Computer Journal

Relative Suffix Trees
The Computer Journal (IF 1.5), Pub Date: 2017-11-21, DOI: 10.1093/comjnl/bxx108
Andrea Farruggia (1), Travis Gagie (2, 3), Gonzalo Navarro (2, 4), Simon J Puglisi (5), Jouni Sirén (6)

Abstract

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive in time with the largest and fastest compressed suffix trees.
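
To make the general idea of storing one sequence relative to a similar reference concrete, here is a minimal sketch of a greedy relative Lempel-Ziv (RLZ) parse on plain strings. It is a swapped-in illustration of relative compression in general, not the paper's relative suffix tree construction, and the function names `rlz_parse` and `rlz_decode` are hypothetical. A target that closely resembles the reference parses into a few long phrases, each stored as a (reference position, length) pair; characters absent from the reference are kept as literals.

```python
from typing import List, Tuple, Union

# A phrase is either (start position in reference, length) or a literal character.
Phrase = Union[Tuple[int, int], str]


def rlz_parse(reference: str, target: str) -> List[Phrase]:
    """Greedy RLZ parse of `target` against `reference` (hypothetical helper).

    Naive sketch: at each position, extend the match one character at a time
    while the growing substring still occurs somewhere in the reference.
    """
    phrases: List[Phrase] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        length = 1
        while i + length <= len(target):
            pos = reference.find(target[i:i + length])
            if pos < 0:
                break  # longer substring no longer occurs in the reference
            best_pos, best_len = pos, length
            length += 1
        if best_len == 0:
            phrases.append(target[i])  # character not present in the reference
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases


def rlz_decode(reference: str, phrases: List[Phrase]) -> str:
    """Rebuild the target by copying reference substrings and literals."""
    out = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            pos, length = p
            out.append(reference[pos:pos + length])
    return "".join(out)


if __name__ == "__main__":
    reference = "ACGTACGTTGCA"
    target = "ACGTACGATGCA"  # differs from the reference by one substitution
    parse = rlz_parse(reference, target)
    print(parse)  # [(0, 7), (0, 1), (8, 4)]
    assert rlz_decode(reference, parse) == target
```

The inner loop makes this sketch quadratic or worse; practical RLZ implementations instead find longest matches with an index over the reference, such as a suffix array.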

Updated: 2017-11-21