Practical Random Access to SLP-Compressed Texts,arXiv - CS - Data Structures and Algorithms


Practical Random Access to SLP-Compressed Texts
arXiv - CS - Data Structures and Algorithms. Pub Date: 2019-10-16, DOI: arxiv-1910.07145
Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner and Yoshimasa Takabatake

Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. This is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them and so many theoretical bounds have been published about it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but with significantly faster queries.
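For readers unfamiliar with the primitive, the following is a minimal sketch of textbook random access on a straight-line program (SLP). It is not the encoding proposed in the paper, and it makes no attempt at compactness. It assumes a hypothetical representation in which each rule is either a single terminal character or a pair of indices of earlier rules, and it stores the expansion length of every rule so that a query can descend from the root without decompressing the text. The names `build_lengths`, `access`, `RULES`, and `ROOT` are illustrative only.

```python
from typing import List, Tuple, Union

# Hypothetical SLP representation: rule i is either a terminal character
# or a pair (left, right) of indices of rules that appear earlier in the list.
Rule = Union[str, Tuple[int, int]]


def build_lengths(rules: List[Rule]) -> List[int]:
    """Compute the length of the string each rule expands to."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)  # a terminal expands to one character
        else:
            left, right = rule
            # Both children come earlier, so their lengths are already known.
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules: List[Rule], lengths: List[int], root: int, i: int) -> str:
    """Return the character at 0-based position i of the expansion of root."""
    if not 0 <= i < lengths[root]:
        raise IndexError("position out of range")
    node = root
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left            # position falls in the left child
        else:
            i -= lengths[left]     # skip the left child's expansion
            node = right
    return rules[node]


# Toy grammar: rule 2 -> "ab", rule 3 -> "abab", rule 4 -> "ababa".
RULES: List[Rule] = ["a", "b", (0, 1), (2, 2), (3, 0)]
LENGTHS = build_lengths(RULES)
ROOT = 4
```

In this sketch each query takes time proportional to the height of the parse tree, and the length table costs one integer per rule. Practical encodings, such as those compared in the paper, aim to reduce both costs.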


Updated: 2020-07-21
