Practical Random Access to SLP-Compressed Texts
arXiv - CS - Data Structures and Algorithms. Pub Date: 2019-10-16. DOI: arxiv-1910.07145
Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner and Yoshimasa Takabatake

Grammar-based compression is a popular and powerful approach to compressing repetitive texts, but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs. In this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. Random access is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them, so many theoretical bounds have been published for it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but answers queries significantly faster.
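To illustrate what "random access" means for a straight-line program (SLP), here is a minimal sketch of the classic textbook approach, not the paper's new encoding. It assumes a grammar in Chomsky normal form whose rules are numbered so that each pair refers only to smaller ids; the class `SLP`, the method `access`, and the example rule ids are hypothetical names chosen for illustration. Each nonterminal stores the length of its expansion, and a query descends from the start symbol, choosing the left or right child by comparing the position against the left child's length. This touches only one root-to-leaf path, so it costs time proportional to the grammar's height rather than to the text length.

```python
from typing import Dict, Tuple, Union

# Right-hand side of a rule: a single terminal character, or a pair of
# symbol ids (Chomsky normal form). This representation is an assumption
# for illustration; it is far larger than a compact practical encoding.
Rule = Union[str, Tuple[int, int]]


class SLP:
    """Toy straight-line program supporting random access (hypothetical sketch).

    Assumes every pair (left, right) refers only to ids smaller than the
    rule's own id, so lengths can be computed in increasing id order.
    """

    def __init__(self, rules: Dict[int, Rule], start: int) -> None:
        self.rules = rules
        self.start = start
        # length[x] = length of the string that symbol x expands to.
        self.length: Dict[int, int] = {}
        for x in sorted(rules):
            rhs = rules[x]
            if isinstance(rhs, str):
                self.length[x] = 1
            else:
                left, right = rhs
                self.length[x] = self.length[left] + self.length[right]

    def __len__(self) -> int:
        return self.length[self.start]

    def access(self, i: int) -> str:
        """Return text[i] by walking down from the start symbol.

        Uses O(h) steps for a grammar of height h and never expands
        anything off the single root-to-leaf path it follows.
        """
        if not 0 <= i < len(self):
            raise IndexError(i)
        rhs = self.rules[self.start]
        while not isinstance(rhs, str):
            left, right = rhs
            if i < self.length[left]:
                rhs = self.rules[left]       # position lies in the left child
            else:
                i -= self.length[left]       # skip the left child's expansion
                rhs = self.rules[right]
        return rhs


# Example grammar for "abracadabra" (rule ids are arbitrary).
rules: Dict[int, Rule] = {
    0: "a", 1: "b", 2: "r", 3: "c", 4: "d",
    5: (0, 1), 6: (5, 2), 7: (6, 0),   # ab, abr, abra
    8: (3, 0), 9: (8, 4),              # ca, cad
    10: (7, 9), 11: (10, 7),           # abracad, abracadabra
}
slp = SLP(rules, start=11)
print(slp.access(4))  # c
print("".join(slp.access(i) for i in range(len(slp))))  # abracadabra
```

The design choice here is to spend one integer per rule on expansion lengths in exchange for descent guided by a single comparison per level; practical encodings such as the one the abstract describes aim to keep this kind of navigation fast while storing the grammar much more compactly than a Python dictionary.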

Updated: 2020-07-21