Grammar compression with probabilistic context-free grammar


arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-03-18, DOI: arxiv-2003.08097
Hiroaki Naganuma, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara, Naoki Kobayashi

We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.
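The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy grammars, the rule probabilities, the helper names `expand_slp` and `decode_derivation`, and the use of $-\log_2 p$ as an idealized code length for each rule choice are all assumptions made for illustration. The first part expands an SLP, where every nonterminal has exactly one rule and so $L(G) = \{T\}$. The second part uses a probabilistic grammar that generates many strings, so $T$ is recovered only when the grammar is paired with the sequence of rule choices in its leftmost derivation.

```python
from math import log2

# Straight-line program (SLP): each nonterminal has exactly one rule,
# so the grammar generates a single string. Toy example, not from the paper.
slp = {
    "S": ("A", "A"),
    "A": ("B", "B"),
    "B": ("a", "b"),
}

def expand_slp(rules, symbol):
    """Expand a symbol of an SLP into the unique string it derives."""
    if symbol not in rules:  # anything without a rule is a terminal
        return symbol
    return "".join(expand_slp(rules, s) for s in rules[symbol])

# Probabilistic grammar: a nonterminal may have several rules, each with a
# probability (hypothetical values). This one generates every non-empty
# string over {a, b}, so T is not the unique element of L(G).
pcfg = {
    "S": [
        (("a", "S"), 0.5),
        (("b", "S"), 0.25),
        (("a",), 0.125),
        (("b",), 0.125),
    ],
}

def decode_derivation(grammar, start, choices):
    """Replay a leftmost derivation given as a list of rule indices.

    Returns the derived text and an idealized cost in bits, computed as
    the sum of -log2(p) over the chosen rules (an assumption standing in
    for whatever entropy coder is actually used).
    """
    remaining = iter(choices)
    bits = 0.0

    def go(symbol):
        nonlocal bits
        if symbol not in grammar:  # terminal symbol
            return symbol
        rhs, prob = grammar[symbol][next(remaining)]
        bits += -log2(prob)
        return "".join(go(s) for s in rhs)

    text = go(start)
    return text, bits

slp_text = expand_slp(slp, "S")
pcfg_text, pcfg_bits = decode_derivation(pcfg, "S", [0, 1, 2])
```

Here `slp_text` is determined by the grammar alone, while `pcfg_text` depends on the choice sequence `[0, 1, 2]`. Rules that are used often can be given high probability, so their choices cost few bits under this idealized measure.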


Updated: 2020-03-19
