Grammar compression with probabilistic context-free grammar
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-03-18, DOI: arxiv-2003.08097
Hiroaki Naganuma, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara, Naoki Kobayashi

We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.
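The contrast between the two representations can be made concrete with a minimal sketch in Python. The toy grammars, the function names (expand, expand_with_choices), and the preorder rule-index encoding of the derivation tree below are illustrative assumptions, not the authors' implementation.

    # --- SLP: a CNF grammar with L(G) = {T}. Each nonterminal has exactly one
    # rule, so the grammar alone determines T.
    slp = {
        "A": ("a",),        # A -> a
        "B": ("b",),        # B -> b
        "C": ("A", "B"),    # C -> AB   (derives "ab")
        "S": ("C", "C"),    # S -> CC   (derives "abab")
    }

    def expand(grammar, symbol):
        """Deterministically expand an SLP symbol back into text."""
        if symbol not in grammar:          # terminal symbol
            return symbol
        return "".join(expand(grammar, s) for s in grammar[symbol])

    # --- Ambiguous (probabilistic) grammar: a nonterminal may have several
    # rules, so L(G) contains many strings. To recover T unambiguously we also
    # keep the derivation tree, encoded here as the preorder sequence of rule
    # indices used at each step.
    pcfg = {
        "S": [("S", "S"), ("a",), ("b",)],   # S -> SS | a | b
    }

    def expand_with_choices(grammar, symbol, choices):
        """Rebuild T from the grammar plus the recorded derivation choices."""
        if symbol not in grammar:
            return symbol
        rule = grammar[symbol][next(choices)]
        return "".join(expand_with_choices(grammar, s, choices) for s in rule)

    if __name__ == "__main__":
        assert expand(slp, "S") == "abab"
        # Preorder derivation of "abab": S->SS, S->SS, S->a, S->b, S->SS, S->a, S->b
        derivation = iter([0, 0, 1, 2, 0, 1, 2])
        assert expand_with_choices(pcfg, "S", derivation) == "abab"

In this sketch the grammar and the choice sequence together replace the single deterministic grammar of an SLP; one plausible way such a scheme pays off, in the spirit of the abstract, is that rule probabilities attached to the grammar let the choice sequence be entropy-coded compactly, so that grammar plus compressed derivation can be smaller than the best SLP for some texts.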

Last updated: 2020-03-19