Large-Alphabet Semi-Static Entropy Coding Via Asymmetric Numeral Systems
ACM Transactions on Information Systems (IF 5.4), Pub Date: 2020-07-07, DOI: 10.1145/3397175
Alistair Moffat, Matthias Petri

An entropy coder takes as input a sequence of symbol identifiers over some specified alphabet and represents that sequence as a bitstring using as few bits as possible, typically assuming that the elements of the sequence are independent of each other. Previous entropy coding methods include the well-known Huffman and arithmetic approaches. Here we examine the newer asymmetric numeral systems (ANS) technique for entropy coding and develop mechanisms that allow it to be used efficiently when the source alphabet is large (thousands or millions of symbols). In particular, we examine different ways in which probability distributions over large alphabets can be approximated, and from these we derive techniques that extend the ANS mechanism to support large-alphabet entropy coding. As well as providing a full description of ANS, we present detailed experiments using several different types of input, including data streams arising as typical output from the modeling stages of text compression software, and compare the proposed ANS variants with Huffman and arithmetic coding baselines, measuring both compression effectiveness and encoding and decoding throughput. We demonstrate that in applications in which semi-static compression is appropriate, ANS-based coders can provide an excellent balance between compression effectiveness and speed, even when the alphabet is large.
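The abstract describes ANS only at a high level. As a rough illustration of the semi-static setting, the Python sketch below counts symbol frequencies in one pass, quantizes them to a power-of-two total M, and then encodes and decodes with a byte-oriented range variant of ANS (rANS). It is not the authors' implementation. Everything here is an assumption chosen for illustration: the helper names (`normalize`, `build_model`, `encode`, `decode`), the constants `SCALE_BITS = 16` and `RANS_L = 2**23` (modeled loosely on common 32-bit byte-renormalizing rANS coders), and the naive frequency quantization, which is a simple heuristic rather than the large-alphabet approximation techniques the paper develops. The symbol/frequency table, which a real semi-static coder would have to transmit as a header, is not encoded.

```python
import bisect
from collections import Counter

SCALE_BITS = 16          # assumption: frequencies quantized to a total of M = 2**16
M = 1 << SCALE_BITS
RANS_L = 1 << 23         # assumption: lower bound of the 32-bit coder state


def normalize(counts, scale_bits=SCALE_BITS):
    """Naive heuristic: integer freqs >= 1 summing to 2**scale_bits."""
    m = 1 << scale_bits
    if not counts or len(counts) > m:
        raise ValueError("alphabet must be non-empty and no larger than M")
    total = sum(counts)
    freqs = [max(1, c * m // total) for c in counts]
    diff = m - sum(freqs)
    # Absorb the rounding error in the most frequent symbols first.
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    i = 0
    while diff != 0:
        j = order[i % len(order)]
        if diff > 0:
            freqs[j] += 1
            diff -= 1
        elif freqs[j] > 1:
            freqs[j] -= 1
            diff += 1
        i += 1
    return freqs


def build_model(message):
    """Semi-static model: sorted symbols, quantized freqs, cumulative starts."""
    counts = Counter(message)
    symbols = sorted(counts)
    freqs = normalize([counts[s] for s in symbols])
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)      # cum[-1] == M
    return symbols, freqs, cum


def encode(message, symbols, freqs, cum):
    index = {s: i for i, s in enumerate(symbols)}
    x = RANS_L
    out = []                         # renormalization bytes, in emission order
    for sym in reversed(message):    # rANS is last-in first-out: encode back to front
        i = index[sym]
        f, start = freqs[i], cum[i]
        x_max = ((RANS_L >> SCALE_BITS) << 8) * f
        while x >= x_max:            # keep the state in [RANS_L, 256 * RANS_L)
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << SCALE_BITS) + (x % f) + start
    # Final state first (little-endian), then the emitted bytes in reverse order.
    return x.to_bytes(4, "little") + bytes(reversed(out))


def decode(data, n, symbols, freqs, cum):
    x = int.from_bytes(data[:4], "little")
    pos = 4
    result = []
    for _ in range(n):
        slot = x & (M - 1)
        i = bisect.bisect_right(cum, slot) - 1   # cum[i] <= slot < cum[i + 1]
        result.append(symbols[i])
        x = freqs[i] * (x >> SCALE_BITS) + slot - cum[i]
        while x < RANS_L:            # pull bytes back in to restore the state range
            x = (x << 8) | data[pos]
            pos += 1
    return result
```

A round trip looks like `model = build_model(msg); blob = encode(msg, *model); decode(blob, len(msg), *model) == msg`. One design note: the decoder locates each symbol by binary search over the cumulative frequencies instead of a slot-to-symbol table of M entries. This keeps memory proportional to the number of distinct symbols at the cost of a logarithmic lookup. That trade-off matters once the alphabet is large, although this sketch makes no claim about which choice the paper adopts.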

Updated: 2020-07-07