Optimal Coding and the Origins of Zipfian Laws
Journal of Quantitative Linguistics (IF 0.761), Pub Date: 2020-07-24, DOI: 10.1080/09296174.2020.1778387
Ramon Ferrer-i-Cancho, Christian Bentz, Caio Seguin

ABSTRACT

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding – under an arbitrary coding scheme – and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.
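The logarithmic growth claimed for optimal non-singular coding can be checked with a small counting argument. The sketch below is an illustration, not code from the paper. It assumes words are ranked by decreasing frequency and that the i-th ranked word simply receives the i-th shortest distinct string over an alphabet of a given size. The function name `nonsingular_length` and its parameters are hypothetical.

```python
def nonsingular_length(rank: int, alphabet_size: int) -> int:
    """Length of the rank-th shortest distinct string over the alphabet.

    Ranks are 1-indexed. There are alphabet_size**k strings of length k,
    so we add up capacities length by length until the rank fits.
    """
    if rank < 1 or alphabet_size < 1:
        raise ValueError("rank and alphabet_size must be positive integers")
    length = 0
    capacity = 0  # number of distinct strings of length <= `length`
    while capacity < rank:
        length += 1
        capacity += alphabet_size ** length
    return length


# Binary alphabet: two strings of length 1, four of length 2, then length 3.
lengths = [nonsingular_length(i, 2) for i in range(1, 8)]
print(lengths)  # [1, 1, 2, 2, 2, 2, 3]
```

Under these assumptions, and for an alphabet of N > 1 symbols, the loop is equivalent to the closed form ⌈log_N((1 − 1/N)·i + 1)⌉, which grows as the logarithm of the rank i. Integer arithmetic is used here only to avoid floating-point rounding at exact powers of N.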




Updated: 2020-07-24