Distinct flavors of Zipf's law and its maximum likelihood fitting: Rank-size and size-distribution representations, Physical Review E

Physical Review E (IF 2.2), Pub Date: 2020-11-10, DOI: 10.1103/physreve.102.052113
Álvaro Corral, Isabel Serra, Ramon Ferrer-i-Cancho

In recent years, researchers have become aware of how difficult it is to fit power-law distributions properly. These difficulties are greater in Zipfian systems because the variables are discrete and because these systems have two representations, that is, two versions depending on which random variable is fitted: rank or size. The discreteness implies that a power law in one of the representations is not a power law in the other, and vice versa. We generate synthetic power laws in both representations and apply a state-of-the-art fitting method to each of the two random variables. The method (based on maximum likelihood plus a goodness-of-fit test) fits not the whole distribution but only its tail, understood as the part of a distribution above a cutoff that separates non-power-law behavior from power-law behavior. We find that, no matter which random variable is power-law distributed, using the rank as the random variable is, in general, problematic for fitting (although it may work in some limiting cases). One of the difficulties comes from recovering the "hidden" true ranks from the empirical ranks. In contrast, the representation in terms of the distribution of sizes allows one to recover the true exponent (with some small bias when the underlying size distribution is a power law only asymptotically).
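As an illustration of maximum-likelihood fitting of a tail in the size representation, here is a minimal Python sketch. It assumes the cutoff a is already known and that the tail follows the discrete power law f(n) = n^(-γ) / ζ(γ, a) for n ≥ a, where ζ(γ, a) is the Hurwitz zeta function. The function names, the Euler-Maclaurin approximation of ζ, the golden-section optimizer, and the use of Devroye's rejection sampler to produce synthetic data are all illustrative choices, not the authors' code. The cutoff selection and the goodness-of-fit test that accompany the fit are left out.

```python
import math
import random

# Hypothetical helper names throughout; a sketch, not the authors' implementation.


def hurwitz_zeta(gamma, a, terms=100):
    """Approximate zeta(gamma, a) = sum over k >= 0 of (a + k)**(-gamma), gamma > 1.

    The first `terms` values are summed directly; the remainder is estimated
    with the leading Euler-Maclaurin terms.
    """
    head = sum((a + k) ** -gamma for k in range(terms))
    x = a + terms  # first term left out of the direct sum
    tail = (x ** (1 - gamma) / (gamma - 1)      # integral of t**(-gamma) from x to infinity
            + 0.5 * x ** -gamma                  # half of the first omitted term
            + gamma * x ** (-gamma - 1) / 12)    # first-derivative correction
    return head + tail


def fit_exponent(sizes, a, lo=1.01, hi=6.0, tol=1e-6):
    """Maximum-likelihood gamma for the tail n >= a (cutoff a assumed known)."""
    tail = [n for n in sizes if n >= a]
    m = len(tail)
    sum_log = sum(math.log(n) for n in tail)

    def loglik(g):
        # log L(g) = -g * sum(log n_i) - m * log zeta(g, a)
        return -g * sum_log - m * math.log(hurwitz_zeta(g, a))

    # Golden-section search for the maximum of a concave function on [lo, hi].
    inv_phi = (math.sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = loglik(x1), loglik(x2)
    while hi - lo > tol:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = loglik(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = loglik(x1)
    return (lo + hi) / 2


def sample_zipf(gamma, a, size, rng):
    """Draw integers n >= a with P(n) proportional to n**(-gamma).

    Devroye's rejection method samples the power law with cutoff 1; draws
    below a are discarded, which conditions the sample on n >= a exactly.
    """
    am1 = gamma - 1.0
    b = 2.0 ** am1
    out = []
    while len(out) < size:
        u = 1.0 - rng.random()  # in (0, 1], so u**(-1/am1) is finite
        v = rng.random()
        x = math.floor(u ** (-1.0 / am1))
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b and x >= a:
            out.append(x)
    return out


rng = random.Random(42)
data = sample_zipf(2.5, 1, 20000, rng)   # synthetic sizes, true gamma = 2.5
gamma_hat = fit_exponent(data, a=1)      # cutoff a = 1 assumed known
print(f"estimated gamma: {gamma_hat:.3f}")
```

Once the cutoff is fixed, log ζ(γ, a) is a log-sum-exp of linear functions of γ and is therefore convex, so the log-likelihood is concave in γ and a simple bracketing search is enough. A complete analysis would also have to choose the cutoff and apply the goodness-of-fit test mentioned in the abstract.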

Updated: 2020-11-12