Optimal Approximate Sampling from Discrete Probability Distributions
arXiv - CS - Discrete Mathematics | Pub Date: 2020-01-13 | arXiv: 2001.04555
Feras A. Saad, Cameron E. Freer, Martin C. Rinard, Vikash K. Mansinghka

This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is "entropy-optimal") and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.

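To make the setting concrete, here is a minimal Python sketch (not the authors' implementation) of one classical building block: a Knuth-Yao style sampler that draws from a $k$-bit dyadic approximation $(\hat{p}_1, \dots, \hat{p}_n) = (m_1/2^k, \dots, m_n/2^k)$ while consuming independent fair bits one at a time. Such a sampler is entropy-optimal for the dyadic distribution it targets; the helper round_to_dyadic is a hypothetical, naive stand-in for the paper's accuracy-optimal choice of the approximation itself.

    import random

    def ddg_sample(m, k, flip=lambda: random.getrandbits(1)):
        """Knuth-Yao style sampler for a k-bit dyadic distribution.

        m is a list of nonnegative integers with sum(m) == 2**k, so the
        sampled distribution is p_hat[i] = m[i] / 2**k.  The sampler walks
        an implicit DDG tree, consuming one fair bit per level, and is
        entropy-optimal for this dyadic distribution.
        """
        assert sum(m) == 1 << k, "weights must sum to 2^k"
        d = 0
        for j in range(1, k + 1):
            d = 2 * d + flip()             # descend one level of the tree
            for i, mi in enumerate(m):
                d -= (mi >> (k - j)) & 1   # j-th bit of m[i]/2^k
                if d < 0:
                    return i               # reached a leaf labeled i
        raise AssertionError("unreachable when sum(m) == 2^k")

    def round_to_dyadic(p, k):
        """Naive stand-in for the paper's optimal approximation step:
        round the target probabilities to integers m[i] summing to 2^k."""
        scale = 1 << k
        m = [int(round(pi * scale)) for pi in p]
        m[-1] += scale - sum(m)            # crude repair of rounding drift
        return m

    if __name__ == "__main__":
        p = [0.1, 0.2, 0.3, 0.4]
        m = round_to_dyadic(p, k=16)
        counts = [0] * len(p)
        for _ in range(100_000):
            counts[ddg_sample(m, 16)] += 1
        print([c / 100_000 for c in counts])

The loop consumes one fair bit per level of the implicit DDG tree and terminates within $k$ levels because the m[i] sum to $2^k$. The paper's contribution lies upstream of this sketch: choosing the m[i] so that the induced distribution is the closest approximation to the target, under a broad family of statistical accuracy measures, among all entropy-optimal samplers operating at the specified $k$-bit precision.
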
Updated: 2020-03-10