Domain Sparsification of Discrete Distributions using Entropic Independence
arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-09-14, DOI: arXiv-2109.06442
Nima Anari, Michał Dereziński

We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime $k\ll n$. We show that, given estimates of the marginals $\mathbb{P}_{S\sim \mu}[i\in S]$, the task of sampling from $\mu$ can be reduced to sampling from distributions $\nu$ supported on size-$k$ subsets of a ground set of only $n^{1-\alpha}\cdot \operatorname{poly}(k)$ elements. Here, $1/\alpha\in [1, k]$ is the parameter of entropic independence for $\mu$. Further, the sparsified distributions $\nu$ are obtained by applying a sparse (mostly $0$) external field to $\mu$, an operation that often retains the algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference. For a wide range of distributions where $\alpha=\Omega(1)$, our result reduces the domain size, and as a corollary the cost per sample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha=1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that a constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
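To make the shape of the reduction concrete, below is a minimal, hypothetical Python sketch of the two ingredients named in the abstract: computing (or, in practice, estimating) the marginals $\mathbb{P}_{S\sim \mu}[i\in S]$ and applying a sparse external field, in the usual sense $(\lambda * \mu)(S) \propto \mu(S)\cdot\prod_{i\in S}\lambda_i$. The toy distribution, the top-marginal selection rule, and all function names are illustrative assumptions, not the paper's algorithm; in particular, the paper's random choice of a reduced ground set of size $n^{1-\alpha}\cdot\operatorname{poly}(k)$ and the guarantees that make samples from $\nu$ usable for sampling from $\mu$ are not reproduced here.

```python
import itertools
import math
import random


def make_mu(weights, k):
    """Toy stand-in for mu: mu(S) proportional to prod_{i in S} w_i over size-k subsets
    (a product measure conditioned on cardinality k)."""
    n = len(weights)
    unnorm = {S: math.prod(weights[i] for i in S)
              for S in itertools.combinations(range(n), k)}
    Z = sum(unnorm.values())
    return {S: p / Z for S, p in unnorm.items()}


def marginals(mu, n):
    """Exact marginals P_{S~mu}[i in S]; in the framework these are only estimates,
    paid for once and reused across many samples."""
    m = [0.0] * n
    for S, p in mu.items():
        for i in S:
            m[i] += p
    return m


def external_field(mu, lam):
    """(lam * mu)(S) proportional to mu(S) * prod_{i in S} lam_i.
    A mostly-zero lam kills every subset leaving its support, so the sparsified
    distribution nu effectively lives on a much smaller ground set."""
    unnorm = {S: p * math.prod(lam[i] for i in S) for S, p in mu.items()}
    Z = sum(unnorm.values())
    return {S: w / Z for S, w in unnorm.items() if w > 0}


def sample(dist, rng):
    subsets, probs = zip(*dist.items())
    return rng.choices(subsets, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 10, 3
    weights = [rng.uniform(0.1, 1.0) for _ in range(n)]
    mu = make_mu(weights, k)

    # One-time cost: marginal (estimates) of mu.
    m = marginals(mu, n)

    # Illustrative sparse external field: keep a few elements with the largest
    # marginals and zero out the rest.  This deterministic rule is a placeholder;
    # the paper instead draws a reduced ground set of size ~ n^{1-alpha} * poly(k)
    # and analyzes how samples from nu yield samples from mu.
    kept = set(sorted(range(n), key=lambda i: -m[i])[: 2 * k])
    lam = [1.0 if i in kept else 0.0 for i in range(n)]

    nu = external_field(mu, lam)
    print("reduced ground set:", sorted(kept))
    print("one sample from the sparsified nu:", sample(nu, rng))
```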

Updated: 2021-09-15