Optimal Testing of Discrete Distributions with High Probability
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-09-14, DOI: arxiv-2009.06540
Ilias Diakonikolas and Themis Gouleakis and Daniel M. Kane and John Peebles and Eric Price

We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0< \epsilon, \delta <1$, we want to distinguish {\em with probability at least $1-\delta$} whether these distributions satisfy $\mathcal{P}$ or are $\epsilon$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $\delta = \Omega(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $\delta$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.
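To make the "black-box amplification" mentioned above concrete, here is a minimal Python sketch of the standard approach: run a constant-confidence tester (success probability at least 2/3) independently O(log(1/δ)) times and take a majority vote. The `tester` callable, the `samples_per_run` parameter, and the constant 18 are illustrative assumptions, not part of the paper.

```python
import math

def amplify(tester, delta, samples_per_run):
    """Boost a constant-confidence tester (success prob. >= 2/3) to
    confidence 1 - delta by majority vote over independent runs.

    `tester` is assumed to draw `samples_per_run` fresh samples internally
    and return True ("satisfies P") or False ("epsilon-far from P").
    Returns the majority answer and the total number of samples consumed.
    """
    # By a Hoeffding/Chernoff bound, ceil(18 * ln(1/delta)) independent runs
    # suffice for the majority of 2/3-correct answers to be correct with
    # probability at least 1 - delta.
    runs = max(1, math.ceil(18 * math.log(1.0 / delta)))
    votes = sum(1 for _ in range(runs) if tester())
    return votes * 2 > runs, runs * samples_per_run
```

Note the cost of this generic recipe: the total sample complexity of the base tester is multiplied by Θ(log(1/δ)). The paper's point is that for properties such as closeness and independence, a tester designed directly for the high-probability regime can achieve a strictly better (and in fact optimal) dependence on δ than this multiplicative blow-up.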

Updated: 2020-09-15