


The Optimal Approximation Factor in Density Estimation
arXiv - CS - Computational Complexity Pub Date : 2019-02-10 , DOI: arxiv-1902.05876
Olivier Bousquet and Daniel Kane and Shay Moran

Consider the following problem: given two arbitrary densities $q_1,q_2$ and sample access to an unknown target density $p$, find which of the $q_i$'s is closer to $p$ in total variation. A remarkable result due to Yatracos shows that this problem is tractable in the following sense: there exists an algorithm that uses $O(\epsilon^{-2})$ samples from $p$ and outputs $q_i$ such that, with high probability, $TV(q_i,p) \leq 3\cdot\mathsf{opt} + \epsilon$, where $\mathsf{opt}= \min\{TV(q_1,p),TV(q_2,p)\}$. Moreover, this result extends to any finite class of densities $\mathcal{Q}$: there exists an algorithm that outputs the best density in $\mathcal{Q}$ up to a multiplicative approximation factor of 3. We complement and extend this result by showing that: (i) the factor 3 cannot be improved if one restricts the algorithm to output a density from $\mathcal{Q}$, and (ii) if one allows the algorithm to output arbitrary densities (e.g. a mixture of densities from $\mathcal{Q}$), then the approximation factor can be reduced to 2, which is optimal. In particular, this demonstrates an advantage of improper learning over proper learning in this setup. We develop two approaches to achieve the optimal approximation factor of 2: an adaptive one and a static one. Both approaches are based on a geometric point of view of the problem and rely on estimating surrogate metrics to the total variation. Our sample complexity bounds exploit techniques from Adaptive Data Analysis.
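To make the two-candidate problem concrete, here is a minimal Python sketch of the classical Scheffé test, the standard selection rule behind the factor-3 guarantee attributed to Yatracos above. It is not the paper's factor-2 method. The sketch assumes densities on a small finite domain, represented as probability lists; the function names `total_variation` and `scheffe_select` are illustrative and do not come from the paper.

```python
import random


def total_variation(p, q):
    """TV distance between two distributions given as equal-length probability lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def scheffe_select(q1, q2, samples):
    """Return 0 or 1: the index of the candidate chosen by the Scheffe test.

    Assumption: the domain is {0, ..., len(q1) - 1} and `samples` are draws from p.
    """
    # Scheffe set A = {x : q1(x) > q2(x)}; it attains TV(q1, q2) = q1(A) - q2(A).
    A = {x for x in range(len(q1)) if q1[x] > q2[x]}
    q1_mass = sum(q1[x] for x in A)
    q2_mass = sum(q2[x] for x in A)
    # Empirical mass that the samples from p place on A.
    p_hat_mass = sum(1 for x in samples if x in A) / len(samples)
    # Choose the candidate whose mass on A is closer to the empirical mass.
    return 0 if abs(q1_mass - p_hat_mass) <= abs(q2_mass - p_hat_mass) else 1


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]       # unknown target (used here only to draw samples)
    q1 = [0.45, 0.35, 0.2]    # candidate close to p
    q2 = [0.1, 0.1, 0.8]      # candidate far from p
    samples = rng.choices(range(len(p)), weights=p, k=2000)
    chosen = scheffe_select(q1, q2, samples)
    print(chosen, total_variation([q1, q2][chosen], p))
```

The rule only ever estimates the mass of a single set, which is why $O(\epsilon^{-2})$ samples suffice for two candidates. Because it must return one of the $q_i$, it is a proper learner and is therefore subject to the factor-3 lower bound in (i).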


Updated: 2020-04-06
