Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes
Journal of the ACM (IF 2.3), Pub Date: 2020-10-06, DOI: 10.1145/3417994. Hassan Ashtiani, Shai Ben-David, Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that admits such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that Θ̃(kd²/ε²) samples are necessary and sufficient for learning a mixture of k Gaussians in Rᵈ, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε²) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in Rᵈ admits a small compression scheme.
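The headline bound can be made concrete with a small sketch. The function below evaluates the Θ̃(kd²/ε²) sample complexity as a plug-in formula; the constant `c` and the single logarithmic factor standing in for the polylog terms hidden by the tilde are hypothetical placeholders, not quantities taken from the paper.

```python
import math

def gmm_sample_bound(k, d, eps, c=1.0):
    """Illustrative evaluation of the Theta~(k * d^2 / eps^2) bound for
    learning a mixture of k Gaussians in R^d to total variation error eps.

    The constant c and the log factor are placeholders for the
    polylogarithmic terms hidden by the tilde notation.
    """
    n = c * k * d * d / eps ** 2
    # The tilde hides polylog factors; include one log term as a stand-in.
    return math.ceil(n * math.log(max(2.0, k * d / eps)))

# A mixture of 3 Gaussians in R^10, target TV error 0.1:
print(gmm_sample_bound(3, 10, 0.1))
```

As expected from the formula, halving ε roughly quadruples the sample requirement, and the dependence on the dimension d is quadratic (versus linear, Õ(kd/ε²), in the axis-aligned case).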