Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes
Journal of the ACM (IF 2.3), Pub Date: 2020-10-06, DOI: 10.1145/3417994. Hassan Ashtiani, Shai Ben-David, Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that admits such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that Θ̃(kd²/ε²) samples are necessary and sufficient for learning a mixture of k Gaussians in Rᵈ, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε²) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in Rᵈ admits a small compression scheme.
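The headline bound can be made concrete with a small sketch. The function below evaluates the Θ̃(kd²/ε²) sample complexity as a plug-in formula; the constant `c` and the single logarithmic factor standing in for the polylog terms hidden by the tilde are hypothetical placeholders, not quantities taken from the paper.

```python
import math

def gmm_sample_bound(k, d, eps, c=1.0):
    """Illustrative evaluation of the Theta~(k * d^2 / eps^2) bound for
    learning a mixture of k Gaussians in R^d to total variation error eps.

    The constant c and the log factor are placeholders for the
    polylogarithmic terms hidden by the tilde notation.
    """
    n = c * k * d * d / eps ** 2
    # The tilde hides polylog factors; include one log term as a stand-in.
    return math.ceil(n * math.log(max(2.0, k * d / eps)))

# A mixture of 3 Gaussians in R^10, target TV error 0.1:
print(gmm_sample_bound(3, 10, 0.1))
```

As expected from the formula, halving ε roughly quadruples the sample requirement, and the dependence on the dimension d is quadratic (versus linear, Õ(kd/ε²), in the axis-aligned case).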