Asymptotic learning curves of kernel methods: empirical data versus teacher–student paradigm
Journal of Statistical Mechanics: Theory and Experiment (IF 2.4). Pub Date: 2020-12-22. DOI: 10.1088/1742-5468/abc61d
Stefano Spigler, Mario Geiger, Matthieu Wyart

How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-\beta}$ where $n$ is the number of training examples and $\beta$ an exponent that depends on both data and algorithm. In this work we measure $\beta$ when applying kernel methods to real datasets. For MNIST we find $\beta\approx 0.4$ and for CIFAR10 $\beta\approx 0.1$. Remarkably, $\beta$ is the same for regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we introduce the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption --- namely that the data are sampled from a regular lattice --- we derive analytically $\beta$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $\beta$ depends only on the training data and their dimension. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, our results quantify how smooth Gaussian data should be to avoid the curse of dimensionality, and indicate that for kernel learning the relevant dimension of the data should be defined in terms of how the distance between nearest data points depends on $n$. With this definition one obtains reasonable effective smoothness estimates for MNIST and CIFAR10.
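A minimal sketch of the Teacher-Student experiment described above, not the authors' code: a Teacher draws a target function from a Gaussian random field on the hypersphere, a Student fits it by kernel regression, and the exponent $\beta$ is read off a log-log fit of the test error against $n$. The choice of a Laplace kernel for both Teacher and Student, the jitter, and all sample sizes are illustrative assumptions.

import numpy as np

def sample_sphere(n, d, rng):
    """n points drawn uniformly on the unit sphere S^{d-1} in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def laplace_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y|| / sigma), a translation-invariant kernel."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.sqrt(np.maximum(d2, 0.0)) / sigma)

def teacher_student_error(n_train, n_test, d, rng, jitter=1e-8):
    """Test MSE of a Student (kernel regression, Laplace kernel) learning one
    draw of a Teacher Gaussian random field whose covariance is also a
    Laplace kernel (an illustrative choice of Teacher)."""
    X = sample_sphere(n_train + n_test, d, rng)
    # Teacher: draw z ~ N(0, K_T) jointly on the train and test points.
    K_T = laplace_kernel(X, X) + jitter * np.eye(len(X))
    z = np.linalg.cholesky(K_T) @ rng.standard_normal(len(X))
    z_tr, z_te = z[:n_train], z[n_train:]
    X_tr, X_te = X[:n_train], X[n_train:]
    # Student: kernel regression (ridgeless up to jitter) on the training set.
    K_S = laplace_kernel(X_tr, X_tr) + jitter * np.eye(n_train)
    alpha = np.linalg.solve(K_S, z_tr)
    pred = laplace_kernel(X_te, X_tr) @ alpha
    return np.mean((pred - z_te) ** 2)

rng = np.random.default_rng(0)
ns = np.array([64, 128, 256, 512, 1024])
errs = np.array([np.mean([teacher_student_error(n, 500, d=3, rng=rng)
                          for _ in range(5)]) for n in ns])
# beta is minus the slope of log(error) versus log(n).
beta = -np.polyfit(np.log(ns), np.log(errs), 1)[0]
print(f"estimated beta ~ {beta:.2f}")

Averaging over several Teacher draws per $n$, as above, reduces the noise in the fitted slope; matching the Teacher and Student kernels is only one of the cases the paper's framework covers.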
