How isotropic kernels perform on simple invariants, Machine Learning: Science and Technology

How isotropic kernels perform on simple invariants
Machine Learning: Science and Technology (IF 6.3), Pub Date: 2021-03-02, DOI: 10.1088/2632-2153/abd485
Jonas Paccolat, Stefano Spigler, Matthieu Wyart

We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task in which the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $\epsilon$, which follows $\epsilon\sim p^{-\beta}$, where $p$ is the size of the training set. We find that $\beta\sim 1/d$ independently of $d_\parallel$, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the stripe model, where the data label depends on a single coordinate, $y(\underline x) = y(x_1)$, corresponding to parallel decision boundaries separating labels of different signs, and we assume that there is no margin at these interfaces. We argue and confirm numerically that, for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi\in(0,2)$ is the exponent characterizing the singularity of the kernel at the origin. This estimate improves on classical bounds obtainable from Rademacher complexity. In this setting there is no curse of dimensionality, since $\beta\rightarrow 1/3$ as $d\rightarrow\infty$. (iii) We confirm these findings for the spherical model, for which $y(\underline x) = y(||\underline x||)$. (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor $\lambda$ (an operation believed to take place in deep networks), the test error is reduced by a factor $\lambda^{-\frac{2(d-1)}{3d-3+\xi}}$.
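To make the scaling laws quoted in the abstract concrete, the following Python sketch evaluates the stripe-model exponent $\beta = \frac{d-1+\xi}{3d-3+\xi}$ and the compression factor $\lambda^{-\frac{2(d-1)}{3d-3+\xi}}$ for a few input dimensions. It is only an illustration of the formulas, not the authors' code: the function names, the choice $\xi = 1$, and the sample values of $d$ and $\lambda$ are assumptions made here. The helper `samples_to_halve_error` additionally assumes a pure power law $\epsilon = p^{-\beta}$ with unit prefactor, which the abstract does not state.

```python
# Illustrative evaluation of the exponents quoted in the abstract.
# All function names and numeric choices below are hypothetical.


def stripe_beta(d: int, xi: float) -> float:
    """Predicted exponent for the stripe model at large bandwidth:
    beta = (d - 1 + xi) / (3d - 3 + xi), with xi in (0, 2)."""
    return (d - 1 + xi) / (3 * d - 3 + xi)


def compression_factor(lam: float, d: int, xi: float) -> float:
    """Predicted multiplicative change of the test error when the data are
    compressed along their invariants by a factor lam:
    lam ** (-2(d - 1) / (3d - 3 + xi))."""
    return lam ** (-2 * (d - 1) / (3 * d - 3 + xi))


def samples_to_halve_error(beta: float) -> float:
    """Factor by which p must grow to halve the error.
    Assumption: pure power law epsilon = p ** (-beta), unit prefactor."""
    return 2.0 ** (1.0 / beta)


xi = 1.0  # assumed value inside (0, 2), chosen only for illustration
for d in (2, 10, 100, 1000):
    beta = stripe_beta(d, xi)
    print(
        f"d={d:5d}  beta={beta:.4f}  "
        f"p-growth to halve error={samples_to_halve_error(beta):.2f}  "
        f"error factor at lambda=10: {compression_factor(10.0, d, xi):.4f}"
    )
```

Under these assumptions the exponent stays bounded away from zero as $d$ grows (it approaches 1/3), whereas an exponent scaling as $1/d$, as in the regression setting (i), would make the number of samples needed to halve the error grow exponentially with $d$.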

Updated: 2021-03-02
