Geometric compression of invariant manifolds in neural networks
Journal of Statistical Mechanics: Theory and Experiment (IF 2.4). Pub Date: 2021-04-26. DOI: 10.1088/1742-5468/abf1f3
Jonas Paccolat, Leonardo Petrini, Mario Geiger, Kevin Tyloo, Matthieu Wyart

We study how neural networks compress uninformative input space in models where data lie in d dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp = d - d_\parallel$ uninformative directions. These are effectively compressed by a factor $\lambda \sim \sqrt{p}$, where p is the size of the training set. We quantify the benefit of such a compression on the test error ϵ. For large initialization of the weights (the lazy training regime), no compression occurs and for regular boundaries separating labels we find that $\epsilon \sim p^{-\beta}$, with $\beta_{\mathrm{Lazy}} = d/(3d - 2)$. Compression improves the learning curves so that $\beta_{\mathrm{Feature}} = (2d - 1)/(3d - 2)$ if $d_\parallel = 1$ and $\beta_{\mathrm{Feature}} = (d + d_\perp/2)/(3d - 2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel = 1$) as well as for a cylindrical boundary ($d_\parallel = 2$). Next, we show that compression shapes the evolution of the neural tangent kernel (NTK) in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer fully connected network trained on the stripe model and for a 16-layer convolutional neural network trained on the Modified National Institute of Standards and Technology database (MNIST), for which we also find $\beta_{\mathrm{Feature}} > \beta_{\mathrm{Lazy}}$. The great similarities found in these two cases support the idea that compression is central to the training of MNIST, and put forward kernel principal component analysis on the evolving NTK as a useful diagnostic of compression in deep networks.
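A minimal sketch of the setup described above, not the authors' code: it builds a stripe-model dataset whose labels depend only on the first input coordinate ($d_\parallel = 1$), trains a one-hidden-layer ReLU network from a small ("feature learning") initialization with full-batch gradient descent, and then estimates the compression factor λ as the sensitivity of the first-layer weights to the informative direction relative to a typical uninformative one. All hyperparameters (d = 10, p = 2000, hidden width 256, a logistic-type loss) are illustrative assumptions.

# Sketch only: stripe-model data, small-initialization training, and a
# crude estimate of the compression factor lambda. Hyperparameters are
# illustrative, not taken from the paper.
import torch

torch.manual_seed(0)
d, p, h = 10, 2000, 256                     # input dim, training set size, hidden width

x = torch.randn(p, d)                       # Gaussian inputs in d dimensions
y = torch.sign(torch.sin(3.0 * x[:, 0]))    # stripe labels: vary along x_1 only (d_par = 1)

w1 = torch.nn.Parameter(1e-3 * torch.randn(h, d))  # infinitesimal-scale initialization
b1 = torch.nn.Parameter(torch.zeros(h))
a = torch.nn.Parameter(1e-3 * torch.randn(h))

opt = torch.optim.SGD([w1, b1, a], lr=0.5)
for step in range(3000):
    opt.zero_grad()
    f = torch.relu(x @ w1.T + b1) @ a                   # network output
    loss = torch.nn.functional.softplus(-y * f).mean()  # logistic-type loss
    loss.backward()
    opt.step()

# Compression estimate: norm of the first-layer weights along the informative
# direction versus the average norm along the d - 1 uninformative ones.
w = w1.detach()
lam = w[:, 0].norm() / (w[:, 1:].norm() / (d - 1) ** 0.5)
print(f"estimated compression factor lambda ≈ {lam:.1f}")

The kernel-PCA diagnostic advocated in the abstract would, in the same spirit, compare the projection of the labels onto the top eigenvectors of the empirical NTK Gram matrix at initialization and at the end of training.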



Updated: 2021-04-26