Wide neural networks of any depth evolve as linear models under gradient descent *
Journal of Statistical Mechanics: Theory and Experiment (IF 2.4), Pub Date: 2020-12-22, DOI: 10.1088/1742-5468/abc62b
Jaehoon Lee, Lechao Xiao, Samuel S Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington

A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
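For context, a minimal sketch of the linearized dynamics the abstract refers to; the notation below (the learning rate $\eta$, the empirical tangent kernel $\hat\Theta_0$, and the training set $\mathcal{X}$, $\mathcal{Y}$) is chosen here for illustration and paraphrases the standard neural tangent kernel setup rather than quoting the paper. The linear model is the first-order Taylor expansion of the network output around its parameters at initialization,
\[
f_t^{\mathrm{lin}}(x) \;=\; f_0(x) \;+\; \nabla_\theta f_0(x)\big|_{\theta=\theta_0}\,\bigl(\theta_t - \theta_0\bigr),
\]
and under gradient flow on a squared loss its predictions on the training inputs admit a closed form,
\[
f_t^{\mathrm{lin}}(\mathcal{X}) \;=\; \bigl(I - e^{-\eta \hat\Theta_0 t}\bigr)\,\mathcal{Y} \;+\; e^{-\eta \hat\Theta_0 t}\, f_0(\mathcal{X}),
\qquad
\hat\Theta_0 \;=\; \nabla_\theta f_0(\mathcal{X})\,\nabla_\theta f_0(\mathcal{X})^{\top},
\]
where $\hat\Theta_0$ is the empirical neural tangent kernel at initialization. In the infinite-width limit this kernel becomes deterministic and constant during training, which is what reduces learning to the linear (kernel) dynamics described above.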

Updated: 2020-12-22