Landscape and training regimes in deep learning
Physics Reports (IF 23.9), Pub Date: 2021-04-16, DOI: 10.1016/j.physrep.2021.04.001
Mario Geiger, Leonardo Petrini, Matthieu Wyart

Deep learning algorithms are responsible for a technological revolution in a variety of tasks, including image recognition and Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension, a feat generically impossible due to the geometry of high-dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck in bad minima; and (ii) the predictive power of deep learning grows with the number of fitting parameters, even in a regime where the data are perfectly fitted. In this manuscript, we review recent results elucidating (i) and (ii) and the perspective they offer on the (still unexplained) curse-of-dimensionality paradox. We base our theoretical discussion on the (h, α) plane, where h controls the number of parameters and α the scale of the output of the network at initialization, and we provide new systematic measures of performance in that plane for two common image-classification datasets. We argue that the different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over: at large α it corresponds to a kernel method, whereas at small α features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them, and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments, and the development of numerical observables to test these results quantitatively. Practical implications are also discussed, including the benefit of averaging nets with distinct initial weights and the choice of parameters (h, α) that optimizes performance.
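To make the (h, α) parametrization concrete, below is a minimal PyTorch sketch; the class ScaledNet, the helper ensemble_predict, and the one-hidden-layer architecture are illustrative choices, not the paper's code. It rescales the network output by α after subtracting the function at initialization, a standard lazy-training device consistent with α setting the output scale at initialization; the review's exact normalization may differ.

```python
import copy

import torch
import torch.nn as nn


class ScaledNet(nn.Module):
    """One-hidden-layer net of width h whose output is rescaled by alpha.

    Hypothetical illustration of the (h, alpha) plane: the output is
        f(x) = alpha * (g(w, x) - g(w0, x)),
    so f is identically zero at initialization and alpha sets the scale
    of the network output, as in standard lazy-training setups.
    """

    def __init__(self, d_in: int, h: int, alpha: float):
        super().__init__()
        self.alpha = alpha
        self.g = nn.Sequential(nn.Linear(d_in, h), nn.ReLU(), nn.Linear(h, 1))
        self.g0 = copy.deepcopy(self.g)  # frozen copy of the weights at init
        for p in self.g0.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * (self.g(x) - self.g0(x))


def ensemble_predict(models, x):
    """Average the outputs of independently initialized nets."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)


# Large alpha: weights barely move during training (kernel / lazy regime).
# Small alpha: weights travel far and features can be learnt.
models = [ScaledNet(d_in=10, h=128, alpha=100.0) for _ in range(5)]
x = torch.randn(4, 10)
y_avg = ensemble_predict(models, x)  # nets differ only in their random init
```

In this sketch, training with a large α leaves the weights close to their initial values, so the dynamics is well described by a kernel, while a small α lets the weights move far and features emerge; the last lines illustrate averaging the outputs of nets with distinct initial weights, the ensembling mentioned among the practical implications.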




Updated: 2021-06-11