Anomalous diffusion dynamics of learning in deep neural networks
arXiv - CS - Machine Learning. Pub Date: 2020-09-22, DOI: arxiv-2009.10588
Guozhang Chen, Cheng Kevin Qu, Pulin Gong

Learning in deep neural networks (DNNs) is implemented by minimizing a highly non-convex loss function, typically via a stochastic gradient descent (SGD) method. This learning process can effectively find good wide minima without being trapped in poor local ones. We present a novel account of how such effective deep learning emerges through the interactions of the SGD and the geometrical structure of the loss landscape. Rather than being a normal diffusion process (i.e., Brownian motion) as often assumed, we find that the SGD exhibits rich, complex dynamics when navigating through the loss landscape; initially, the SGD exhibits anomalous superdiffusion, which attenuates gradually and changes to subdiffusion at long times when the solution is reached. Such learning dynamics occur ubiquitously across different DNNs such as ResNet and VGG-like networks, and are insensitive to batch size and learning rate. The anomalous superdiffusion process during the initial learning phase indicates that the motion of SGD along the loss landscape possesses intermittent, big jumps; this non-equilibrium property enables the SGD to escape from sharp local minima. By adapting the methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning dynamics are due to the interactions of the SGD and the fractal-like structure of the loss landscape. We further develop a simple model to demonstrate the mechanistic role of the fractal loss landscape in enabling the SGD to effectively find global minima. Our results thus reveal the effectiveness of deep learning from a novel perspective and have implications for designing efficient deep neural networks.
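The super/subdiffusion distinction rests on how the mean squared displacement (MSD) of the weights scales with time: MSD(t) = <|theta(t0 + t) - theta(t0)|^2> ~ t^alpha, with alpha > 1 indicating superdiffusion, alpha = 1 normal (Brownian) diffusion, and alpha < 1 subdiffusion. As a minimal illustrative sketch, not the authors' code, one could estimate alpha from a logged SGD weight trajectory as follows; the function name and the synthetic random-walk check are our own assumptions:

```python
import numpy as np

def msd_exponent(trajectory, lag_times):
    """Estimate the anomalous diffusion exponent alpha from a parameter
    trajectory, using MSD(t) ~ t^alpha.

    trajectory: array of shape (num_steps, num_params), e.g. the flattened
    network weights saved once per SGD step.
    """
    msd = []
    for lag in lag_times:
        # Displacements of the weight vector over this time lag,
        # averaged over all starting points along the trajectory.
        disp = trajectory[lag:] - trajectory[:-lag]
        msd.append(np.mean(np.sum(disp ** 2, axis=1)))
    # alpha is the slope of log MSD versus log lag.
    alpha, _ = np.polyfit(np.log(lag_times), np.log(msd), 1)
    return alpha

# Sanity check on a synthetic random walk (normal diffusion): alpha ~ 1.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal((5000, 10)), axis=0)
print(f"random-walk exponent: {msd_exponent(walk, np.arange(1, 200)):.2f}")
```

On a real training run, one would log the flattened weights at each SGD step and expect alpha > 1 during the early superdiffusive phase, drifting below 1 as the trajectory settles near a solution.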

Updated: 2020-09-23