Phases of learning dynamics in artificial neural networks in the absence or presence of mislabeled data
Machine Learning: Science and Technology (IF 6.3), Pub Date: 2021-07-13, DOI: 10.1088/2632-2153/abf5b9
Yu Feng 1,2, Yuhai Tu 1

Despite the tremendous success of deep neural networks in machine learning, the underlying reason for their superior learning capability remains unclear. Here, we present a framework based on statistical physics to study the dynamics of stochastic gradient descent (SGD), which drives learning in neural networks. Using the minibatch gradient ensemble, we construct order parameters to characterize the dynamics of weight updates in SGD. In the case without mislabeled data, we find that the SGD learning dynamics transitions from a fast learning phase to a slow exploration phase, which is associated with large changes in the order parameters that characterize the alignment of SGD gradients and their mean amplitude. In a more complex case, with randomly mislabeled samples, the SGD learning dynamics falls into four distinct phases. First, the system finds solutions for the correctly labeled samples in phase I; it then wanders around these solutions in phase II until it finds a direction that enables it to learn the mislabeled samples during phase III, after which it finds solutions that satisfy all training samples during phase IV. Correspondingly, the test error decreases during phase I and remains low during phase II; however, it increases during phase III and reaches a high plateau during phase IV. The transitions between different phases can be understood by examining changes in the order parameters that characterize the alignment of the mean gradients for the two datasets (correctly and incorrectly labeled samples) and their (relative) strengths during learning. We find that individual sample losses for the two datasets are separated the most during phase II, leading to a data cleansing process that eliminates mislabeled samples and improves generalization. Overall, we believe that an approach based on statistical physics and stochastic dynamic systems theory provides a promising framework for describing and understanding learning dynamics in neural networks, which may also lead to more efficient learning algorithms.
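The abstract does not spell out how these order parameters are computed, so the sketch below is a rough, non-authoritative illustration of how one might estimate them at a fixed point in training. It assumes a toy two-layer classifier on synthetic data with a randomly flipped label fraction; the model, the data, and helper names such as minibatch_gradient_ensemble are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: synthetic classification data with 20% of labels randomly flipped
n, d, n_cls, flip_frac = 512, 20, 4, 0.2
X = torch.randn(n, d)
y_true = torch.randint(0, n_cls, (n,))
flip = torch.rand(n) < flip_frac                      # mask of mislabeled samples
y = torch.where(flip, torch.randint(0, n_cls, (n,)), y_true)

model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_cls))

def flat_grad(loss):
    """Gradient of the loss w.r.t. all parameters, flattened into one vector."""
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

def minibatch_gradient_ensemble(idx, batch_size=32, n_batches=16):
    """Gradients of n_batches random minibatches drawn from the samples in idx."""
    gs = []
    for _ in range(n_batches):
        b = idx[torch.randperm(len(idx))[:batch_size]]
        loss = F.cross_entropy(model(X[b]), y[b])
        gs.append(flat_grad(loss))
    return torch.stack(gs)                            # shape: (n_batches, n_params)

# Order parameters for the full training set: mean-gradient amplitude and the
# average alignment of individual minibatch gradients with their mean
G = minibatch_gradient_ensemble(torch.arange(n))
g_mean = G.mean(dim=0)
amplitude = g_mean.norm()
alignment = F.cosine_similarity(G, g_mean.expand_as(G), dim=1).mean()

# Order parameters comparing the clean and mislabeled subsets: alignment of
# their mean gradients and the relative strength of the noisy one
g_clean = minibatch_gradient_ensemble(torch.where(~flip)[0]).mean(dim=0)
g_noisy = minibatch_gradient_ensemble(torch.where(flip)[0]).mean(dim=0)
cross_alignment = F.cosine_similarity(g_clean, g_noisy, dim=0)
relative_strength = g_noisy.norm() / g_clean.norm()

# Per-sample losses: the abstract reports these separate most between the two
# subsets during phase II, enabling a loss-threshold data cleansing rule
with torch.no_grad():
    per_sample = F.cross_entropy(model(X), y, reduction="none")

print(f"alignment={alignment.item():.3f}  amplitude={amplitude.item():.3f}")
print(f"clean/noisy alignment={cross_alignment.item():.3f}  "
      f"relative strength={relative_strength.item():.3f}")
print(f"mean loss clean={per_sample[~flip].mean().item():.3f}  "
      f"noisy={per_sample[flip].mean().item():.3f}")
```

Tracked across training epochs, drops in the gradient alignment and mean-gradient amplitude would mark the transition from the fast learning phase to the slow exploration phase, while the clean/noisy cross-alignment, relative strength, and per-sample loss gap are the kinds of quantities the abstract associates with phases I through IV and the data cleansing step.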

Updated: 2021-07-13