Trainability and Accuracy of Artificial Neural Networks: An Interacting Particle System Approach
Communications on Pure and Applied Mathematics (IF 3), Pub Date: 2022-07-21, DOI: 10.1002/cpa.22074
Grant Rotskoff, Eric Vanden‐Eijnden

Neural networks, a central tool in machine learning, have demonstrated remarkable, high-fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high-dimensional functions, but rigorous results about the approximation error of neural networks after training are few. Here we establish conditions for global convergence of stochastic gradient descent (SGD), the standard optimization algorithm used in machine learning applications, and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number n of units is large, the empirical distribution of the particles descends on a convex landscape towards the global minimum at a rate independent of n, with a resulting approximation error that universally scales as O(n⁻¹). These properties are established in the form of a law of large numbers and a central limit theorem for the empirical distribution. Our analysis also quantifies the scale and nature of the noise introduced by SGD and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural networks to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in dimension as high as d = 25. © 2022 Courant Institute of Mathematics and Wiley Periodicals LLC.
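A minimal sketch of the particle reinterpretation described in the abstract (the notation below is illustrative, not quoted from the paper): a single-hidden-layer network with n units can be written as an empirical average over "particles" θ_i = (c_i, z_i),

\[
f_n(x) = \frac{1}{n}\sum_{i=1}^{n} c_i\,\varphi(x, z_i),
\qquad
\mu_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{\theta_i},
\]

so that a quadratic loss such as \( \mathcal{L}(f_n) = \tfrac12\,\mathbb{E}_x\,|f(x) - f_n(x)|^2 \) depends on the parameters only through the empirical distribution μ_n. Gradient descent on the θ_i is then the dynamics of n particles interacting through a potential set by the loss, and as n → ∞ the empirical distribution follows a gradient flow on the convex functional μ ↦ L(f_μ); this convexity over distributions is what underlies the law-of-large-numbers convergence and the O(n⁻¹) central-limit correction stated above.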

Updated: 2022-07-22