Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases
arXiv - CS - Numerical Analysis. Pub Date: 2021-02-23, DOI: arxiv-2102.11840
Arnulf Jentzen, Timo Kröger

In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why randomly initialized gradient descent optimization algorithms, such as the well-known batch gradient descent, are able to achieve zero training loss in many situations even though the objective function is non-convex and non-smooth. One of the most promising approaches to solving this problem in the field of supervised learning is the analysis of gradient descent optimization in the so-called overparameterized regime. In this article we provide a further contribution to this area of research by considering overparameterized fully-connected rectified artificial neural networks with biases. Specifically, we show that for a fixed number of training data the mean squared error using batch gradient descent optimization applied to such a randomly initialized artificial neural network converges to zero at a linear convergence rate as long as the width of the artificial neural network is large enough, the learning rate is small enough, and the training input data are pairwise linearly independent.
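As a rough illustration of the setting described above (not the paper's construction or proof), the following minimal NumPy sketch runs batch gradient descent on the mean squared error of a wide, randomly initialized one-hidden-layer ReLU network with biases on a small fixed training set; the width, learning rate, and dataset sizes are arbitrary choices for demonstration and may need tuning.

```python
# Minimal sketch: batch gradient descent on the MSE of a wide ReLU network
# with biases, randomly initialized, trained on a fixed small dataset.
import numpy as np

rng = np.random.default_rng(0)

# Fixed training data: n inputs in R^d (generic Gaussian inputs are
# pairwise linearly independent with probability one).
n, d, width = 8, 3, 2000
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))

# Random initialization of a fully-connected ReLU network with biases.
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
b1 = np.zeros((1, width))
W2 = rng.normal(size=(width, 1)) / np.sqrt(width)
b2 = np.zeros((1, 1))

lr = 1e-3  # small learning rate

for step in range(2001):
    h = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden layer with biases
    pred = h @ W2 + b2
    err = pred - y
    loss = float(np.mean(err ** 2))    # mean squared training error

    # Batch gradients of the MSE with respect to all weights and biases.
    g_pred = 2.0 * err / n
    gW2 = h.T @ g_pred
    gb2 = g_pred.sum(axis=0, keepdims=True)
    g_h = g_pred @ W2.T
    g_h[h <= 0.0] = 0.0                # ReLU derivative
    gW1 = X.T @ g_h
    gb1 = g_h.sum(axis=0, keepdims=True)

    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    if step % 500 == 0:
        print(f"step {step:5d}  training MSE {loss:.3e}")
```

In such an overparameterized run, the printed training MSE typically decays roughly geometrically toward zero, which is the kind of linear convergence behavior the paper establishes rigorously under its width, learning-rate, and data assumptions.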

Updated: 2021-02-24