Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases
arXiv - CS - Numerical Analysis. Pub Date: 2021-02-23. DOI: arxiv-2102.11840. Arnulf Jentzen, Timo Kröger
In recent years, artificial neural networks have developed into a powerful
tool for dealing with a multitude of problems for which classical solution
approaches reach their limits. However, it is still unclear why randomly
initialized gradient descent optimization algorithms, such as the well-known
batch gradient descent, are able to achieve zero training loss in many
situations even though the objective function is non-convex and non-smooth. One
of the most promising approaches to solving this problem in the field of
supervised learning is the analysis of gradient descent optimization in the
so-called overparameterized regime. In this article we provide a further
contribution to this area of research by considering overparameterized
fully-connected rectified artificial neural networks with biases. Specifically,
we show that for a fixed number of training examples the mean squared error under
batch gradient descent optimization applied to such a randomly initialized
artificial neural network converges to zero at a linear convergence rate as
long as the width of the artificial neural network is large enough, the
learning rate is small enough, and the training input data are pairwise
linearly independent.
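The setting described in the abstract can be sketched in code. The following is a minimal illustrative example, not the paper's exact construction: batch gradient descent on a wide one-hidden-layer ReLU network with biases, minimizing the mean squared error on a small random dataset. The width `m`, the learning rate, the fixed random output weights, and the `1/sqrt(m)` output scaling are all assumptions chosen for illustration; Gaussian inputs are pairwise linearly independent with probability one, matching the abstract's condition.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's exact construction):
# batch gradient descent on a wide one-hidden-layer ReLU network with
# biases, minimizing the mean squared error on a small dataset.

rng = np.random.default_rng(0)

n, d, m = 8, 3, 2000                  # training examples, input dim, width
X = rng.normal(size=(n, d))           # Gaussian inputs are pairwise
y = rng.normal(size=n)                # linearly independent a.s.

W = rng.normal(size=(m, d))           # hidden-layer weights
b = rng.normal(size=m)                # hidden-layer biases
a = rng.choice([-1.0, 1.0], size=m)   # output weights, held fixed here

def predict(X):
    pre = X @ W.T + b                 # (n, m) pre-activations
    return pre, np.maximum(pre, 0.0) @ a / np.sqrt(m)

lr, steps = 0.05, 3000
_, pred = predict(X)
loss0 = np.mean((pred - y) ** 2)      # initial training loss

for _ in range(steps):
    pre, pred = predict(X)
    resid = pred - y                  # (n,) residuals
    # Gradient of the MSE with respect to W and b (a is held fixed);
    # (pre > 0) is the ReLU subgradient at the pre-activations.
    g = (2.0 / n) * resid[:, None] * (pre > 0) * a / np.sqrt(m)
    W -= lr * (g.T @ X)
    b -= lr * g.sum(axis=0)

_, pred = predict(X)
loss = np.mean((pred - y) ** 2)       # should be well below loss0
print(loss0, loss)
```

With the width large and the learning rate small, the training loss decreases steadily, consistent with the linear convergence behavior the abstract describes; the precise rate depends on the data and the initialization.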
Updated: 2021-02-24