Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-01-16. DOI: arxiv-2001.05992. Wei Hu, Lechao Xiao, Jeffrey Pennington
The selection of initial parameter values for gradient-based optimization of
deep neural networks is one of the most impactful hyperparameter choices in
deep learning systems, affecting both convergence times and model performance.
Yet despite significant empirical and theoretical analysis, relatively little
has been proved about the concrete effects of different initialization schemes.
In this work, we analyze the effect of initialization in deep linear networks,
and provide for the first time a rigorous proof that drawing the initial
weights from the orthogonal group speeds up convergence relative to the
standard Gaussian initialization with iid weights. We show that for deep
networks, the width needed for efficient convergence to a global minimum with
orthogonal initializations is independent of the depth, whereas the width
needed for efficient convergence with Gaussian initializations scales linearly
in the depth. Our results demonstrate how the benefits of a good initialization
can persist throughout learning, suggesting an explanation for the recent
empirical successes found by initializing very deep non-linear networks
according to the principle of dynamical isometry.
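The contrast the abstract draws can be seen numerically. Below is a minimal sketch (not the paper's construction) comparing the end-to-end linear map of a deep linear network under the two initializations: orthogonal layers drawn via QR decomposition versus iid Gaussian layers with standard 1/sqrt(n) fan-in scaling. All function names and the width/depth values are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonal_init(n, depth, rng):
    """Sample `depth` orthogonal n x n matrices via QR of Gaussian matrices."""
    layers = []
    for _ in range(depth):
        a = rng.standard_normal((n, n))
        q, r = np.linalg.qr(a)
        # Sign correction makes q Haar-distributed over the orthogonal group,
        # not merely orthogonal.
        q = q * np.sign(np.diag(r))
        layers.append(q)
    return layers

def gaussian_init(n, depth, rng):
    """iid Gaussian entries with variance 1/n (standard fan-in scaling)."""
    return [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(depth)]

def end_to_end_singular_values(layers):
    """Singular values of the product W_L ... W_1 of a deep linear network."""
    prod = np.linalg.multi_dot(layers) if len(layers) > 1 else layers[0]
    return np.linalg.svd(prod, compute_uv=False)

n, depth = 64, 20  # illustrative width and depth
s_orth = end_to_end_singular_values(orthogonal_init(n, depth, rng))
s_gauss = end_to_end_singular_values(gaussian_init(n, depth, rng))

# A product of orthogonal matrices is orthogonal: every singular value is
# exactly 1 (dynamical isometry), so the condition number stays 1 at any depth.
# The Gaussian product develops an increasingly ill-conditioned spectrum as
# depth grows, which is what forces the width to scale with depth.
print("orthogonal condition number:", s_orth.max() / s_orth.min())
print("gaussian condition number:  ", s_gauss.max() / s_gauss.min())
```

Running this at larger `depth` makes the gap more dramatic: the orthogonal condition number remains 1 while the Gaussian one blows up, mirroring the depth-independent versus depth-linear width requirements proved in the paper.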
Updated: 2020-01-17