If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks
Pattern Recognition Letters (IF 3.9), Pub Date: 2020-06-28, DOI: 10.1016/j.patrec.2020.06.025
Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon

Recent work in signal propagation theory has shown that dropout limits the depth to which information can propagate through a neural network. In this paper, we investigate the effect of initialisation on training speed and generalisation for ReLU networks within this depth limit. We ask the following research question: given that critical initialisation is crucial for training at large depth, if dropout limits the depth at which networks are trainable, does initialising critically still matter? We conduct a large-scale controlled experiment, and perform a statistical analysis of over 12 000 trained networks. We find that (1) trainable networks show no statistically significant difference in performance over a wide range of non-critical initialisations; (2) for initialisations that show a statistically significant difference, the net effect on performance is small; (3) only extreme initialisations (very small or very large) perform worse than criticality. These findings also apply to standard ReLU networks of moderate depth as a special case of zero dropout. Our results therefore suggest that, in the shallow-to-moderate depth setting, critical initialisation provides zero performance gains when compared to off-critical initialisations, and that searching for off-critical initialisations that might improve training speed or generalisation is likely to be a fruitless endeavour.
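For readers unfamiliar with the term, the following is a minimal sketch (not code from the paper) of what "critical initialisation" means for a ReLU layer with inverted dropout. It assumes the criticality condition σ_w² = 2p (keep probability p) from the related signal-propagation literature on noisy ReLU networks, which reduces to standard He initialisation, Var(W_ij) = 2/fan_in, when p = 1; the function name and example widths are illustrative.

```python
import numpy as np

def critical_relu_dropout_init(fan_in, fan_out, keep_prob=1.0, rng=None):
    """Sample a weight matrix at (approximate) criticality for a ReLU layer
    followed by inverted dropout with keep probability `keep_prob`.

    Assumes the criticality condition sigma_w^2 = 2 * keep_prob from the
    signal-propagation analysis of ReLU networks with multiplicative
    (dropout) noise; with keep_prob = 1 this is He initialisation.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma_sq = 2.0 * keep_prob / fan_in  # per-weight variance at criticality
    return rng.normal(0.0, np.sqrt(sigma_sq), size=(fan_in, fan_out))

# Example: a 512-unit layer with dropout rate 0.3 (keep probability 0.7).
W = critical_relu_dropout_init(512, 512, keep_prob=0.7)
```

The paper's question is whether sampling exactly at this critical variance matters in practice once dropout caps the trainable depth; off-critical choices within a wide band around it turn out to perform indistinguishably.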




Updated: 2020-07-31