Activation function design for deep networks: linearity and effective initialisation
Applied and Computational Harmonic Analysis (IF 2.5), Pub Date: 2022-01-04, DOI: 10.1016/j.acha.2021.12.010
M. Murray, V. Abrol, J. Tanner

The activation function deployed in a deep neural network has great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b^2 of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in terms of training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.
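The abstract's central recommendation (an activation that is linear on an interval around the origin whose width is large relative to the bias standard deviation σ_b) can be sketched in code. The construction below is only an illustration of that idea, not the paper's exact definition: the half-width parameter s, the tanh-style saturation outside the linear region, and the choice of weight variance σ_w^2/fan_in are all assumptions made for the sketch.

```python
import numpy as np

def linearised_tanh(x, s=1.0):
    # Illustrative activation (an assumption, not the paper's construction):
    # identity on [-s, s], tanh-style saturation outside, matched so the
    # function is continuous with slope 1 at +/- s.
    x = np.asarray(x, dtype=float)
    outside = np.sign(x) * (s + np.tanh(np.abs(x) - s))
    return np.where(np.abs(x) <= s, x, outside)

def init_dense(fan_in, fan_out, sigma_w=1.0, sigma_b=0.5, rng=None):
    # Random initialisation with weight variance sigma_w^2 / fan_in and bias
    # variance sigma_b^2 -- the hyperparameter the abstract compares against
    # the width of the activation's linear region.
    rng = np.random.default_rng() if rng is None else rng
    W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(fan_out, fan_in))
    b = rng.normal(0.0, sigma_b, size=fan_out)
    return W, b

# Forward pass of a deep net at initialisation: with s large relative to
# sigma_b, most pre-activations fall inside the linear region.
rng = np.random.default_rng(0)
h = rng.normal(size=(128, 256))            # batch of inputs
for _ in range(20):                        # 20 hidden layers of width 256
    W, b = init_dense(256, 256, sigma_w=1.0, sigma_b=0.5, rng=rng)
    h = linearised_tanh(h @ W.T + b, s=2.0)
```

Under this sketch, pre-activations at each layer are approximately Gaussian with a scale governed by σ_w and σ_b, so taking the half-width s comfortably larger than σ_b keeps most units in the linear regime at initialisation; this is the condition the abstract identifies as avoiding correlation collapse and vanishing or exploding gradients.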



Updated: 2022-01-05