On the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network
Neural Computation (IF 2.7), Pub Date: 2019-12-01, DOI: 10.1162/neco_a_01235
Philip M. Long, Hanie Sedghi

We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to gaussian distributions. We show that if the activation function φ satisfies a minimal set of assumptions, satisfied by all activation functions that we know of that are used in practice, then, as the width of the network gets large, the “length process” converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases and the activation function φ. We also show that this convergence may fail for φ that violate our assumptions. We show how to use this analysis to choose the variance of weight initialization, depending on the activation function, so that hidden variables maintain a consistent scale throughout the network.
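The abstract does not spell out the length map; the sketch below illustrates one standard form such a recursion can take, q_{l+1} = σ_w² · E[φ(√q_l · Z)²] + σ_b² with Z ~ N(0, 1), and compares its predictions against the empirical hidden-vector lengths of a single wide random ReLU network. All function names, parameters, and the Monte Carlo setup here are our own illustration, consistent with the abstract's description but not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def length_map(q, phi, sigma_w2, sigma_b2, n_mc=200_000):
    # One step of the deterministic length map, estimated by Monte Carlo:
    # q_{l+1} = sigma_w^2 * E[phi(sqrt(q_l) * Z)^2] + sigma_b^2, Z ~ N(0, 1).
    z = rng.standard_normal(n_mc)
    return sigma_w2 * np.mean(phi(np.sqrt(q) * z) ** 2) + sigma_b2

def empirical_lengths(x, phi, sigma_w2, sigma_b2, depth, width):
    # Mean squared activation per unit, layer by layer, for one random network
    # with i.i.d. N(0, sigma_w^2 / fan_in) weights and N(0, sigma_b^2) biases.
    h, qs = x, []
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(sigma_w2 / h.size), size=(width, h.size))
        b = rng.normal(0.0, np.sqrt(sigma_b2), size=width)
        h = phi(W @ h + b)
        qs.append(np.mean(h ** 2))
    return qs

# For ReLU, E[relu(sqrt(q) * Z)^2] = q / 2, so sigma_w^2 = 2 (He-style
# initialization) is the choice that keeps the hidden scale constant.
x = rng.standard_normal(1000)
sigma_w2, sigma_b2 = 2.0, 0.0

q, predicted = np.mean(x ** 2), []
for _ in range(5):
    q = length_map(q, relu, sigma_w2, sigma_b2)
    predicted.append(q)
observed = empirical_lengths(x, relu, sigma_w2, sigma_b2, depth=5, width=4000)
print("length map  :", np.round(predicted, 3))
print("wide network:", np.round(observed, 3))

At width 4000 the empirical per-unit squared lengths stay close to the length map's fixed point (about 1.0 at every layer); shrinking the width or setting sigma_w2 away from 2 makes the two rows drift apart, which is the kind of width-dependent convergence and initialization choice the abstract describes.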
