The dropout learning algorithm
Artificial Intelligence ( IF 5.1 ) Pub Date : 2014-05-01 , DOI: 10.1016/j.artint.2014.02.004
Pierre Baldi, Peter Sadowski

Dropout is a recently introduced algorithm for training neural networks that randomly drops units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between the normalized geometric mean of logistic functions and the logistic of the mean, which mathematically characterizes logistic functions; and (3) the linearity of the mean with respect to sums and products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, for the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations.
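Property (2) of the abstract can be checked numerically for a single logistic unit with Bernoulli-gated inputs: the normalized weighted geometric mean (NWGM) of the unit's outputs over all dropout configurations equals the logistic of the expected input. A minimal NumPy sketch, where the weights, inputs, and keep probability are illustrative assumptions rather than values from the paper:

```python
import numpy as np
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical single logistic unit (illustrative values).
w = np.array([0.5, -1.2, 0.8])   # incoming weights
x = np.array([1.0, 2.0, -0.5])   # inputs to the unit
p = 0.5                          # Bernoulli keep probability per input

# Enumerate all 2^3 dropout masks with their Bernoulli probabilities.
masks = list(product([0, 1], repeat=3))
probs = [p ** sum(m) * (1 - p) ** (3 - sum(m)) for m in masks]
outs = [sigmoid(np.dot(w * np.array(m), x)) for m in masks]

# Normalized weighted geometric mean of the ensemble: G / (G + G'),
# where G' is the weighted geometric mean of the complements 1 - o.
G = np.prod([o ** q for o, q in zip(outs, probs)])
Gp = np.prod([(1 - o) ** q for o, q in zip(outs, probs)])
nwgm = G / (G + Gp)

# Identity (2): the NWGM equals the logistic of the expected input,
# since sigmoid(s) / (1 - sigmoid(s)) = exp(s), so G / G' = exp(E[s]).
expected_input = p * np.dot(w, x)
print(np.isclose(nwgm, sigmoid(expected_input)))  # True
```

The identity is exact for logistic units because the log-odds of the logistic function is linear in its input; for other transfer functions the NWGM is only an approximation, whose errors, as the abstract notes, tend to cancel rather than accumulate.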
