Power Function Error Initialization Can Improve Convergence of Backpropagation Learning in Neural Networks for Classification.
Neural Computation (IF 2.9), Pub Date: 2021-07-26, DOI: 10.1162/neco_a_01407
Andreas Knoblauch

Supervised learning corresponds to minimizing a loss or cost function expressing the differences between model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y_n - t_n, which is optimal for several commonly used loss functions such as cross-entropy or the sum of squared errors. Here I evaluate a more general error initialization method using power functions |y_n - t_n|^q for q > 0, corresponding to a new family of loss functions that generalizes cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
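To make the error-initialization idea concrete, the sketch below contrasts the standard output-layer error delta = y_n - t_n with a power-function variant |y_n - t_n|^q. This is an illustrative NumPy implementation, not the paper's reference code; in particular, retaining the sign of y_n - t_n and applying the exponent q elementwise are assumptions about the exact form used.

```python
import numpy as np

def init_output_error(y, t, q=1.0):
    """Initialize the output-layer error signal for backpropagation.

    q = 1 reproduces the standard delta = y - t that is optimal for
    cross-entropy or sum-of-squared-error losses; other q > 0 values
    give the power-function initialization |y - t|**q. The sign of
    y - t is retained here so the update direction is unchanged
    (an assumption about the exact form used in the paper).
    """
    diff = y - t
    return np.sign(diff) * np.abs(diff) ** q

# Toy example: softmax outputs y and a one-hot target t.
y = np.array([0.7, 0.2, 0.1])
t = np.array([1.0, 0.0, 0.0])

delta_standard = init_output_error(y, t, q=1.0)  # classic y - t
delta_power = init_output_error(y, t, q=0.5)     # q < 1 amplifies small errors

# Either delta would then be backpropagated through the transposed
# weight matrices exactly as in ordinary backpropagation.
print(delta_standard)
print(delta_power)
```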

Updated: 2021-07-26