Block layer decomposition schemes for training deep neural networks
Journal of Global Optimization (IF 1.3), Pub Date: 2019-11-15, DOI: 10.1007/s10898-019-00856-0
Laura Palagi , Ruggiero Seccia

Deep feedforward neural networks' (DFNNs) weight estimation relies on the solution of a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus. Furthermore, the time needed to find good solutions of the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how block coordinate descent (BCD) methods can be fruitfully applied to the DFNN weight optimization problem and embedded in online frameworks, possibly avoiding bad stationary points. We first describe a batch BCD method able to effectively tackle the difficulties due to the network's depth; we then extend the algorithm by proposing an online BCD scheme able to scale with respect to both the number of variables and the number of samples. We report extensive numerical results on standard datasets using various deep networks. We show that applying BCD methods to the training problem of DFNNs improves over standard batch/online algorithms in the training phase while also guaranteeing good generalization performance.
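The layer-wise block decomposition idea described in the abstract can be illustrated with a short sketch: the network's weights are partitioned into blocks, one block per layer, and each outer iteration cycles over the blocks, taking a few gradient steps on the active block while all other layers stay fixed. The PyTorch snippet below is a minimal, generic illustration of such a scheme under made-up data, network sizes, and hyperparameters; it is not the paper's exact batch or online algorithm.

    # Minimal sketch of layer-wise block coordinate descent (BCD) for a DFNN.
    # Data, architecture, and hyperparameters are illustrative placeholders.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy regression data standing in for a real training set.
    X = torch.randn(256, 20)
    y = torch.randn(256, 1)

    # A small deep feedforward network; each layer is treated as one block.
    layers = nn.ModuleList([
        nn.Sequential(nn.Linear(20, 64), nn.Tanh()),
        nn.Sequential(nn.Linear(64, 64), nn.Tanh()),
        nn.Linear(64, 1),
    ])

    def forward(x):
        for layer in layers:
            x = layer(x)
        return x

    loss_fn = nn.MSELoss()
    outer_iters = 30   # cycles over the layer blocks
    inner_steps = 5    # gradient steps on the active block per cycle

    for it in range(outer_iters):
        for block_idx, block in enumerate(layers):
            # Freeze every block except the active one.
            for j, other in enumerate(layers):
                for p in other.parameters():
                    p.requires_grad_(j == block_idx)

            # Optimize only the active block's weights for a few steps.
            opt = torch.optim.SGD(block.parameters(), lr=1e-2)
            for _ in range(inner_steps):
                opt.zero_grad()
                loss = loss_fn(forward(X), y)
                loss.backward()
                opt.step()

        if it % 10 == 0:
            print(f"outer iter {it}: loss = {loss.item():.4f}")

In an online variant of this sketch, the inner steps would be computed on mini-batches of samples rather than on the full dataset, so that the cost of each block update scales with the mini-batch size instead of the total number of samples.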



Updated: 2020-04-21