Accelerating Training in Artificial Neural Networks with Dynamic Mode Decomposition
arXiv - CS - Computational Engineering, Finance, and Science. Pub Date: 2020-06-18, DOI: arxiv-2006.14371
Mauricio E. Tano, Gavin D. Portwood, Jean C. Ragusa

Training of deep neural networks (DNNs) frequently involves optimizing several millions or even billions of parameters. Even with modern computing architectures, the computational expense of DNN training can inhibit, for instance, network architecture design optimization, hyper-parameter studies, and integration into scientific research cycles. The key factor limiting performance is that both a feed-forward evaluation and a back-propagation step are required for each weight update during optimization. In this work, we propose a method to decouple the evaluation of the update rule at each weight. First, Proper Orthogonal Decomposition (POD) is used to identify a current estimate of the principal directions along which the weights of each layer evolve during training, based on the evolution observed over a few backpropagation steps. Then, Dynamic Mode Decomposition (DMD) is used to learn the dynamics of the weight evolution in each layer along these principal directions. The DMD model is used to evaluate an approximate converged state when training the ANN. Afterward, a number of backpropagation steps are performed, starting from the DMD estimates, leading to an update of the principal directions and the DMD model. This iterative process is repeated until convergence. By fine-tuning the number of backpropagation steps used for each DMD model estimation, a significant reduction in the number of operations required to train the neural network can be achieved. In this paper, the DMD acceleration method is explained in detail, along with the theoretical justification for the acceleration provided by DMD. The method is illustrated using a regression problem of key interest for the scientific machine learning community: the prediction of a pollutant concentration field in a diffusion-advection-reaction problem.
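The following is a minimal sketch (in Python with NumPy, not the authors' implementation) of the idea the abstract describes: collect a few snapshots of a layer's flattened weights over successive backpropagation steps, use a truncated SVD of the snapshot matrix as the POD step, fit a reduced DMD operator to the weight dynamics along those principal directions, and extrapolate the weights forward before resuming ordinary training. All names and settings here (dmd_extrapolate, n_extrap, the toy quadratic loss, the learning rate, the mode rank) are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np


def dmd_extrapolate(snapshots, rank, n_extrap):
    """Extrapolate flattened weight snapshots with POD + DMD.

    snapshots : (n_weights, n_snapshots) array; column k holds the layer's
                weight vector after the k-th backpropagation step.
    rank      : maximum number of POD modes (principal directions) to keep.
    n_extrap  : pseudo-steps to extrapolate past the last observed snapshot.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # pairs (w_k, w_{k+1})

    # POD step: truncated SVD of the snapshot history gives the principal
    # directions of weight evolution; drop numerically negligible modes.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = min(rank, int(np.sum(s > 1e-10 * s[0])))
    Ur, sr, Vr = U[:, :r], s[:r], Vh[:r, :].conj().T

    # DMD step: reduced linear operator approximating w_{k+1} ~ A w_k
    # restricted to the retained principal directions.
    A_tilde = Ur.conj().T @ Y @ Vr @ np.diag(1.0 / sr)
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = Y @ Vr @ np.diag(1.0 / sr) @ W               # DMD modes in full weight space
    b = np.linalg.pinv(Phi) @ snapshots[:, 0]          # mode amplitudes

    # Evaluate the DMD model n_extrap steps beyond the last snapshot.
    t = snapshots.shape[1] - 1 + n_extrap
    return np.real((Phi * eigvals**t) @ b)


# Toy outer loop: alternate a short burst of gradient ("backpropagation")
# steps with a DMD extrapolation, mimicking the iterative process above.
rng = np.random.default_rng(0)
n_weights, n_snapshots = 50, 8
target = rng.normal(size=n_weights)                    # optimum of the toy loss
alpha = rng.uniform(0.2, 1.0, size=n_weights)          # per-weight curvature
w = np.zeros(n_weights)

for cycle in range(5):
    hist = []
    for _ in range(n_snapshots):                       # a few true gradient steps
        w = w - 0.1 * alpha * (w - target)             # grad of 0.5*sum(alpha*(w-target)^2)
        hist.append(w.copy())
    # Jump ahead along the learned weight dynamics, then resume training.
    w = dmd_extrapolate(np.column_stack(hist), rank=5, n_extrap=50)
    print(f"cycle {cycle}: distance to optimum = {np.linalg.norm(w - target):.3e}")
```

In a real DNN training loop the snapshots would be the flattened weights of each layer recorded after successive optimizer steps, with one POD/DMD model per layer; the number of backpropagation steps per cycle and the extrapolation horizon are the tuning knobs the abstract refers to.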

Updated: 2020-06-26