Derivation and analysis of parallel-in-time neural ordinary differential equations
Annals of Mathematics and Artificial Intelligence (IF 1.2), Pub Date: 2020-07-25, DOI: 10.1007/s10472-020-09702-6
E. Lorin

The introduction in 2015 of Residual Neural Networks (RNN) and ResNet allowed for outstanding improvements in the performance of learning algorithms for evolution problems with a "large" number of layers. Continuous-depth RNN-like models, called Neural Ordinary Differential Equations (NODE), were then introduced in 2019. The latter have a constant memory cost and avoid an a priori specification of the number of hidden layers. In this paper, we derive and analyze a parallel (in parameters and in time) version of the NODE, which potentially allows for a more efficient implementation than a standard/naive parallelization of NODEs with respect to the parameters only. We expect this approach to be relevant whenever a very large number of processors is available, or when high-dimensional ODE systems are involved. Moreover, when implicit ODE solvers are used, solving the resulting nonlinear systems with, for instance, Newton's algorithm requires the solution of linear systems of up to cubic complexity; since the proposed approach reduces the overall number of time steps thanks to an iterative increase of the accuracy order of the ODE solvers, it also reduces the number of linear systems to be solved, and hence benefits from a scaling effect.
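
To make the continuous-depth and parallel-in-time ideas concrete, here is a minimal, self-contained sketch in Python/NumPy: a toy Neural ODE right-hand side dh/dt = f(h, theta) integrated over [0, T], combined with a parareal-style iteration in which the expensive fine solves over the time sub-intervals are mutually independent and could be distributed across processors. This illustrates the general parallel-in-time idea only, not the scheme derived in the paper; the function names (f, coarse_step, fine_solve, parareal) and the explicit Euler propagators are assumptions made for the sketch.

import numpy as np

# Illustrative sketch (not the paper's method): a toy Neural ODE layer
# dh/dt = f(h, theta), integrated with a parareal-style parallel-in-time
# iteration over sub-intervals of [0, T].

def f(h, theta):
    """Hypothetical NODE right-hand side: a single tanh layer."""
    W, b = theta
    return np.tanh(W @ h + b)

def coarse_step(h, theta, dt):
    """Cheap propagator: one explicit Euler step over a sub-interval."""
    return h + dt * f(h, theta)

def fine_solve(h, theta, dt, n_sub=20):
    """Accurate propagator: many small Euler steps (stand-in for a
    higher-order or implicit solver); in parareal these solves are
    independent per sub-interval and can run on separate processors."""
    for _ in range(n_sub):
        h = h + (dt / n_sub) * f(h, theta)
    return h

def parareal(h0, theta, T=1.0, n_intervals=8, n_iter=3):
    """Parareal-style parallel-in-time iteration (illustrative only)."""
    dt = T / n_intervals
    # Initial guess from a serial sweep of the coarse propagator.
    H = [h0]
    for n in range(n_intervals):
        H.append(coarse_step(H[-1], theta, dt))
    for _ in range(n_iter):
        # Fine solves on each sub-interval are embarrassingly parallel.
        F = [fine_solve(H[n], theta, dt) for n in range(n_intervals)]
        G_old = [coarse_step(H[n], theta, dt) for n in range(n_intervals)]
        H_new = [h0]
        for n in range(n_intervals):
            # Serial coarse correction sweep.
            G_new = coarse_step(H_new[-1], theta, dt)
            H_new.append(G_new + F[n] - G_old[n])
        H = H_new
    return H[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    theta = (0.1 * rng.standard_normal((d, d)), np.zeros(d))
    h0 = rng.standard_normal(d)
    print("parallel-in-time state:", parareal(h0, theta))
    print("serial fine reference: ", fine_solve(h0, theta, 1.0, n_sub=160))

In this toy setting the fine solves account for most of the work, so distributing them over the sub-intervals is where the parallel speedup would come from; the paper's contribution concerns a parallel-in-parameter-and-time formulation of the NODE itself rather than this generic parareal construction.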
