Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences (IF 3.5). Pub Date: 2020-07-01. DOI: 10.1098/rspa.2020.0334
Ameya D. Jagtap, Kenji Kawaguchi, George Em Karniadakis

We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) or for every neuron (neuron-wise) separately, and then optimizing it using a variant of the stochastic gradient descent algorithm. To further speed up training, a slope recovery term based on the activation slope is added to the loss function, which accelerates convergence and thereby reduces the training cost. On the theoretical side, we prove that in the proposed method the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method are not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate convergence by implicitly multiplying conditioning matrices to the gradient of the base method, without any explicit computation of the conditioning matrix or the matrix–vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.
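
To make the idea concrete, the sketch below is a minimal PyTorch illustration (not the authors' implementation; the network size, fixed scale factor `n`, and training loop are assumptions) of the neuron-wise variant: each hidden neuron gets a trainable slope parameter `a`, scaled by a fixed factor `n` inside a tanh activation, and a slope-recovery term of the form 1 / mean_k exp(mean_i a_i^k) is added to the training loss. The layer-wise variant would simply share a single `a` per layer.

```python
# Minimal sketch (assumptions noted above) of neuron-wise locally adaptive
# activations with a slope-recovery loss term, in PyTorch.
import math
import torch
import torch.nn as nn

class AdaptiveTanhLayer(nn.Module):
    """Fully connected layer with a trainable slope per neuron (neuron-wise)."""
    def __init__(self, in_dim, out_dim, n=10.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.n = n                                               # fixed scale factor
        self.a = nn.Parameter(torch.full((out_dim,), 1.0 / n))   # trainable slopes, n*a = 1 at init

    def forward(self, x):
        return torch.tanh(self.n * self.a * self.linear(x))

class AdaptiveNet(nn.Module):
    def __init__(self, dims=(1, 20, 20, 1), n=10.0):
        super().__init__()
        self.hidden = nn.ModuleList(
            AdaptiveTanhLayer(dims[i], dims[i + 1], n) for i in range(len(dims) - 2)
        )
        self.out = nn.Linear(dims[-2], dims[-1])

    def forward(self, x):
        for layer in self.hidden:
            x = layer(x)
        return self.out(x)

    def slope_recovery(self):
        # Slope-recovery term: the reciprocal of the mean (over hidden layers)
        # of exp(mean slope), which penalizes small activation slopes.
        means = torch.stack([torch.exp(layer.a.mean()) for layer in self.hidden])
        return 1.0 / means.mean()

# Usage: add the slope-recovery term to the data (or PDE-residual) loss.
model = AdaptiveNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.linspace(-1, 1, 128).unsqueeze(1)
y = torch.sin(math.pi * x)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y) + model.slope_recovery()
    loss.backward()
    opt.step()
```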

Updated: 2020-07-01