Global Convergence of Three-layer Neural Networks in the Mean Field Regime, arXiv - CS - Machine Learning



arXiv - CS - Machine Learning, Pub Date: 2021-05-11, DOI: arxiv-2105.05228
Huy Tuan Pham, Phan-Minh Nguyen

In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tend to a nonlinear and nontrivial dynamical limit, known as the mean field limit. This offers a way to study large-width neural networks by analyzing the mean field limit. Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees. The extension to multilayer networks, however, has been a highly challenging puzzle, and little is known about the optimization efficiency in the mean field regime when there are more than two layers. In this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a neuronal embedding, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which, unlike previous works on two-layer networks, does not rely critically on convexity. Underlying the result is a universal approximation property, natural to neural networks, which importantly is shown to hold at any finite training time (not necessarily at convergence) via an algebraic topology argument.
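
The scaling described in the abstract can be illustrated with a minimal NumPy sketch. It assumes one common mean field parameterization, in which each layer averages over its incoming neurons (a 1/n factor rather than 1/sqrt(n)). The abstract does not spell out the formula, so the functions `init_params`, `forward`, and `output_spread`, the tanh activation, and the shifted Gaussian initialization are all hypothetical choices made for illustration, not the authors' setup.

```python
import numpy as np

def init_params(d, n1, n2, rng):
    """Draw i.i.d. O(1) weights (hypothetical initialization, for illustration only)."""
    w1 = rng.standard_normal((n1, d)) + 0.5   # first layer: n1 neurons, input dimension d
    w2 = rng.standard_normal((n2, n1)) + 1.0  # second layer: n2 neurons
    w3 = rng.standard_normal(n2) + 1.0        # output layer weights
    return w1, w2, w3

def forward(x, params):
    """Three-layer forward pass with assumed mean field (1/n) averaging."""
    w1, w2, w3 = params
    n2, n1 = w2.shape
    h1 = np.tanh(w1 @ x)          # first hidden layer, shape (n1,)
    h2 = np.tanh(w2 @ h1 / n1)    # average over the n1 first-layer neurons
    return float(w3 @ h2 / n2)    # average over the n2 second-layer neurons

def output_spread(width, trials=20, d=5, seed=0):
    """Standard deviation of the initial output across random initializations."""
    rng = np.random.default_rng(seed)
    x = np.ones(d) / np.sqrt(d)   # fixed unit-norm input
    outs = [forward(x, init_params(d, width, width, rng)) for _ in range(trials)]
    return float(np.std(outs))

if __name__ == "__main__":
    for width in (20, 200, 800):
        print(width, output_spread(width))
```

Under these assumptions the spread of the output across initializations should shrink as the width grows, which is the law-of-large-numbers effect that makes a deterministic infinite-width limit plausible. The sketch covers only the network at initialization; it says nothing about the training dynamics or the convergence result itself.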


Updated: 2021-05-12
