Tensor Programs III: Neural Matrix Laws
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-09-22, DOI: arxiv-2009.10685
Greg Yang

In a neural network (NN), \emph{weight matrices} linearly transform inputs into \emph{preactivations} that are then transformed nonlinearly into \emph{activations}. A typical NN interleaves multitudes of such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, the (pre-)activations of a randomly initialized NN become \emph{independent} of the weights as the NN's widths tend to infinity, in the sense of \emph{asymptotic freeness} in random matrix theory. We call this the \emph{Free Independence Principle (FIP)}, which has these consequences: 1) It rigorously justifies the calculation of the asymptotic Jacobian singular value distribution of an NN in Pennington et al. [36,37], essential for training ultra-deep NNs [48]. 2) It gives a new justification of the \emph{gradient independence assumption} used for calculating the \emph{Neural Tangent Kernel} of a neural network. FIP and these results hold for any neural architecture. We show FIP by proving a Master Theorem for any Tensor Program, as introduced in Yang [50,51], generalizing the Master Theorems proved in those works. As warmup demonstrations of this new Master Theorem, we give new proofs of the semicircle and Marchenko-Pastur laws, benchmarking our framework against these fundamental mathematical results.
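As a concrete companion to the abstract (this sketch is not from the paper), the short NumPy script below empirically checks the two warmup results mentioned above, the semicircle and Marchenko-Pastur laws, at a large finite width, and computes the preactivation/activation of a single randomly initialized layer. The width n = 2000, the 1/sqrt(n) scaling, and the tanh nonlinearity are illustrative choices, not values taken from the paper.

# Minimal sketch (assumed setup, not from the paper): empirically check the
# semicircle and Marchenko-Pastur laws with plain NumPy random matrices, and
# compute the preactivation/activation of one randomly initialized layer.
import numpy as np

n = 2000  # width; both laws describe the limit n -> infinity

# Semicircle law: a symmetric Gaussian (Wigner) matrix scaled by 1/sqrt(n)
# has eigenvalues spread over [-2, 2] with density sqrt(4 - x^2) / (2*pi).
G = np.random.randn(n, n)
W = (G + G.T) / np.sqrt(2 * n)
eig_w = np.linalg.eigvalsh(W)
hist, edges = np.histogram(eig_w, bins=50, range=(-2.2, 2.2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
semicircle = np.sqrt(np.maximum(4 - centers**2, 0)) / (2 * np.pi)
print("semicircle support:", eig_w.min(), eig_w.max(), "(theory: [-2, 2])")
print("semicircle max density error:", np.abs(hist - semicircle).max())

# Marchenko-Pastur law (aspect ratio 1): eigenvalues of (1/n) X X^T for an
# n x n Gaussian X lie in [0, 4] with density sqrt((4 - x) x) / (2*pi*x).
X = np.random.randn(n, n)
M = X @ X.T / n
eig_m = np.linalg.eigvalsh(M)
print("Marchenko-Pastur support:", eig_m.min(), eig_m.max(), "(theory: [0, 4])")
print("Marchenko-Pastur mean   :", eig_m.mean(), "(theory: 1)")

# One layer of a randomly initialized NN: preactivation h = W x, then a
# nonlinear activation phi(h). FIP concerns how such (pre-)activations relate
# to the weight matrices as the width n tends to infinity.
x = np.random.randn(n)
h = G @ x / np.sqrt(n)   # preactivation with 1/sqrt(n) scaling (illustrative)
a = np.tanh(h)           # activation; tanh is an arbitrary choice here
print("preactivation std:", h.std(), " activation std:", a.std())

With n in the low thousands the empirical spectra already track the limiting densities closely; the paper's contribution is the rigorous infinite-width statement, for which these classical laws serve only as warmup benchmarks.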

Updated: 2020-09-23