Universal statistics of Fisher information in deep neural networks: mean field approach *
Journal of Statistical Mechanics: Theory and Experiment (IF 2.2). Pub Date: 2020-12-22, DOI: 10.1088/1742-5468/abc62e
Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

The Fisher information matrix (FIM) is a fundamental quantity characterizing a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of the FIM that are universal across a wide class of DNNs. To this end, we use random weights and the large-width limit, which enables us to apply mean-field theory. We investigate the asymptotic statistics of the FIM's eigenvalues and find that most of them are close to zero, while the maximum eigenvalue takes on a huge value. Because the FIM defines the local geometry of the parameter space, the landscape is locally flat in most dimensions but strongly distorted in others. Moreover, we demonstrate potential uses of the derived statistics in learning strategies. First, the small eigenvalues that induce flatness can be connected to a norm-based capacity measure of generalization ability. Second, the maximum eigenvalue, which induces the distortion, enables us to quantitatively estimate a learning rate at which gradient methods converge.
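The abstract's claims can be illustrated numerically. The following is a minimal sketch (our illustration, not the paper's code): it builds the empirical FIM of a one-hidden-layer network with random mean-field-scaled weights, computes the eigenvalue spectrum, and reports the fraction of near-zero eigenvalues along with the classical stability bound eta < 2/lambda_max for gradient descent on a local quadratic approximation. All sizes and the scalar-output, squared-error setup are assumptions chosen for brevity.

```python
import numpy as np

# Hypothetical toy setup: f(x) = w2 . tanh(W1 x), a scalar-output network with
# random weights under 1/sqrt(fan-in) mean-field scaling.
rng = np.random.default_rng(0)
d, M, N = 20, 50, 100            # input dim, hidden width, number of samples

W1 = rng.standard_normal((M, d)) / np.sqrt(d)
w2 = rng.standard_normal(M) / np.sqrt(M)
X = rng.standard_normal((N, d))

P = M * d + M                    # total number of parameters (here 1050)
G = np.zeros((N, P))             # per-sample gradients of the output w.r.t. theta
for n in range(N):
    h = np.tanh(W1 @ X[n])
    dW1 = np.outer(w2 * (1.0 - h**2), X[n])   # df/dW1 (tanh' = 1 - tanh^2)
    G[n] = np.concatenate([dW1.ravel(), h])   # df/dw2 = h

# Empirical FIM for a Gaussian output model: F = (1/N) sum_n g_n g_n^T.
F = G.T @ G / N
eigs = np.linalg.eigvalsh(F)

frac_tiny = np.mean(eigs < 1e-3 * eigs.max())
print(f"max eigenvalue       : {eigs.max():.4f}")
print(f"mean eigenvalue      : {eigs.mean():.6f}")
print(f"fraction near zero   : {frac_tiny:.2f}")
print(f"stable learning rate : eta < {2.0 / eigs.max():.3f}")
```

Because the empirical FIM here has rank at most N < P, the spectrum is guaranteed to be dominated by near-zero eigenvalues, mirroring the paper's qualitative picture: a mostly flat local landscape with a few strongly distorted directions, whose largest eigenvalue sets the usable learning-rate scale.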

Updated: 2020-12-22