Chinese Physics Letters (IF 3.5) | Pub Date: 2021-03-25 | DOI: 10.1088/0256-307x/38/3/038701
Yaoyu Zhang 1,2, Tao Luo 1, Zheng Ma 1, Zhi-Qin John Xu 1
Why heavily parameterized neural networks (NNs) do not overfit the data is an important long-standing open question. We propose a phenomenological model of NN training to explain this non-overfitting puzzle. Our linear frequency principle (LFP) model captures a key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic details. A theory based on the LFP model shows that low-frequency dominance of the target function is the key condition for the non-overfitting of NNs; this prediction is verified by experiments. Furthermore, through an idealized two-layer NN, we unravel how the detailed microscopic training dynamics statistically give rise to an LFP model with quantitative predictive power.
Title: Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks
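The frequency-domain picture above can be illustrated with a minimal sketch: in a linear frequency-principle-style model, each Fourier component of the network output relaxes independently toward the target, with a decay rate that decreases with frequency, so low frequencies converge first. The decay kernel γ(ξ) = 1/(1 + ξ²) used here is an illustrative assumption for the sketch, not the kernel derived in the paper.

```python
import numpy as np

def lfp_relative_error(xi, t):
    """Relative error of the Fourier component at frequency xi after
    training time t, starting from zero output.

    In a linear frequency-domain dynamic, each component obeys
    d/dt u_hat(xi) = -gamma(xi) * (u_hat(xi) - f_hat(xi)),
    so the relative error decays as exp(-gamma(xi) * t).
    The kernel gamma(xi) = 1 / (1 + xi**2) is an illustrative choice:
    it decreases with frequency, which is the qualitative feature that
    makes low frequencies converge first.
    """
    gamma = 1.0 / (1.0 + np.asarray(xi) ** 2)
    return np.exp(-gamma * t)

# Low, medium, and high frequencies of a hypothetical target function.
freqs = np.array([1.0, 5.0, 20.0])
for t in [0.0, 10.0, 100.0]:
    print(f"t = {t:6.1f}  relative errors = {lfp_relative_error(freqs, t)}")
```

At t = 0 all components carry full relative error; as training proceeds, the low-frequency error collapses first while high-frequency error lingers, which is the mechanism behind the non-overfitting condition stated in the abstract: if the target is dominated by low frequencies, the error that remains at any finite training time is small.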