Smaller generalization error derived for a deep residual neural network compared with shallow networks
IMA Journal of Numerical Analysis (IF 2.3), Pub Date: 2022-09-13, DOI: 10.1093/imanum/drac049
Aku Kammonen, Jonas Kiessling, Petr Plecháč, Mattias Sandberg, Anders Szepessy, Raul Tempone

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier feature layers $\bar z_{\ell +1}=\bar z_\ell + \textrm {Re}\sum _{k=1}^K\bar b_{\ell k}\,e^{\textrm {i}\omega _{\ell k}\bar z_\ell }+ \textrm {Re}\sum _{k=1}^K\bar c_{\ell k}\,e^{\textrm {i}\omega ^{\prime}_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega _{\ell k},\omega ^{\prime}_{\ell k})$ of the random Fourier features $e^{\textrm {i}\omega _{\ell k}\bar z_\ell }$ and $e^{\textrm {i}\omega ^{\prime}_{\ell k}\cdot x}$ is derived, based on the corresponding generalization error for the approximation of the function values $f(x)$. This generalization error turns out to be smaller than the estimate ${\|\hat f\|^2_{L^1({\mathbb {R}}^d)}}/{(KL)}$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case where the $L^\infty $-norm of $f$ is much smaller than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for deep residual networks. Promising performance of the proposed algorithm is demonstrated in computational experiments.
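As a minimal illustration of the architecture in the abstract, the following NumPy sketch evaluates the residual update $\bar z_{\ell+1}=\bar z_\ell + \textrm{Re}\sum_k \bar b_{\ell k}e^{\textrm{i}\omega_{\ell k}\bar z_\ell} + \textrm{Re}\sum_k \bar c_{\ell k}e^{\textrm{i}\omega'_{\ell k}\cdot x}$ layer by layer. The standard-normal frequency sampling and the random amplitudes here are placeholder assumptions for illustration only; the paper's point is precisely that an optimal (non-trivial) frequency distribution, and trained amplitudes, yield a smaller generalization error.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3   # input dimension of x
K = 16  # number of Fourier features per layer
L = 4   # number of residual layers

# Frequencies: omega_{lk} (scalar, applied to the scalar state z) and
# omega'_{lk} in R^d (applied to the input x). Standard normal sampling is
# an illustrative assumption, not the optimal distribution from the paper.
omega = rng.standard_normal((L, K))
omega_x = rng.standard_normal((L, K, d))

# Complex amplitudes b_{lk}, c_{lk}; small random values stand in for
# trained parameters.
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)

def residual_rff(x):
    """Evaluate the L-layer residual random-Fourier-feature network at x."""
    z = 0.0  # scalar state \bar z_0
    for ell in range(L):
        z = (z
             + np.real(np.sum(b[ell] * np.exp(1j * omega[ell] * z)))
             + np.real(np.sum(c[ell] * np.exp(1j * (omega_x[ell] @ x)))))
    return z

y = residual_rff(np.ones(d))
print(y)
```

With a total of $KL$ features, this deep arrangement is the object compared in the abstract against a single hidden layer of $KL$ random Fourier features.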
