Why ResNet Works? Residuals Generalize.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4) Pub Date: 2020-11-30, DOI: 10.1109/tnnls.2020.2966319
Fengxiang He , Tongliang Liu , Dacheng Tao

Residual connections significantly boost the performance of deep neural networks, yet few theoretical results address their influence on the hypothesis complexity and generalization ability of deep neural networks. This article studies the influence of residual connections on the hypothesis complexity of a neural network in terms of the covering number of its hypothesis space. We first present an upper bound on the covering number of networks with residual connections; this bound shares a similar structure with that of networks without residual connections. The result suggests that moving a weight matrix or nonlinear activation from the "bone" (the main stream of the network) to a "vine" (a residual shortcut) does not enlarge the hypothesis space. Afterward, an O(1/√N) margin-based multiclass generalization bound is obtained for ResNet, as an exemplary case of deep neural networks with residual connections. Generalization guarantees for similar state-of-the-art architectures, such as DenseNet and ResNeXt, follow straightforwardly. According to the obtained generalization bound, regularization terms should be introduced in practice to keep the norms of the weight matrices from growing too large, ensuring good generalization ability; this justifies the technique of weight decay.
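The weight-decay technique the bound justifies can be illustrated with a minimal NumPy sketch (not from the paper, and independent of any particular framework): adding an L2 penalty to the update contracts each weight matrix toward zero, keeping its norm — the quantity that drives the covering-number bound — under control.

```python
import numpy as np

def sgd_step(W, grad, lr=0.1, weight_decay=0.01):
    """One SGD update with weight decay: the effective gradient is the
    loss gradient plus the L2-penalty gradient (weight_decay * W)."""
    return W - lr * (grad + weight_decay * W)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
norm_before = np.linalg.norm(W)

# With a zero loss gradient, weight decay alone acts on the weights:
# each step multiplies W by (1 - lr * weight_decay), so its Frobenius
# norm shrinks geometrically.
for _ in range(100):
    W = sgd_step(W, grad=np.zeros_like(W))

assert np.linalg.norm(W) < norm_before
```

In a real training loop `grad` would be the loss gradient, and the decay term trades a small increase in training loss for a smaller weight-matrix norm, which the paper's generalization bound rewards.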

Updated: 2020-02-05