Multi-way backpropagation for training compact deep neural networks.
Neural Networks (IF 6.0), Pub Date: 2020-03-26, DOI: 10.1016/j.neunet.2020.03.001
Yong Guo 1, Jian Chen 1, Qing Du 1, Anton Van Den Hengel 2, Qinfeng Shi 2, Mingkui Tan 3

Depth is one of the key factors behind the success of convolutional neural networks (CNNs). Since ResNet (He et al., 2016), we have been able to train very deep CNNs, as the gradient vanishing issue has been largely addressed by the introduction of skip connections. However, we observe that when the depth is very large, the intermediate layers (especially the shallow ones) may fail to receive sufficient supervision from the loss because the gradient undergoes severe transformations along the long backpropagation path. As a result, the representation power of the intermediate layers can be very weak, and the model becomes highly redundant with limited performance. In this paper, we first investigate this supervision vanishing issue in existing backpropagation (BP) methods. We then propose an effective method to address it, called Multi-way BP (MW-BP), which relies on multiple auxiliary losses added to the intermediate layers of the network. MW-BP can be applied to most deep architectures, such as ResNet and MobileNet, with only slight modifications. Our method often yields much more compact models (denoted by "Mw+Architecture") than existing methods. For example, MwResNet-44 with 44 layers outperforms ResNet-110 with 110 layers on CIFAR-10 and CIFAR-100. More critically, the resultant models even outperform the light models obtained by state-of-the-art model compression methods. Finally, our method inherently produces multiple compact models with different depths at the same time, which is helpful for model selection. Extensive experiments on both image classification and face recognition demonstrate the superiority of the proposed method.
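
The core idea, attaching auxiliary losses to intermediate layers so that shallow layers receive supervision through short backpropagation paths, can be illustrated with a minimal PyTorch sketch. This is not the paper's exact MW-BP configuration: the three-stage backbone, the auxiliary head design (global average pooling plus a linear classifier), and the aux_weight of 0.5 are assumptions made purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Plain residual block with a skip connection (no downsampling)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection


class AuxHead(nn.Module):
    """Hypothetical auxiliary classifier attached to an intermediate stage."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))


class MultiWayNet(nn.Module):
    """Backbone with an auxiliary head after each intermediate stage, so shallow
    layers receive supervision through a short backpropagation path."""
    def __init__(self, num_classes=10, blocks_per_stage=3, channels=16):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.stages = nn.ModuleList(
            nn.Sequential(*[BasicBlock(channels) for _ in range(blocks_per_stage)])
            for _ in range(3)
        )
        # Auxiliary heads supervise the first two stages; the last stage feeds the main head.
        self.aux_heads = nn.ModuleList(AuxHead(channels, num_classes) for _ in range(2))
        self.main_head = AuxHead(channels, num_classes)

    def forward(self, x):
        x = self.stem(x)
        logits = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            head = self.main_head if i == len(self.stages) - 1 else self.aux_heads[i]
            logits.append(head(x))
        return logits  # [aux stage 1, aux stage 2, main]


def multi_way_loss(logits_list, targets, aux_weight=0.5):
    """Main cross-entropy loss plus weighted auxiliary losses (the weight is an assumption)."""
    main_loss = F.cross_entropy(logits_list[-1], targets)
    aux_loss = sum(F.cross_entropy(l, targets) for l in logits_list[:-1])
    return main_loss + aux_weight * aux_loss


# Usage sketch: one training step on random CIFAR-sized data.
model = MultiWayNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loss = multi_way_loss(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Because each auxiliary head is a classifier in its own right, training in this style naturally yields several models of different depths at once, which is the property the paper exploits for model selection.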

Updated: 2020-03-27