Group Normalization
International Journal of Computer Vision (IF 11.6), Pub Date: 2019-07-22, DOI: 10.1007/s11263-019-01198-w
Yuxin Wu, Kaiming He

Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems—BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO ( https://github.com/facebookresearch/Detectron/blob/master/projects/GN ), and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.
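The abstract notes that GN divides the channels into groups, normalizes with the per-group mean and variance, and "can be easily implemented by a few lines of code in modern libraries." A minimal NumPy sketch of that computation is below; the function name, the default of 32 groups, and the `(N, C, H, W)` layout follow the paper's common presentation, but this is an illustrative implementation, not the authors' released code.

```python
import numpy as np

def group_norm(x, gamma, beta, num_groups=32, eps=1e-5):
    """Group Normalization over an (N, C, H, W) tensor.

    Channels are split into `num_groups` groups; mean and variance are
    computed per sample and per group, independent of the batch size.
    gamma and beta are per-channel affine parameters, shaped (1, C, 1, 1).
    """
    N, C, H, W = x.shape
    assert C % num_groups == 0, "channels must divide evenly into groups"
    # Reshape so each group's channels and spatial positions share statistics.
    x = x.reshape(N, num_groups, C // num_groups, H, W)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(N, C, H, W)
    return x * gamma + beta
```

Because the statistics are taken over `(C // num_groups, H, W)` for each sample separately, the result is identical whether the batch holds 2 images or 32, which is the property the abstract highlights for small-batch training.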

Updated: 2019-07-22