Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-01-19, DOI: arxiv-2001.06838
Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

Batch Normalization (BN) is one of the most widely used techniques in deep learning, but its performance degrades severely when the batch size is insufficient. This weakness limits the use of BN in many computer vision tasks, such as detection and segmentation, where the batch size is usually small due to memory constraints. Many modified normalization techniques have therefore been proposed, but they either fail to fully restore the performance of BN or introduce additional nonlinear operations into the inference procedure at considerable computational cost. In this paper, we reveal that two extra batch statistics are involved in the backward propagation of BN, a point that has not been well discussed before. These gradient-related batch statistics can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method named Moving Average Batch Normalization (MABN). MABN completely restores the performance of vanilla BN in small-batch cases without introducing any additional nonlinear operations into the inference procedure. We demonstrate the benefits of MABN through both theoretical analysis and experiments, which show its effectiveness on multiple computer vision benchmarks, including ImageNet and COCO. The code has been released at https://github.com/megvii-model/MABN.
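To make the point about backward-pass statistics concrete: in vanilla BN, the gradient with respect to the input involves two batch-dependent averages of the incoming gradient (its mean, and the mean of its product with the normalized activations), and both become noisy when the batch is small. Below is a minimal PyTorch sketch of the moving-average idea described in the abstract, assuming a simplified variance-only normalization (the paper pairs this with weight centralization, which is not shown here). Names such as MABNSketch, ema_var, and ema_gz are illustrative assumptions, not the released API; see the repository linked above for the authors' implementation.

import torch

class MABNSketch(torch.autograd.Function):
    """Illustrative moving-average BN (training path), per channel.

    Forward normalizes by an EMA of the batch variance; backward
    replaces the per-batch gradient statistic mean(g * z) with an EMA.
    Buffer handling and hyperparameters are simplified assumptions,
    not the authors' released implementation.
    """

    @staticmethod
    def forward(ctx, x, weight, bias, ema_var, ema_gz, momentum, eps):
        # x: (N, C, H, W); weight/bias/buffers: (1, C, 1, 1).
        var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        # Stabilize the forward statistic with an exponential moving average.
        ema_var.mul_(1 - momentum).add_(momentum * var)
        z = x / torch.sqrt(ema_var + eps)
        ctx.save_for_backward(z, weight, ema_var, ema_gz)
        ctx.momentum, ctx.eps = momentum, eps
        return weight * z + bias

    @staticmethod
    def backward(ctx, grad_out):
        z, weight, ema_var, ema_gz = ctx.saved_tensors
        g = grad_out * weight                            # dL/dz
        gz = (g * z).mean(dim=(0, 2, 3), keepdim=True)   # noisy per-batch backward statistic
        ema_gz.mul_(1 - ctx.momentum).add_(ctx.momentum * gz)
        # Substitute the EMA for the batch statistic; the EMA itself is
        # treated as a constant here, a simplification of the paper.
        gx = (g - z * ema_gz) / torch.sqrt(ema_var + ctx.eps)
        g_weight = (grad_out * z).sum(dim=(0, 2, 3), keepdim=True)
        g_bias = grad_out.sum(dim=(0, 2, 3), keepdim=True)
        return gx, g_weight, g_bias, None, None, None, None

In a full layer, ema_var and ema_gz would be registered as non-trainable buffers of an nn.Module that calls MABNSketch.apply during training. At inference, normalizing by the same EMA variance keeps the layer a per-channel affine transform, which is the property the abstract emphasizes: no additional nonlinear operations in the inference procedure.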

Updated: 2020-06-17