A Gradient Flow Framework For Analyzing Network Pruning
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-09-24, DOI: arxiv-2009.11839
Ekdeep Singh Lubana and Robert P. Dick

Recent network pruning methods focus on pruning models early-on in training. To estimate the impact of removing a parameter, these methods use importance measures that were originally designed to prune trained models. Despite lacking justification for their use early-on in training, such measures result in surprisingly low accuracy loss. To better explain this behavior, we develop a general gradient flow based framework that unifies state-of-the-art importance measures through the norm of model parameters. We use this framework to determine the relationship between pruning measures and evolution of model parameters, establishing several results related to pruning models early-on in training: (i) magnitude-based pruning removes parameters that contribute least to reduction in loss, resulting in models that converge faster than magnitude-agnostic methods; (ii) loss-preservation based pruning preserves first-order model evolution dynamics and is therefore appropriate for pruning minimally trained models; and (iii) gradient-norm based pruning affects second-order model evolution dynamics, such that increasing gradient norm via pruning can produce poorly performing models. We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10 and CIFAR-100. Code available at https://github.com/EkdeepSLubana/flowandprune.
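
To make the three importance measures concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation (see the linked repository for that); it simply computes, for one data batch, the three per-parameter scores the abstract contrasts: (i) magnitude |θ|, (ii) the loss-preservation term |θ · ∂L/∂θ|, and (iii) a GraSP-style gradient-norm term θ · (Hg), where H is the loss Hessian and g the gradient. The function name and dictionary layout are illustrative choices, not part of the paper.

```python
import torch

def importance_scores(model, loss_fn, inputs, targets):
    """Per-parameter importance scores for one batch (illustrative sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    names = [n for n, p in model.named_parameters() if p.requires_grad]

    loss = loss_fn(model(inputs), targets)
    # First-order gradients g; keep the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # The gradient of (1/2)||g||^2 w.r.t. theta is the Hessian-gradient
    # product H g, which the second-order (gradient-norm) measure needs.
    half_sq_grad_norm = 0.5 * sum((g * g).sum() for g in grads)
    hgs = torch.autograd.grad(half_sq_grad_norm, params)

    scores = {}
    for name, p, g, hg in zip(names, params, grads, hgs):
        scores[name] = {
            "magnitude": p.detach().abs(),                 # (i)  |theta|
            "loss_preservation": (p * g).detach().abs(),   # (ii) |theta * dL/dtheta|
            "gradient_norm": (p * hg).detach(),            # (iii) theta * (H g)
        }
    return scores
```

Usage note: for a model `net` and a batch `(x, y)`, `importance_scores(net, torch.nn.functional.cross_entropy, x, y)` returns score tensors per named parameter; the lowest-scoring parameters are the usual pruning candidates. Sign and thresholding conventions differ across the methods the paper unifies, which is exactly where result (iii) applies: choices that increase the gradient norm can harm the second-order dynamics.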

Updated: 2020-10-05