Knowledge from the original network: restore a better pruned network with knowledge distillation
Complex & Intelligent Systems (IF 5.0) Pub Date: 2021-01-10, DOI: 10.1007/s40747-020-00248-y
Liyang Chen, Yongquan Chen, Juntong Xi, Xinyi Le

To deploy deep neural networks on edge devices with limited computation and storage budgets, model compression is necessary for practical deep learning applications. Pruning, a traditional form of model compression, reduces the number of parameters in the model weights. However, when a deep neural network is pruned, its accuracy drops significantly. The traditional way to mitigate this accuracy loss is fine-tuning, but when too many parameters are pruned, the network's capacity is heavily reduced and fine-tuning cannot recover high accuracy. In this paper, we apply a knowledge distillation strategy to reduce the accuracy loss of pruned models: the original network of the pruned network is used as the teacher, aiming to transfer the dark knowledge from the original network to the pruned sub-network. We apply three mainstream knowledge distillation methods, response-based knowledge, feature-based knowledge, and relation-based knowledge (Gou et al., Knowledge distillation: a survey, arXiv:2006.05525, 2020), and compare the results with the traditional fine-tuning method using ground-truth labels. Experiments have been conducted on the CIFAR-100 dataset with several deep convolutional neural networks. Results show that a pruned network recovered by knowledge distillation from its original network achieves higher accuracy than one recovered by fine-tuning on sample labels. We also validate that the original network performs better as the teacher than a differently structured network with the same accuracy.



Updated: 2021-01-10