Improving knowledge distillation via an expressive teacher
Knowledge-Based Systems (IF 8.8) Pub Date: 2021-02-12, DOI: 10.1016/j.knosys.2021.106837
Chao Tan , Jie Liu , Xiang Zhang

Knowledge distillation (KD) is a widely used network compression technique for obtaining a lightweight student network that behaves similarly to its heavy teacher network. Previous studies mainly focus on training the student to mimic the representation space of the teacher. However, how to be a good teacher is rarely explored. We find that if a teacher has a weak ability to capture the knowledge underlying the true data in the real world, the student in turn cannot learn that knowledge from its teacher. Inspired by this, we propose an inter-class correlation regularization that trains the teacher to capture a more explicit correlation among classes. In addition, we force the student to mimic the inter-class correlation of its teacher. Extensive experiments on the image classification task have been conducted on four public benchmarks. For example, when the teacher and student networks are ShuffleNetV2-1.0 and ShuffleNetV2-0.5, the proposed method achieves a 42.63% top-1 error rate on Tiny ImageNet.
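The abstract does not specify how the inter-class correlation is computed or how it is combined with the usual distillation objective, so the sketch below is only one plausible reading, not the authors' implementation: it measures inter-class correlation as the cosine similarity between class profiles of the batch's softened predictions, and adds an MSE term that pushes the student's correlation matrix toward the teacher's, on top of the standard cross-entropy and Hinton-style KD losses. All names, shapes, and loss weights (inter_class_correlation, icc_loss weight beta, temperature, etc.) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def inter_class_correlation(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    """Hypothetical inter-class correlation: cosine similarity between the
    per-class columns of the batch's softened probability matrix."""
    p = F.softmax(logits / temperature, dim=1)      # (batch, num_classes)
    class_profiles = F.normalize(p.t(), dim=1)      # (num_classes, batch), unit rows
    return class_profiles @ class_profiles.t()      # (num_classes, num_classes)


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 4.0, alpha: float = 0.5, beta: float = 0.1):
    """Cross-entropy + standard KD term + sketched inter-class-correlation mimicry."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    icc = F.mse_loss(
        inter_class_correlation(student_logits, temperature),
        inter_class_correlation(teacher_logits, temperature).detach(),
    )
    return ce + alpha * kd + beta * icc


if __name__ == "__main__":
    # Toy usage with random tensors; 200 classes as in Tiny ImageNet.
    student_logits = torch.randn(128, 200, requires_grad=True)
    teacher_logits = torch.randn(128, 200)
    targets = torch.randint(0, 200, (128,))
    loss = distillation_loss(student_logits, teacher_logits, targets)
    loss.backward()
    print(float(loss))
```

A similar correlation term could in principle also be used as the regularizer when training the teacher itself, which is the other half of the idea described in the abstract; the weighting between the three terms would need tuning per benchmark.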




Update date: 2021-02-21