Highlight Every Step: Knowledge Distillation via Collaborative Teaching
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 7-28-2020, DOI: 10.1109/tcyb.2020.3007506
Haoran Zhao, Xin Sun, Junyu Dong, Changrui Chen, Zihe Dong

High storage and computational costs prevent deep neural networks from being deployed on resource-constrained devices. Knowledge distillation (KD) aims to train a compact student network by transferring knowledge from a larger pretrained teacher model. However, most existing KD methods rely only on the teacher's final results and ignore the valuable information generated during its training process. In this article, we propose a new collaborative teaching KD (CTKD) strategy that employs two special teachers. Specifically, one teacher trained from scratch (i.e., the scratch teacher) assists the student step by step using its temporary outputs, forcing the student to follow an optimal path toward high-accuracy final logits. The other, pretrained teacher (i.e., the expert teacher) guides the student to focus on the critical regions that are most useful for the task. Combining the knowledge from these two teachers can significantly improve the performance of the student network in KD. Experimental results on the CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet, and ImageNet datasets verify that the proposed KD method is efficient and achieves state-of-the-art performance.
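
A minimal PyTorch-style sketch of how such a dual-teacher objective could be combined is shown below. The specific loss terms, the weights alpha and beta, the temperature T, and the attention-map formulation are illustrative assumptions for clarity, not the paper's exact CTKD formulation.

```python
import torch
import torch.nn.functional as F

def ctkd_style_loss(student_logits, scratch_teacher_logits, labels,
                    student_feat, expert_teacher_feat,
                    T=4.0, alpha=0.7, beta=1e3):
    """Illustrative dual-teacher objective: hard labels + soft targets from
    the scratch teacher's temporary logits + attention alignment with the
    pretrained expert teacher. Weights and formulation are assumptions."""
    # Supervised loss on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-target loss against the scratch teacher's current (temporary)
    # outputs, softened by temperature T (classic KD term).
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(scratch_teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Attention-style guidance from the expert teacher: match normalized
    # spatial activation maps so the student attends to the same regions.
    def attention_map(feat):                       # feat: (N, C, H, W)
        a = feat.pow(2).mean(dim=1)                # (N, H, W)
        return F.normalize(a.flatten(1), dim=1)    # (N, H*W), L2-normalized

    at = (attention_map(student_feat)
          - attention_map(expert_teacher_feat.detach())).pow(2).mean()

    return ce + alpha * kd + beta * at
```

In a full training loop under this sketch, the scratch teacher would be optimized alongside the student, so its logits change at every step, while the expert teacher remains frozen.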
