Improving knowledge distillation using unified ensembles of specialized teachers
Pattern Recognition Letters (IF 3.9) Pub Date: 2021-03-21, DOI: 10.1016/j.patrec.2021.03.014
Adamantios Zaras, Nikolaos Passalis, Anastasios Tefas

The increasing complexity of deep learning models has led to the development of Knowledge Distillation (KD) approaches that enable us to transfer knowledge from a very large network, called the teacher, to a smaller and faster one, called the student. However, as recent evidence suggests, using powerful teachers often negatively impacts the effectiveness of the distillation process. In this paper, the reasons behind this apparent limitation are studied and an approach that transfers knowledge to smaller models more efficiently is proposed. To this end, multiple highly specialized teachers are employed, each responsible for a small set of skills, overcoming the aforementioned limitation while also achieving high distillation efficiency by diversifying the ensemble. At the same time, the employed ensemble is formulated in a unified structure, making it possible to simultaneously train multiple models. The effectiveness of the proposed method is demonstrated on three different image datasets, leading to improved distillation performance even when compared with powerful state-of-the-art ensemble-based distillation methods.
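The paper's unified-ensemble formulation is not reproduced here, but the following minimal sketch illustrates the general idea of distilling a student from several (e.g., specialized) teachers by averaging their temperature-softened outputs, in the spirit of standard Hinton-style distillation. The function name `ensemble_distillation_loss` and the hyperparameters `temperature` and `alpha` are illustrative assumptions, not the authors' exact method or API.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               temperature=4.0, alpha=0.5):
    """Combine soft targets from several teachers with the supervised loss.

    student_logits:      (batch, num_classes) logits of the student
    teacher_logits_list: list of (batch, num_classes) logits, one per teacher
    labels:              (batch,) ground-truth class indices
    """
    # Average the temperature-softened teacher distributions into one soft target.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened prediction and the soft target,
    # scaled by T^2 as is customary to keep gradient magnitudes comparable.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        teacher_probs,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Standard cross-entropy on the hard labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In practice, each specialized teacher would be trained on, or weighted towards, its own subset of classes before its outputs are combined; the simple averaging above is only one possible aggregation rule.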




Updated: 2021-04-04