Teacher-Class Network: A Neural Network Compression Mechanism
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-04-07, DOI: arxiv-2004.03281
Shaiq Munir Malik, Mohbat Tharani, and Murtaza Taj

To address the overwhelming size of Deep Neural Networks (DNNs), several compression schemes have been proposed; one of them is the teacher-student approach, which tries to transfer knowledge from a complex teacher network to a simple student network. In this paper, we propose a novel method called a teacher-class network, consisting of a single teacher and multiple student networks (i.e., a class of students). Instead of transferring knowledge to one student only, the proposed method transfers a chunk of the knowledge about the entire solution to each student. Our students are not trained on problem-specific logits; they are trained to mimic the knowledge (dense representation) learned by the teacher network. Thus, unlike the logits-based single-student approach, the combined knowledge learned by the class of students can be used to solve other problems as well. The students can be designed to satisfy a given budget; e.g., for comparative purposes we kept the collective parameter count of all the students less than or equal to that of the single student in the teacher-student approach. These small student networks are trained independently, making it possible to train and deploy models on memory-deficient devices as well as on parallel processing systems such as data centers. The proposed teacher-class architecture is evaluated on several benchmark datasets, including MNIST, FashionMNIST, IMDB Movie Reviews, and CamVid, on multiple tasks including classification, sentiment classification, and segmentation. Our approach outperforms the state-of-the-art single-student approach in terms of accuracy as well as computational cost, and in many cases it achieves accuracy equivalent to the teacher network while having 10-30 times fewer parameters.
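To make the mechanism concrete, below is a minimal PyTorch sketch under stated assumptions: the frozen teacher's penultimate dense representation is split into equal chunks, each small student independently regresses its chunk (MSE is an assumed objective), and at inference the students' outputs are concatenated to reassemble the representation. The layer sizes, the loss, and the reuse of the teacher's task head are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Teacher(nn.Module):
    """Hypothetical teacher exposing a dense representation before its head."""
    def __init__(self, in_dim=784, rep_dim=512, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, rep_dim), nn.ReLU(),
        )
        self.head = nn.Linear(rep_dim, n_classes)

    def forward(self, x):
        rep = self.backbone(x)  # dense representation to be transferred
        return rep, self.head(rep)

class Student(nn.Module):
    """A small network that mimics one chunk of the teacher's representation."""
    def __init__(self, in_dim=784, chunk_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, chunk_dim),
        )

    def forward(self, x):
        return self.net(x)

teacher = Teacher().eval()        # assume the teacher is already trained
n_students = 4
chunk = 512 // n_students
students = [Student(chunk_dim=chunk) for _ in range(n_students)]
opts = [torch.optim.Adam(s.parameters()) for s in students]
mse = nn.MSELoss()

def train_step(i, x):
    """One independent update for student i: regress its assigned chunk of
    the frozen teacher's representation. Independence is what allows each
    student to be trained on a separate, memory-limited device."""
    with torch.no_grad():
        rep, _ = teacher(x)
    target = rep[:, i * chunk:(i + 1) * chunk]
    opts[i].zero_grad()
    loss = mse(students[i](x), target)
    loss.backward()
    opts[i].step()
    return loss.item()

def infer(x):
    """Concatenate the students' chunks to reassemble the dense
    representation, then apply a task head (here, the teacher's own)."""
    rep = torch.cat([s(x) for s in students], dim=1)
    return teacher.head(rep)

# Example: one update per student on a random batch, then joint inference.
x = torch.randn(32, 784)
losses = [train_step(i, x) for i in range(n_students)]
logits = infer(x)
```

Because the students mimic the representation rather than task logits, the same reassembled representation can, in principle, be reused under a different head for other tasks, which is the motivation the abstract gives for avoiding logits-based transfer.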

Updated: 2020-05-05