A compression pipeline for one-stage object detection model
Journal of Real-Time Image Processing ( IF 2.9 ) Pub Date : 2021-01-21 , DOI: 10.1007/s11554-020-01053-z
Zhishan Li , Yiran Sun , Guanzhong Tian , Lei Xie , Yong Liu , Hongye Su , Yifan He

Deep neural networks (DNNs) have strong fitting ability on a variety of computer vision tasks, but they also require intensive computing power and large storage space, which are not always available on portable smart devices. Although many studies have contributed to the compression of image classification networks, there are few model compression algorithms for object detection models. In this paper, we propose a general compression pipeline for one-stage object detection networks to meet real-time requirements. First, we propose a softer pruning strategy on the backbone to reduce the number of filters. Compared with direct pruning, our method maintains the integrity of the network structure and reduces the drop in accuracy. Second, we transfer the knowledge of the original model to the small model by knowledge distillation to recover the accuracy lost to pruning. Finally, as edge devices are better suited to integer operations, we further transform the 32-bit floating-point model into an 8-bit integer model through quantization. With this pipeline, the model size and inference time are compressed to 10% or less of the original, while the mAP is reduced by only 2.5% or less. We verified the performance of the compression pipeline on the Pascal VOC dataset.
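The three stages of the pipeline can be sketched as follows. This is a minimal NumPy illustration under assumed simplifications, not the authors' implementation: soft pruning is shown as zeroing the lowest-L1-norm filters (rather than deleting them, which keeps the structure intact), distillation as a temperature-softened KL term, and quantization as symmetric per-tensor 8-bit mapping. All function names and parameters here are hypothetical.

```python
import numpy as np

def soft_prune(filters, prune_ratio):
    """Soft pruning sketch: zero the lowest-L1-norm conv filters
    instead of removing them, so the network structure is unchanged."""
    norms = np.abs(filters).sum(axis=(1, 2, 3))  # L1 norm per filter
    k = int(len(norms) * prune_ratio)
    weakest = np.argsort(norms)[:k]              # indices of weakest filters
    pruned = filters.copy()
    pruned[weakest] = 0.0
    return pruned

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Distillation sketch: KL divergence between temperature-softened
    teacher and student class distributions."""
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float((p * np.log(p / q)).sum() * T * T)

def quantize_int8(w):
    """Quantization sketch: symmetric per-tensor mapping of FP32
    weights to INT8 with a single scale factor."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy usage: prune a small bank of 3x3x3 filters, then quantize it.
rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 3, 3, 3)).astype(np.float32)
pruned = soft_prune(filters, prune_ratio=0.25)
q, s = quantize_int8(pruned)
```

Because soft pruning only zeroes weights, the pruned filters can still be updated during subsequent distillation fine-tuning; the actual filter removal and INT8 conversion happen only at export time.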




Updated: 2021-01-21