GenExp: Multi-objective pruning for deep neural network based on genetic algorithm
Neurocomputing (IF 5.5), Pub Date: 2021-04-13, DOI: 10.1016/j.neucom.2021.04.022
Ke Xu, Dezheng Zhang, Jianjing An, Li Liu, Lingzhi Liu, Dong Wang

Unstructured deep neural network (DNN) pruning has been widely studied. However, previous schemes focused only on compressing the model's memory footprint, which led to relatively low reduction ratios in computational workload. This study demonstrates that the main reason behind this is the inconsistent distribution of memory footprint and workload across the different layers of a DNN model. Based on this observation, we propose to cast the network pruning flow as a multi-objective optimization problem and design an improved genetic algorithm that can efficiently explore the whole pruning-structure space with both pruning goals equally constrained, in order to find a solution that strikes a judicious balance between the DNN's model size and workload. Experiments show that the proposed scheme achieves up to a 34% further reduction in the model's computational workload compared to state-of-the-art pruning schemes [11], [33] for ResNet50 on the ILSVRC-2012 dataset. We have also deployed the pruned ResNet50 models on a dedicated DNN accelerator; the measured data show a considerable 6× reduction in inference time compared to an FPGA accelerator implementing the dense CNN model quantized in INT8 format, and a 2.27× improvement in power efficiency over a 2080Ti GPU-based implementation.
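The search the abstract describes — a genetic algorithm exploring per-layer pruning ratios under two equally weighted objectives (model size and computational workload) — can be sketched in miniature as follows. This is a hedged illustration, not the paper's GenExp algorithm: the per-layer parameter and MAC counts are invented toy numbers (not ResNet50 statistics), the population settings and elitist survival step are my own simplifications, and the accuracy evaluation that a real pruning search would require is omitted.

```python
import random

# Hypothetical per-layer statistics for a small CNN (toy numbers, not the
# paper's ResNet50 profile). Note the inconsistency the paper highlights:
# early layers dominate the workload (MACs), late layers dominate the
# parameter count, so optimizing one objective alone neglects the other.
LAYER_PARAMS = [4_000, 70_000, 260_000, 1_000_000]
LAYER_MACS = [90_000_000, 50_000_000, 30_000_000, 10_000_000]
N_LAYERS = len(LAYER_PARAMS)


def objectives(ratios):
    """Return (remaining parameters, remaining MACs) for a vector of
    per-layer keep-ratios in [0, 1] — both objectives are minimized."""
    params = sum(r * p for r, p in zip(ratios, LAYER_PARAMS))
    macs = sum(r * m for r, m in zip(ratios, LAYER_MACS))
    return params, macs


def dominates(a, b):
    """Pareto dominance: a is no worse than b in both objectives
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def evolve(pop_size=40, generations=60, seed=0):
    """Evolve per-layer keep-ratio vectors; return the final Pareto front
    as a list of (params, macs) objective pairs."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(N_LAYERS)] for _ in range(pop_size)]
    for _ in range(generations):
        # One-point crossover plus Gaussian mutation of a random gene.
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, N_LAYERS)
            child = p1[:cut] + p2[cut:]
            i = rng.randrange(N_LAYERS)
            child[i] = min(1.0, max(0.05, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        # Elitist survival: non-dominated individuals are kept first.
        merged = pop + children
        scored = [(objectives(ind), ind) for ind in merged]
        front = [ind for obj, ind in scored
                 if not any(dominates(o2, obj) for o2, _ in scored)]
        rest = [ind for _, ind in scored if ind not in front]
        pop = (front + rest)[:pop_size]
    scored = [(objectives(ind), ind) for ind in pop]
    return [obj for obj, _ in scored
            if not any(dominates(o2, obj) for o2, _ in scored)]
```

A real search would additionally fine-tune or estimate the accuracy of each candidate and use it as a constraint; here the sketch only shows how equally constrained size and workload objectives shape the Pareto front that the pruning decision is drawn from.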




Updated: 2021-05-07