CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2022-03-09, DOI: arxiv-2203.04570
Zhuoran Song, Yihong Xu, Zhezhi He, Li Jiang, Naifeng Jing, Xiaoyao Liang

Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes its deployment on resource-limited mobile devices. We explore the sparsity in ViT and observe that informative patches and heads are sufficient for accurate image recognition. In this paper, we propose a cascade pruning framework named CP-ViT that predicts sparsity in ViT models progressively and dynamically to reduce computational redundancy while minimizing the accuracy loss. Specifically, we define a cumulative score to reserve the informative patches and heads across the ViT model for better accuracy. We also propose a dynamic pruning ratio adjustment technique based on layer-aware attention range. CP-ViT has great general applicability for practical deployment: it can be applied to a wide range of ViT models and achieves superior accuracy with or without fine-tuning. Extensive experiments on ImageNet, CIFAR-10, and CIFAR-100 with various pre-trained models demonstrate the effectiveness and efficiency of CP-ViT. By progressively pruning 50% of patches, our CP-ViT method reduces FLOPs by over 40% while keeping the accuracy loss within 1%.
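To make the patch-pruning idea concrete, below is a minimal NumPy sketch of one way a cumulative score could be accumulated across layers and used to keep the most informative patches. The accumulation rule (a decayed sum of [CLS]-to-patch attention), the `decay` parameter, and the function names `cumulative_patch_scores` and `prune_patches` are illustrative assumptions, not the paper's exact formulation; CP-ViT additionally prunes heads and adjusts the pruning ratio per layer, which this sketch omits.

```python
import numpy as np

def cumulative_patch_scores(attn_maps, decay=0.9):
    """Accumulate a per-patch importance score across layers.

    attn_maps: list of arrays of shape (num_heads, num_tokens, num_tokens),
               one per transformer layer; token 0 is assumed to be [CLS].
    decay:     illustrative weighting of earlier layers (not from the paper).
    Returns an array of shape (num_patches,) with the accumulated score.
    """
    num_patches = attn_maps[0].shape[-1] - 1  # exclude the [CLS] token
    scores = np.zeros(num_patches)
    for attn in attn_maps:
        # Attention that [CLS] pays to each patch, averaged over heads.
        cls_to_patch = attn[:, 0, 1:].mean(axis=0)
        scores = decay * scores + cls_to_patch
    return scores

def prune_patches(patch_tokens, scores, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of patches by cumulative score."""
    k = max(1, int(round(keep_ratio * len(scores))))
    keep_idx = np.sort(np.argsort(scores)[::-1][:k])
    return patch_tokens[keep_idx], keep_idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_layers, num_heads, num_patches, dim = 4, 3, 16, 8
    tokens = num_patches + 1  # [CLS] + patches
    # Random row-stochastic maps standing in for a real ViT's attention.
    attn_maps = [rng.dirichlet(np.ones(tokens), size=(num_heads, tokens))
                 for _ in range(num_layers)]
    patch_tokens = rng.standard_normal((num_patches, dim))

    scores = cumulative_patch_scores(attn_maps)
    kept, idx = prune_patches(patch_tokens, scores, keep_ratio=0.5)
    print("kept patch indices:", idx)
```

In a real pipeline, the attention maps would come from the pre-trained ViT's self-attention layers, and pruning at a given layer would pass only the kept tokens on to subsequent layers, which is where the FLOP reduction comes from.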

Updated: 2022-03-09