Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-07-16, DOI: arxiv-2107.12445
Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel
Deep spiking neural networks (SNNs) have emerged as a potential alternative
to traditional deep learning frameworks, due to their promise to provide
increased compute efficiency on event-driven neuromorphic hardware. However, to
perform well on complex vision applications, most SNN training frameworks yield
large inference latency which translates to increased spike activity and
reduced energy efficiency. Hence, minimizing average spike activity while
preserving accuracy in deep SNNs remains a significant challenge and
opportunity. This paper presents a non-iterative SNN training technique
that achieves ultra-high compression with reduced spiking activity while
maintaining high inference accuracy. In particular, our framework first uses
the attention-maps of an uncompressed meta-model to yield compressed ANNs.
This step can be tuned to support both irregular and structured channel pruning
to leverage computational benefits over a broad range of platforms. The
framework then performs sparse-learning-based supervised SNN training using
direct inputs. During the training, it jointly optimizes the SNN weight,
threshold, and leak parameters to drastically minimize the number of time steps
required while retaining compression. To evaluate the merits of our approach,
we performed experiments with variants of VGG and ResNet, on both CIFAR-10 and
CIFAR-100, and VGG16 on Tiny-ImageNet. The SNN models generated through the
proposed technique yield SOTA compression ratios of up to 33.4x with no
significant drops in accuracy compared to baseline unpruned counterparts.
Compared to existing SNN pruning methods, we achieve up to 8.3x higher
compression with improved accuracy.
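The attention-guided compression step can be illustrated with a minimal sketch. Here, channel importance is scored by the L2 norm of each channel's activation map (in the spirit of activation-based attention), and the least important channels are masked out for structured pruning. The scoring rule, function names, and keep-ratio interface are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def channel_importance(acts):
    # acts: (batch, channels, H, W) activation maps from the uncompressed meta-model.
    # Score each channel by the L2 norm of its activations across the batch and
    # spatial dimensions -- a simple activation-attention proxy (assumption).
    return np.sqrt((acts ** 2).sum(axis=(0, 2, 3)))

def prune_mask(acts, keep_ratio):
    # Return a boolean mask keeping the top `keep_ratio` fraction of channels.
    scores = channel_importance(acts)
    k = max(1, int(round(keep_ratio * scores.size)))
    keep = np.argsort(scores)[-k:]          # indices of the k highest-scoring channels
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return mask

# Example: channels 1 and 3 carry all the activation energy, so a 50%
# keep-ratio retains exactly those two channels.
acts = np.zeros((2, 4, 3, 3))
acts[:, 1] = 1.0
acts[:, 3] = 2.0
print(prune_mask(acts, 0.5))   # [False  True False  True]
```

Irregular (unstructured) pruning would apply the same idea at the level of individual weights rather than whole channels.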
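The jointly optimized threshold and leak parameters act on leaky-integrate-and-fire (LIF) dynamics unrolled over a small number of time steps. The following forward pass is a minimal sketch under stated assumptions: a scalar learnable leak and threshold per layer, direct (analog-valued) inputs at every step, and a soft reset; none of these details are taken from the paper:

```python
import numpy as np

def lif_forward(inputs, w, threshold, leak, T):
    # inputs: (T, n_in) direct inputs repeated over T time steps
    # w: (n_out, n_in) synaptic weights
    # threshold, leak: per-layer scalars that would be learned jointly with w
    n_out = w.shape[0]
    v = np.zeros(n_out)                  # membrane potentials
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = leak * v + w @ inputs[t]     # leaky integration of weighted input
        fired = v >= threshold
        spikes[t, fired] = 1.0
        v[fired] -= threshold            # soft reset keeps residual charge
    return spikes

# Example: with constant input 1.0, leak 0.5, and threshold 1.5, the neuron
# charges for one step, fires at t=1, then starts recharging.
w = np.array([[1.0]])
inputs = np.ones((3, 1))
print(lif_forward(inputs, w, threshold=1.5, leak=0.5, T=3)[:, 0])  # [0. 1. 0.]
```

Lowering the threshold or raising the leak makes neurons fire sooner, which is why tuning these parameters alongside the weights can shrink the number of time steps (and hence spikes) needed for accurate inference.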
Updated: 2021-07-28