Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-07-16, DOI: arxiv-2107.12445
Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel
Deep spiking neural networks (SNNs) have emerged as a potential alternative
to traditional deep learning frameworks, due to their promise to provide
increased compute efficiency on event-driven neuromorphic hardware. However, to
perform well on complex vision applications, most SNN training frameworks yield
large inference latency which translates to increased spike activity and
reduced energy efficiency. Hence, minimizing average spike activity while
preserving accuracy in deep SNNs remains a significant challenge and
opportunity. This paper presents a non-iterative SNN training technique
that achieves ultra-high compression with reduced spiking activity while
maintaining high inference accuracy. In particular, our framework first uses
the attention-maps of an uncompressed meta-model to yield compressed ANNs.
This step can be tuned to support both irregular and structured channel pruning
to leverage computational benefits over a broad range of platforms. The
framework then performs sparse-learning-based supervised SNN training using
direct inputs. During the training, it jointly optimizes the SNN weight,
threshold, and leak parameters to drastically minimize the number of time steps
required while retaining compression. To evaluate the merits of our approach,
we performed experiments with variants of VGG and ResNet, on both CIFAR-10 and
CIFAR-100, and VGG16 on Tiny-ImageNet. The SNN models generated through the
proposed technique yield SOTA compression ratios of up to 33.4x with no
significant drops in accuracy compared to baseline unpruned counterparts.
Compared to existing SNN pruning methods, we achieve up to 8.3x higher
compression with improved accuracy.
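The attention-guided compression step can be illustrated with a minimal sketch. Here, channel importance is scored by the L2 norm of each channel's activation map (in the spirit of activation-based attention), and the least important channels are masked out for structured pruning. The scoring rule, function names, and keep-ratio interface are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def channel_importance(acts):
    # acts: (batch, channels, H, W) activation maps from the uncompressed meta-model.
    # Score each channel by the L2 norm of its activations across the batch and
    # spatial dimensions -- a simple activation-attention proxy (assumption).
    return np.sqrt((acts ** 2).sum(axis=(0, 2, 3)))

def prune_mask(acts, keep_ratio):
    # Return a boolean mask keeping the top `keep_ratio` fraction of channels.
    scores = channel_importance(acts)
    k = max(1, int(round(keep_ratio * scores.size)))
    keep = np.argsort(scores)[-k:]          # indices of the k highest-scoring channels
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return mask

# Example: channels 1 and 3 carry all the activation energy, so a 50%
# keep-ratio retains exactly those two channels.
acts = np.zeros((2, 4, 3, 3))
acts[:, 1] = 1.0
acts[:, 3] = 2.0
print(prune_mask(acts, 0.5))   # [False  True False  True]
```

Irregular (unstructured) pruning would apply the same idea at the level of individual weights rather than whole channels.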
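The jointly optimized threshold and leak parameters act on leaky-integrate-and-fire (LIF) dynamics unrolled over a small number of time steps. The following forward pass is a minimal sketch under stated assumptions: a scalar learnable leak and threshold per layer, direct (analog-valued) inputs at every step, and a soft reset; none of these details are taken from the paper:

```python
import numpy as np

def lif_forward(inputs, w, threshold, leak, T):
    # inputs: (T, n_in) direct inputs repeated over T time steps
    # w: (n_out, n_in) synaptic weights
    # threshold, leak: per-layer scalars that would be learned jointly with w
    n_out = w.shape[0]
    v = np.zeros(n_out)                  # membrane potentials
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = leak * v + w @ inputs[t]     # leaky integration of weighted input
        fired = v >= threshold
        spikes[t, fired] = 1.0
        v[fired] -= threshold            # soft reset keeps residual charge
    return spikes

# Example: with constant input 1.0, leak 0.5, and threshold 1.5, the neuron
# charges for one step, fires at t=1, then starts recharging.
w = np.array([[1.0]])
inputs = np.ones((3, 1))
print(lif_forward(inputs, w, threshold=1.5, leak=0.5, T=3)[:, 0])  # [0. 1. 0.]
```

Lowering the threshold or raising the leak makes neurons fire sooner, which is why tuning these parameters alongside the weights can shrink the number of time steps (and hence spikes) needed for accurate inference.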
Updated: 2021-07-28