FEECA: Design Space Exploration for Low-Latency and Energy-Efficient Capsule Network Accelerators
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8) Pub Date: 2021-02-25, DOI: 10.1109/tvlsi.2021.3059518
Alberto Marchisio, Vojtech Mrazek, Muhammad Abdullah Hanif, Muhammad Shafique

In the past few years, Capsule Networks (CapsNets) have taken the spotlight over traditional convolutional neural networks (CNNs) for image classification. Unlike CNNs, CapsNets can learn the spatial relationships between features of the images. However, their complexity grows because of their heterogeneous capsule structure and dynamic routing, an iterative algorithm that dynamically learns the coupling coefficients between two consecutive capsule layers. This necessitates specialized hardware accelerators for CapsNets. Moreover, a high-performance and energy-efficient design of CapsNet accelerators requires exploring different design decisions (such as the size and configuration of the processing array and the structure of the processing elements). Toward this, we make the following key contributions: 1) FEECA, a novel methodology to explore the design space of the (micro)architectural parameters of a CapsNet hardware accelerator, and 2) CapsAcc, the first specialized RTL-level hardware architecture to perform CapsNet inference with high performance and high energy efficiency. Our CapsAcc achieves significant performance improvements over an optimized GPU implementation, owing to its efficient implementation of key activation functions, such as squash and softmax, and its efficient data reuse for the dynamic routing. The FEECA methodology employs the Non-dominated Sorting Genetic Algorithm (NSGA-II) to explore the Pareto-optimal points with respect to area, performance, and energy consumption. This requires analytical models of the number of clock cycles needed by each operation of the CapsNet inference and of the memory accesses, enabling a fast yet accurate design space exploration. We synthesized the complete accelerator architecture in a 45-nm CMOS technology using Synopsys design tools and evaluated it on the MNIST benchmark (as done by the original CapsNet paper from Google Brain's team) and on a more complex data set, the German Traffic Sign Recognition Benchmark (GTSRB).
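
For reference, the operations the abstract singles out, squash, softmax, and the iterative update of the coupling coefficients, follow the routing-by-agreement algorithm of the original CapsNet paper. Below is a minimal NumPy sketch of that algorithm; the capsule counts and the three routing iterations mirror the MNIST CapsNet, but the code is only an algorithmic illustration, not the accelerator's actual dataflow.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash non-linearity: shrinks short vectors toward 0 and long vectors toward unit length."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax over the routing logits."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iterations=3):
    """
    Routing-by-agreement between two consecutive capsule layers.
    u_hat: prediction vectors of shape (num_in_caps, num_out_caps, out_dim).
    Returns the output capsules, shape (num_out_caps, out_dim).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                 # routing logits
    for _ in range(num_iterations):
        c = softmax(b, axis=1)                      # coupling coefficients per input capsule
        s = np.einsum('ij,ijk->jk', c, u_hat)       # weighted sum over input capsules
        v = squash(s)                               # output capsule vectors
        b = b + np.einsum('ijk,jk->ij', u_hat, v)   # agreement update of the logits
    return v

# Example: 1152 primary capsules routed to 10 digit capsules of dimension 16, as in CapsNet on MNIST.
v = dynamic_routing(np.random.randn(1152, 10, 16).astype(np.float32))
print(v.shape)  # (10, 16)
```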

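To give a flavor of the kind of design space exploration FEECA performs, the sketch below enumerates hypothetical accelerator configurations, scores each with a placeholder analytical model of area, latency, and energy, and keeps only the Pareto-non-dominated points. The parameter names and cost formulas are invented for illustration; the actual methodology uses the paper's clock-cycle and memory-access models together with NSGA-II's selection, crossover, and mutation operators rather than exhaustive enumeration.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class DesignPoint:
    rows: int          # processing-array rows (hypothetical parameter)
    cols: int          # processing-array columns (hypothetical parameter)
    pe_pipeline: int   # pipeline stages per processing element (hypothetical parameter)

def evaluate(dp, macs=14_000_000):
    """Toy analytical model returning (area, latency_cycles, energy).
    The formulas are placeholders for the paper's cycle-count and memory-access models."""
    pes = dp.rows * dp.cols
    area = pes * (1.0 + 0.15 * dp.pe_pipeline)      # arbitrary area units
    latency = macs / pes + 50 * dp.pe_pipeline      # cycles, ignoring memory stalls
    energy = latency * (0.4 + 0.01 * pes)           # arbitrary energy units
    return (area, latency, energy)

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    scored = {dp: evaluate(dp) for dp in points}
    return [dp for dp, s in scored.items()
            if not any(dominates(o, s) for d, o in scored.items() if d is not dp)]

# Enumerate a small grid of candidate configurations and report the Pareto-optimal ones.
candidates = [DesignPoint(r, c, p)
              for r, c, p in itertools.product((4, 8, 16, 32), (4, 8, 16, 32), (1, 2, 4))]
for dp in pareto_front(candidates):
    print(dp, evaluate(dp))
```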
Updated: 2021-02-25