A 384G Output NonZeros/J Graph Convolutional Neural Network Accelerator
IEEE Transactions on Circuits and Systems II: Express Briefs ( IF 4.4 ) Pub Date : 2022-07-04 , DOI: 10.1109/tcsii.2022.3188428
Kyeong-Jun Lee 1 , Seunghyun Moon 1 , Jae-Yoon Sim 2

This brief presents the first IC implementation of a graph convolutional neural network (GCN) accelerator chip. A sparsity-aware dataflow, optimized for sub-block-wise processing of the three different matrices in a GCN, is proposed to improve the utilization of computing resources while reducing redundant off-chip memory accesses. The accelerator, implemented in 28-nm CMOS, produces 384G NZ outputs/J for the extremely sparse matrix multiplications of the GCN. It achieves 58k-to-143k, 38k-to-92k, and 5k-to-13k Graph/J on the benchmark graph datasets Cora, Citeseer, and Pubmed, respectively. In terms of Graph/J, the proposed 16b ASIC implementation shows about 4-to-11× and 8-to-25× energy-efficiency improvements over previously reported 8b FPGA and 32b FPGA implementations, respectively.
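To make the workload concrete: a GCN layer computes H' = ReLU(Â · X · W), where the adjacency matrix Â and the feature matrix X are extremely sparse, so the key efficiency lever is touching only stored nonzeros. The sketch below is a minimal, illustrative CSR sparse-dense multiply in plain Python (toy data; the paper's sub-block-wise dataflow and hardware scheduling are far more elaborate and are not reproduced here):

```python
def to_csr(dense):
    """Compress a dense row-major matrix to CSR (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:           # store only nonzeros
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

def spmm(indptr, indices, data, rhs, n_rows):
    """Multiply a CSR matrix by a dense matrix, skipping zero entries."""
    n_cols = len(rhs[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # iterate only over the nonzeros of row i
        for k in range(indptr[i], indptr[i + 1]):
            j, v = indices[k], data[k]
            for c in range(n_cols):
                out[i][c] += v * rhs[j][c]
    return out

# Toy 3-node graph with self-loops (hypothetical data, not from the
# paper's Cora/Citeseer/Pubmed benchmarks).
a_hat = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]   # adjacency (sparse)
x = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]    # node features (sparse)
w = [[0.5, -1.0], [1.0, 0.5]]               # dense weights

xw = spmm(*to_csr(x), w, 3)                 # X · W
h = [[max(v, 0.0) for v in row]             # ReLU(Â · (X · W))
     for row in spmm(*to_csr(a_hat), xw, 3)]
```

Note that the multiplication order (X · W first, then Â · (XW)) already changes the work and memory traffic substantially for sparse inputs; the paper's dataflow goes further by tiling all three matrices into sub-blocks.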

Updated: 2022-07-04