A 384G Output NonZeros/J Graph Convolutional Neural Network Accelerator
IEEE Transactions on Circuits and Systems II: Express Briefs ( IF 4.4 ) Pub Date : 2022-07-04 , DOI: 10.1109/tcsii.2022.3188428
Kyeong-Jun Lee 1 , Seunghyun Moon 1 , Jae-Yoon Sim 2

This brief presents the first IC implementation of a graph convolutional neural network (GCN) accelerator chip. A sparsity-aware dataflow, optimized for sub-block-wise processing of the three different matrices in a GCN, is proposed to improve the utilization of computing resources while reducing redundant off-chip memory accesses. The accelerator, implemented in 28-nm CMOS, produces 384G NZ outputs/J for the extremely sparse matrix multiplications of the GCN. It achieves 58k-to-143k, 38k-to-92k, and 5k-to-13k Graph/J on the benchmark graph datasets Cora, Citeseer, and Pubmed, respectively. In terms of Graph/J, the proposed 16b ASIC implementation shows about 4-to-11× and 8-to-25× energy-efficiency improvements over previously reported 8b FPGA and 32b FPGA implementations, respectively.
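To make the workload concrete: a GCN layer computes H' = ReLU(Â · X · W), where the adjacency matrix Â and the feature matrix X are extremely sparse, so the key efficiency lever is touching only stored nonzeros. The sketch below is a minimal, illustrative CSR sparse-dense multiply in plain Python (toy data; the paper's sub-block-wise dataflow and hardware scheduling are far more elaborate and are not reproduced here):

```python
def to_csr(dense):
    """Compress a dense row-major matrix to CSR (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:           # store only nonzeros
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

def spmm(indptr, indices, data, rhs, n_rows):
    """Multiply a CSR matrix by a dense matrix, skipping zero entries."""
    n_cols = len(rhs[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # iterate only over the nonzeros of row i
        for k in range(indptr[i], indptr[i + 1]):
            j, v = indices[k], data[k]
            for c in range(n_cols):
                out[i][c] += v * rhs[j][c]
    return out

# Toy 3-node graph with self-loops (hypothetical data, not from the
# paper's Cora/Citeseer/Pubmed benchmarks).
a_hat = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]   # adjacency (sparse)
x = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]    # node features (sparse)
w = [[0.5, -1.0], [1.0, 0.5]]               # dense weights

xw = spmm(*to_csr(x), w, 3)                 # X · W
h = [[max(v, 0.0) for v in row]             # ReLU(Â · (X · W))
     for row in spmm(*to_csr(a_hat), xw, 3)]
```

Note that the multiplication order (X · W first, then Â · (XW)) already changes the work and memory traffic substantially for sparse inputs; the paper's dataflow goes further by tiling all three matrices into sub-blocks.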

Updated: 2022-07-04