Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification
IEEE Transactions on Computers (IF 3.7), Pub Date: 2021-04-14, DOI: 10.1109/tc.2021.3073255
Martino Dazzi , Abu Sebastian , Thomas Parnell , Pier Andrea Francese , Luca Benini , Evangelos Eleftheriou

In-memory computing is an emerging computing paradigm that enables deep-learning inference with significantly higher energy efficiency and reduced latency. The essential idea is to map the synaptic weights of each layer to one or more in-memory computing (IMC) cores. During inference, these cores perform the associated matrix-vector multiplications in place with O(1) time complexity, obviating the need to move the synaptic weights to additional processing units. Moreover, this architecture enables the execution of these networks in a highly pipelined fashion. A key challenge, however, is designing an efficient communication fabric for the IMC cores. In this work, we present one such communication fabric based on a graph topology that is well suited to the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between the graph representations of these networks and the graph corresponding to the proposed communication fabric. We then present a quantitative comparison with established communication topologies and show that our proposed topology achieves the lowest bandwidth requirement per communication channel. Finally, we present a hardware implementation and a concrete example of mapping ResNet-32 onto an array of IMC cores interconnected via the proposed communication fabric.
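To make the homomorphism condition concrete, the following is a minimal, self-contained Python sketch, not the authors' algorithm or their proposed topology: it brute-force searches for a mapping from a toy residual-block layer graph onto a hypothetical core-interconnect graph such that every layer-to-layer data dependency is carried by a physical link between the cores the layers are assigned to. The layer names, the toy fabric, and the search routine are illustrative assumptions only.

```python
# Minimal sketch: brute-force check that a CNN layer graph admits a graph
# homomorphism into a core-interconnect graph, i.e. every layer-to-layer data
# dependency is carried by a physical link between the assigned cores.
# The layer graph, fabric graph, and search routine are illustrative
# assumptions, not the topology or proof technique from the paper.

def find_homomorphism(g_nodes, g_edges, h_nodes, h_edges):
    """Search for a map f: V(G) -> V(H) with (f(u), f(v)) in E(H) for every (u, v) in E(G)."""
    h_edges = set(h_edges)

    def consistent(partial):
        # Only edges whose endpoints are both assigned can be checked so far.
        return all((partial[u], partial[v]) in h_edges
                   for (u, v) in g_edges
                   if u in partial and v in partial)

    def extend(partial, remaining):
        if not remaining:
            return dict(partial)                 # complete, valid assignment
        layer = remaining[0]
        for core in h_nodes:                     # try every core for this layer
            partial[layer] = core
            if consistent(partial):
                found = extend(partial, remaining[1:])
                if found is not None:
                    return found
            del partial[layer]                   # backtrack
        return None

    return extend({}, list(g_nodes))


if __name__ == "__main__":
    # Toy residual block: conv1 -> conv2 -> conv3, plus a skip conv1 -> conv3
    # (the residual addition assumed to be fused into conv3's core).
    layers = ["conv1", "conv2", "conv3"]
    deps = [("conv1", "conv2"), ("conv2", "conv3"), ("conv1", "conv3")]

    # Hypothetical 4-core fabric: a chain with extra length-2 forward links.
    cores = ["c0", "c1", "c2", "c3"]
    links = [("c0", "c1"), ("c1", "c2"), ("c2", "c3"), ("c0", "c2"), ("c1", "c3")]

    print(find_homomorphism(layers, deps, cores, links))
    # e.g. {'conv1': 'c0', 'conv2': 'c1', 'conv3': 'c2'}
```

In the paper itself, the existence of such a mapping is established analytically for the proposed topology and the CNN architectures considered, rather than by exhaustive search; the sketch only illustrates the property being verified.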

Updated: 2021-05-25