In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications
arXiv - CS - Hardware Architecture. Pub Date: 2020-05-19. DOI: arxiv-2005.09526. Abhash Kumar, Jawar Singh, Sai Manohar Beeraka, and Bharat Gupta
Traditional processors based on the von Neumann architecture are inefficient in
terms of energy and throughput because they employ separate processing and
memory units, a limitation known as the~\textit{memory wall}. The memory wall
problem is further exacerbated when massive parallelism and frequent data
movement between the processing and memory units are required for real-time
implementation of artificial neural networks (ANNs), which enable many
intelligent applications. One of the most promising approaches to the memory
wall problem is to carry out computations inside the memory core itself, which
enhances memory bandwidth and energy efficiency for compute-intensive
workloads. This paper presents an in-memory computing architecture for ANNs
enabling artificial intelligence (AI) and machine learning (ML) applications.
The proposed architecture utilizes a deep in-memory architecture based on a
standard six-transistor (6T) static random access memory (SRAM) core to
implement a multi-layered perceptron.
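One way to picture the deep in-memory read is as an analog dot product: activating several SRAM rows in a single precharge cycle lets each cell discharge its bitline in proportion to the stored weight and the applied input pulse, so the residual bitline voltage encodes a weighted sum. The toy functional model below sketches this idea; `v_precharge`, the discharge constant `k`, and the ideal ADC step are illustrative assumptions, not the paper's circuit values.

```python
# Toy functional model of a deep in-memory dot product. Activating several
# SRAM rows in one precharge cycle makes each cell discharge the bitline in
# proportion to (stored weight x input pulse), so the bitline voltage drop
# approximates a weighted sum. All constants here are illustrative, not the
# paper's circuit parameters.

def bitline_mac(weights_column, inputs, v_precharge=1.0, k=0.05):
    """Model one bitline: rows are pulsed with widths proportional to the
    inputs, and the total discharge approximates sum(w_i * x_i)."""
    discharge = sum(w * x for w, x in zip(weights_column, inputs))
    return v_precharge - k * discharge  # residual analog bitline voltage

def analog_dot(weight_matrix, inputs, v_precharge=1.0, k=0.05):
    """Read every column in the same precharge cycle and convert each
    voltage drop back to a digital MAC result (idealized signed ADC)."""
    outs = []
    for col in zip(*weight_matrix):  # one bitline per weight column
        v = bitline_mac(col, inputs, v_precharge, k)
        outs.append((v_precharge - v) / k)
    return outs
```

A conventional digital read would fetch each weight in its own access cycle; folding the whole column into one precharge cycle is where the bandwidth and energy gains of this style of architecture come from.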
Our novel in-memory architecture for on-chip training and inference reduces
energy cost and enhances throughput by accessing multiple rows of the SRAM
array simultaneously in each precharge cycle, eliminating frequent data
transfers. The proposed architecture realizes backpropagation, the keystone of
network training, using newly proposed building blocks for weight update,
analog multiplication, error calculation, and signed analog-to-digital
conversion, together with the necessary signal control units. The proposed
architecture was trained and tested on the IRIS dataset and is
$\approx46\times$ more energy efficient per MAC (multiply-and-accumulate)
operation than earlier classifiers.
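The training loop realized in hardware corresponds, in software, to a small multi-layer perceptron trained by backpropagation. The sketch below is a minimal digital analogue, with the abstract's building blocks (error calculation, analog multiplication, weight update) marked in comments; the 2-3-1 topology, learning rate, and OR-function toy data are illustrative stand-ins, not the paper's IRIS setup.

```python
# Minimal software analogue of the on-chip training loop: a one-hidden-layer
# perceptron trained with per-sample backpropagation. Topology, learning
# rate, and toy data are illustrative choices.
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-3-1 perceptron; biases are folded in as an extra always-1 input.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # (2 in + bias) x 3 hidden
W2 = [random.uniform(-1, 1) for _ in range(4)]                      # (3 hidden + bias) -> 1 out

def forward(x):
    xa = x + [1.0]                                                  # bias input
    h = [sigmoid(sum(xa[i] * W1[i][j] for i in range(3))) for j in range(3)]
    y = sigmoid(sum((h + [1.0])[k] * W2[k] for k in range(4)))
    return h, y

def train_step(x, t, lr=0.5):
    h, y = forward(x)
    xa, ha = x + [1.0], h + [1.0]
    dy = (y - t) * y * (1.0 - y)            # error-calculation block
    dh = [dy * W2[j] * h[j] * (1.0 - h[j])  # backprop: multiplications done
          for j in range(3)]                # in analog in the hardware
    for k in range(4):                      # weight-update block
        W2[k] -= lr * dy * ha[k]
    for i in range(3):
        for j in range(3):
            W1[i][j] -= lr * dh[j] * xa[i]
    return 0.5 * (y - t) ** 2               # loss before this update

data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]  # OR function as toy data
loss_first = sum(train_step(x, t) for x, t in data)
for _ in range(4000):
    loss_last = sum(train_step(x, t) for x, t in data)
```

In the proposed hardware, the multiply-accumulate loops above collapse into analog operations on the SRAM array itself, which is where the reported $\approx46\times$ per-MAC energy saving arises.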
Updated: 2020-05-20