In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications
arXiv - CS - Hardware Architecture. Pub Date: 2020-05-19, DOI: arxiv-2005.09526
Abhash Kumar, Jawar Singh, Sai Manohar Beeraka, and Bharat Gupta

Processors based on the traditional von Neumann architecture are inefficient in terms of energy and throughput because their processing and memory units are separate, a limitation known as the "memory wall". The memory wall problem is further exacerbated when massive parallelism and frequent data movement between processing and memory units are required for real-time implementation of the artificial neural networks (ANNs) that enable many intelligent applications. One of the most promising approaches to the memory wall problem is to carry out computations inside the memory core itself, which enhances memory bandwidth and energy efficiency for extensive computations. This paper presents an in-memory computing architecture for ANNs enabling artificial intelligence (AI) and machine learning (ML) applications. The proposed architecture uses a deep in-memory architecture based on a standard six-transistor (6T) static random access memory (SRAM) core to implement a multi-layered perceptron. Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by accessing multiple rows of the SRAM array simultaneously in each precharge cycle, eliminating frequent data accesses. The proposed architecture realizes backpropagation, the keystone of network training, using newly proposed building blocks for weight update, analog multiplication, error calculation, signed analog-to-digital conversion, and the necessary signal control units. The proposed architecture was trained and tested on the IRIS dataset and is approximately 46× more energy efficient per MAC (multiply-and-accumulate) operation than earlier classifiers.
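For readers who want a functional reference point, the sketch below is a minimal software model of the multi-layered perceptron with backpropagation that the paper maps onto its SRAM-based hardware. It is not the authors' implementation: the layer sizes, learning rate, and use of scikit-learn's IRIS loader are illustrative assumptions, and the analog building blocks (analog multiplication, error calculation, weight update, signed ADC) are modeled here as ordinary floating-point arithmetic.

```python
# Minimal software sketch of an MLP trained on IRIS via backpropagation.
# Hyperparameters and the sklearn loader are assumptions for illustration;
# the paper performs the equivalent MAC operations on SRAM bit-lines.
import numpy as np
from sklearn.datasets import load_iris

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize features
T = np.eye(3)[y]                           # one-hot targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 4-input, 8-hidden, 3-output perceptron (sizes are an assumption)
W1 = rng.normal(0, 0.5, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 3)); b2 = np.zeros(3)
lr = 0.1

for epoch in range(500):
    # Forward pass: each dot product is a batch of MAC operations, which
    # the in-memory architecture evaluates per precharge cycle.
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)

    # Backward pass ("error calculation" block): sigmoid-derivative form
    d_o = (o - T) * o * (1 - o)
    d_h = (d_o @ W2.T) * h * (1 - h)

    # "Weight update" block: plain full-batch gradient descent
    W2 -= lr * h.T @ d_o; b2 -= lr * d_o.sum(axis=0)
    W1 -= lr * X.T @ d_h; b1 -= lr * d_h.sum(axis=0)

print(f"training accuracy: {(o.argmax(axis=1) == y).mean():.3f}")
```

Counting the multiplies in the forward and backward passes of a model like this gives a rough sense of the MAC workload behind the paper's per-MAC energy comparison.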

Updated: 2020-05-20