A Systolic Accelerator for Neuromorphic Visual Recognition
Electronics (IF 2.9), Pub Date: 2020-10-15, DOI: 10.3390/electronics9101690
Shuo Tian, Lei Wang, Shi Xu, Shasha Guo, Zhijie Yang, Jianfeng Zhang, Weixia Xu

Advances in neuroscience have encouraged researchers to develop computational models that behave like the human brain. HMAX is one such biologically inspired model; it mimics the functions and structures of the primate visual cortex. With a simple computational structure, HMAX has proven effective and versatile in multi-class object recognition. Implementing HMAX in embedded systems remains a challenge, however, because its S2 phase is computationally the heaviest part of the model. Previous implementations such as CoRe16 have used a reconfigurable two-dimensional processing element (PE) array to speed up the S2 layer. However, CoRe16 produces each output pixel through an adder tree that accumulates partial sums from different PEs, which increases the runtime of HMAX. To speed up the S2 layer, in this paper we propose SAFA (systolic accelerator for HMAX), a systolic-array-based architecture that computes and accelerates the S2 stage of HMAX. With the output stationary (OS) dataflow, each PE in SAFA calculates an output pixel independently, without additional accumulation of partial sums across multiple PEs, and the design also needs fewer multiplexers than reconfigurable accelerators. In addition, forwarding identical input or weight data between PEs under OS reduces the memory bandwidth requirements. Simulation results show that, compared with CoRe16, the runtime of the S2 stage in the HMAX model decreases by 5.7% and the required memory bandwidth is reduced by 3.53× on average across different kernel sizes (except for kernel = 12). Synthesis on ASIC shows that SAFA also has lower power and area costs than other reconfigurable accelerators.
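
The output stationary idea described in the abstract can be illustrated with a small cycle-level simulation. This is only a sketch under stated assumptions, not SAFA's actual design: a generic matrix product stands in for the S2 computation, and the function name `os_systolic_matmul` and the register layout are hypothetical. Each PE keeps its own accumulator, so no partial sums cross PE boundaries, and each operand is read once at the array edge and then forwarded to the neighbouring PE.

```python
def os_systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Illustrative assumption: PE (i, j) owns output element (i, j).
    Rows of `a` enter from the left edge and columns of `b` enter from
    the top edge, each skewed by one cycle per row/column.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]    # one stationary accumulator per PE
    a_reg = [[0] * n for _ in range(m)]  # operand moving left to right
    b_reg = [[0] * n for _ in range(m)]  # operand moving top to bottom
    cycles = m + n + k_dim - 2           # cycles until the last PE finishes

    for t in range(cycles):
        # Forward the `a` operands one PE to the right, then inject at the edge.
        for i in range(m):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                    # row i is delayed by i cycles
            a_reg[i][0] = a[i][k] if 0 <= k < k_dim else 0
        # Forward the `b` operands one PE down, then inject at the edge.
        for j in range(n):
            for i in range(m - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                    # column j is delayed by j cycles
            b_reg[0][j] = b[k][j] if 0 <= k < k_dim else 0
        # Every PE performs one multiply-accumulate into its own accumulator.
        for i in range(m):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


result, cycles = os_systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result == [[19, 22], [43, 50]], cycles == 4
```

In this sketch every element of `a` and `b` is fetched from memory exactly once and reused by forwarding, which is the kind of bandwidth saving the abstract attributes to the OS dataflow. The cycle count and layout here say nothing about the figures reported for SAFA.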




Updated: 2020-10-15