Automatic Systolic Array Generation using Reusable Blocks
IEEE Micro (IF 2.8), Pub Date: 2020-07-01, DOI: 10.1109/mm.2020.2997611
Liancheng Jia 1, Liqiang Lu 1, Xuechao Wei 2, Yun Liang 1

Systolic array architecture is widely used in spatial hardware and is well suited to many tensor processing algorithms. Many systolic array architectures are implemented with a high-level synthesis (HLS) design flow. However, existing HLS tools do not favor modular, reusable design, which makes design iteration inefficient. In this article, we analyze the systolic array design space and identify the structures common to different systolic dataflows. We build hardware module templates using the Chisel infrastructure, which can be reused across different dataflows and computation algorithms. This substantially improves productivity in developing and optimizing systolic architectures. We further build a systolic array generator that transforms a tensor algorithm definition into a complete systolic hardware architecture. Experiments show that we can implement systolic array designs for different applications and dataflows with little engineering effort, and that their throughput exceeds that of HLS designs.
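
To make the notion of a systolic dataflow concrete, the following is a minimal software sketch, not taken from the article, of one common dataflow: an output-stationary array for matrix multiplication. The function name `systolic_matmul` and the choice of dataflow are illustrative assumptions; the generator described in the abstract produces Chisel hardware, not Python.

```python
def systolic_matmul(A, B):
    """Cycle-level simulation of an output-stationary systolic array for A @ B.

    Hypothetical illustration only: PE (i, j) keeps its accumulator in place,
    A operands move left to right, and B operands move top to bottom.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # stationary partial sums, one per PE
    a_reg = [[0] * m for _ in range(n)]  # A operand currently held by each PE
    b_reg = [[0] * m for _ in range(n)]  # B operand currently held by each PE

    # The last useful product reaches PE (n-1, m-1) at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    s = t - i  # row i of A is skewed (delayed) by i cycles
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # take value from left neighbour
                if i == 0:
                    s = t - j  # column j of B is skewed (delayed) by j cycles
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # take value from upper neighbour
        a_reg, b_reg = new_a, new_b

        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In this sketch each processing element only ever talks to its left and upper neighbours. That regular, local structure is the kind of thing that can plausibly be captured once as a reusable module and then instantiated for other dataflows.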

Updated: 2020-07-01