Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
arXiv - CS - Hardware Architecture, Pub Date: 2021-09-15, DOI: arxiv-2109.07419
Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna

To meet the extreme compute demands of deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these "domain-specific" accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., the dataflow and tiling optimizations used to enhance efficiency. Designing new algorithms, and mapping approaches that execute those algorithms for a target problem on new hardware, raises several challenges. Previous works have addressed these challenges individually. To address them as a whole, we present Union, a HW-SW co-design ecosystem for spatial accelerators built within the popular MLIR compiler infrastructure. Our framework allows exploring different algorithms and their mappings on several accelerator cost models. Union also includes a plug-and-play library of accelerator cost models and mappers, which can easily be extended. The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators, which can be systematically pruned based on constraints from the hardware, workload, and mapper. We demonstrate the value of Union for the community with several case studies that examine offloading different tensor operations (CONV/GEMM/Tensor Contraction) onto diverse accelerator architectures using different mapping schemes.
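To make the idea of a prunable map space concrete, the following is a minimal Python sketch and not Union's actual API or abstraction. It assumes a GEMM of size M x N x K, treats a "mapping" as a single-level tile-size triple, and uses hypothetical constraints (an on-chip buffer capacity in words and a PE count) and a toy cost function (off-chip words moved, with no reuse across tiles). All function names and parameters here are illustrative assumptions.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_map_space(M, N, K):
    """Hypothetical map space: every tile (tm, tn, tk) that evenly divides the GEMM dims."""
    return list(product(divisors(M), divisors(N), divisors(K)))


def tile_footprint(tile):
    """Words needed on chip for one A tile (tm x tk), B tile (tk x tn), and C tile (tm x tn)."""
    tm, tn, tk = tile
    return tm * tk + tk * tn + tm * tn


def prune(map_space, buffer_words, num_pes):
    """Keep only mappings that satisfy the assumed constraints.

    Hardware constraint: the tile footprint must fit in the buffer.
    Mapper constraint: one output element per PE, so tm * tn <= num_pes.
    """
    return [
        t for t in map_space
        if tile_footprint(t) <= buffer_words and t[0] * t[1] <= num_pes
    ]


def toy_cost(tile, M, N, K):
    """Toy cost model: total off-chip words moved, assuming no reuse across tiles."""
    tm, tn, tk = tile
    num_tiles = (M // tm) * (N // tn) * (K // tk)
    return num_tiles * tile_footprint(tile)


if __name__ == "__main__":
    M = N = K = 8
    space = enumerate_map_space(M, N, K)
    legal = prune(space, buffer_words=48, num_pes=16)
    best = min(legal, key=lambda t: toy_cost(t, M, N, K))
    print(len(space), len(legal), best, toy_cost(best, M, N, K))
```

In a real flow the cost function would be replaced by one of the accelerator cost models and the exhaustive search by a mapper; the sketch only shows how constraints shrink the candidate set before evaluation.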

Updated: 2021-09-16