Bring Your Own Codegen to Deep Learning Compiler
arXiv - CS - Performance Pub Date : 2021-05-03 , DOI: arxiv-2105.03215
Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

Deep neural networks (DNNs) have been applied ubiquitously in many applications, and accelerators have emerged as an enabler of fast, efficient inference for these applications. However, to achieve high model coverage with high performance, each accelerator vendor must develop a full compiler stack to ingest, optimize, and execute DNNs. This poses significant challenges in the development and maintenance of the software stack. In addition, vendors must continuously update their hardware and/or software to cope with the rapid evolution of DNN model architectures and operators. To address these issues, this paper proposes an open-source framework that enables users to concentrate only on the development of their proprietary code-generation tools by reusing as many components as possible from existing deep learning compilers. Our framework provides users with flexible and easy-to-use interfaces to partition their models into segments that can be executed on "the best" processors, taking advantage of the powerful computation capability of accelerators. Our case study shows that our framework has been deployed in multiple commercial vendors' compiler stacks with only a few thousand lines of code.
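The core idea the abstract describes — marking the operators a vendor backend supports and grouping them into segments that are offloaded to the accelerator, with the remainder falling back to the host — can be sketched as follows. This is a minimal illustration, not the paper's actual API: the names `Segment`, `partition`, and `SUPPORTED_OPS` are hypothetical, and real frameworks partition a dataflow graph rather than a flat operator list.

```python
from dataclasses import dataclass, field

# Operators a (hypothetical) vendor codegen claims to support.
SUPPORTED_OPS = {"conv2d", "relu", "add"}

@dataclass
class Segment:
    target: str                        # "accelerator" or "host"
    ops: list = field(default_factory=list)

def partition(op_sequence):
    """Group a linear sequence of ops into maximal same-target segments."""
    segments = []
    for op in op_sequence:
        target = "accelerator" if op in SUPPORTED_OPS else "host"
        if segments and segments[-1].target == target:
            segments[-1].ops.append(op)     # extend the current segment
        else:
            segments.append(Segment(target, [op]))  # start a new segment
    return segments

model = ["conv2d", "relu", "softmax", "conv2d", "add", "relu"]
for seg in partition(model):
    print(seg.target, seg.ops)
```

Each "accelerator" segment would then be handed to the vendor's proprietary code generator, while "host" segments stay on the default compiler path — which is what lets a vendor integrate with only a few thousand lines of code instead of a full stack.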

Updated: 2021-05-10