Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
arXiv - CS - Programming Languages | Pub Date: 2020-06-04 | DOI: arxiv-2006.03031
Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang

Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures, and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks, which assume a pre-determined model architecture and input data shapes--assumptions that dynamic neural networks violate. Therefore, executing dynamic models with current deep learning systems is both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is also more challenging than optimizing static ones, because optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a lightweight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs.
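The abstract only names the ingredients of the approach. As a rough illustration of what a dynamic type system for tensor shapes means in practice, the sketch below marks dimensions that are unknown at compile time with a sentinel and checks concrete runtime shapes against such a type. This is a minimal conceptual sketch, not Nimble's implementation; the names (TensorType, ANY, matches) are hypothetical.

```python
# Conceptual sketch (not the authors' code): a tensor type in which a
# dimension may be unknown at compile time, represented by the sentinel ANY.
from dataclasses import dataclass
from typing import Tuple, Union

ANY = "?"              # dimension whose size is only known at runtime
Dim = Union[int, str]  # either a concrete size or the ANY sentinel

@dataclass(frozen=True)
class TensorType:
    shape: Tuple[Dim, ...]
    dtype: str = "float32"

    def matches(self, runtime_shape: Tuple[int, ...]) -> bool:
        """Check that a concrete runtime shape is an instance of this type."""
        if len(runtime_shape) != len(self.shape):
            return False
        return all(d == ANY or d == r for d, r in zip(self.shape, runtime_shape))

# A sequence model whose batch size and sequence length are dynamic:
x_type = TensorType(shape=(ANY, ANY, 512))
print(x_type.matches((8, 37, 512)))   # True: any batch/sequence length is accepted
print(x_type.matches((8, 37, 256)))   # False: the hidden size must stay 512
```

A compiler built around such types can generate code that is valid for every shape matching the type, rather than specializing to one fixed input shape.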

Updated: 2020-06-05