Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud
arXiv - CS - Hardware Architecture Pub Date : 2020-03-26 , DOI: arxiv-2003.12101
Shulin Zeng, Guohao Dai, Hanbo Sun, Kai Zhong, Guangjun Ge, Kaiyuan Guo, Yu Wang, Huazhong Yang

FPGAs have shown great potential in providing low-latency and energy-efficient solutions for deep neural network (DNN) inference applications. Currently, most FPGA-based DNN accelerators in the cloud serve multiple users sharing a single FPGA through time-division multiplexing, and require re-compilation with $\sim$100 s overhead. Such designs lead to poor isolation and heavy performance loss across users, falling far short of efficient and flexible FPGA virtualization for either public or private cloud scenarios. To solve these problems, we introduce a novel virtualization framework for instruction set architecture (ISA) based DNN accelerators sharing a single FPGA. We enable isolation by introducing a two-level instruction dispatch module and a multi-core hardware resource pool. These designs provide isolated, runtime-programmable hardware resources, which in turn yield performance isolation across users. On the other hand, to overcome the heavy re-compilation overhead, we propose a tiling-based instruction frame package design and two-stage static-dynamic compilation. Only the lightweight runtime information is re-compiled, with $\sim$1 ms overhead, so performance is guaranteed for the private cloud. Our extensive experimental results show that the proposed virtualization design achieves 1.07-1.69x and 1.88-3.12x throughput improvement over previous static designs using the single-core and multi-core architectures, respectively.
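The core of the re-compilation saving described above is the split between a static stage (compiled once per model) and a dynamic stage that only patches runtime fields. A minimal sketch of this idea, with all names and fields hypothetical (the paper does not publish this API):

```python
# Hypothetical sketch of a two-stage static-dynamic compilation flow:
# the static stage builds tiling-based instruction frame packages once,
# while the dynamic stage only rewrites runtime fields (core IDs,
# base addresses) when the virtualization layer re-allocates cores.
from dataclasses import dataclass


@dataclass
class InstructionFrame:
    """One tiling-based instruction frame package (illustrative fields)."""
    layer: str
    tile: int
    opcodes: list       # static part: fixed once the model is compiled
    core_id: int = -1   # dynamic part: assigned at allocation time
    base_addr: int = 0  # dynamic part: assigned at allocation time


def static_compile(model_layers, tiles_per_layer):
    """Expensive stage (~100 s in prior designs): run once, offline."""
    frames = []
    for layer in model_layers:
        for t in range(tiles_per_layer):
            frames.append(InstructionFrame(
                layer, t, opcodes=[("LOAD", t), ("CONV", t), ("SAVE", t)]))
    return frames


def dynamic_compile(frames, allocated_cores, addr_base):
    """Lightweight stage (~1 ms): patch only the runtime information,
    spreading tiles round-robin over the cores allocated to this user."""
    for i, f in enumerate(frames):
        f.core_id = allocated_cores[i % len(allocated_cores)]
        f.base_addr = addr_base + i * 0x1000
    return frames
```

For example, `static_compile(["conv1", "conv2"], 4)` yields eight frames; calling `dynamic_compile` again with a different core list re-targets all frames without touching the opcode stream, which is what keeps the per-allocation overhead small.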

Updated: 2020-03-30