Utilizing cloud FPGAs towards the open neural network standard
Sustainable Computing: Informatics and Systems (IF 3.8), Pub Date: 2021-02-02, DOI: 10.1016/j.suscom.2021.100520
Dimitrios Danopoulos, Christoforos Kachris, Dimitrios Soudris

Accurate and efficient machine learning algorithms are of vital importance to many problems, especially classification and clustering tasks, but they need a universal AI model standard. Unifying machine learning models into a common ecosystem can reduce development time and improve framework interoperability. ONNX (Open Neural Network Exchange) is a popular open format for representing deep learning models, so that AI developers can more easily move models between state-of-the-art tools. On top of that, hardware companies such as Nvidia and Intel are keeping up with this trend by producing hardware-optimized runtimes (for CPUs, GPUs, and FPGAs) that can handle open-format AI models such as ONNX. This enables developers to leverage a heterogeneous mix of hardware and use whichever AI framework they prefer. FPGAs, however, require a more challenging development strategy, even though the platform has proven to address these kinds of problems very efficiently in terms of both performance and power. This work builds on HLS4ML, an early-stage project originally created for particle physics applications that automatically generates neural networks (NNs) for embedded Xilinx FPGAs. Our work adds hardware-aware NN training and a generalized optimization scheme on top of HLS4ML, boosting the performance and power efficiency of the package and adding the ability to generate cloud FPGA firmware from any NN model. We start from FPGA-oriented training of an image-recognition model in Keras, convert it into the open ONNX format, and then port and optimize it for cloud FPGAs using a novel scheme with host, memory, and kernel optimizations at multiple levels of network precision. To the best of our knowledge this is a novel approach; it achieves a speed-up of up to 102× over a single CPU in performance and up to 5.5× over a GPU in performance/watt.
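The "multiple levels of network precision" in FPGA flows such as HLS4ML typically refers to fixed-point quantization of weights and activations (e.g. Vivado HLS `ap_fixed<W,I>` types). The following is a minimal, illustrative sketch of signed fixed-point quantization with saturation; the function name and parameters are our own, not an API from the paper or from HLS4ML:

```python
# Hedged sketch (not from the paper): quantize a float to a signed
# fixed-point value with `total_bits` total width and `frac_bits`
# fractional bits, saturating at the representable range.

def quantize_fixed(x, total_bits=16, frac_bits=10):
    scale = 1 << frac_bits                   # value of one fractional LSB is 1/scale
    max_code = (1 << (total_bits - 1)) - 1   # largest signed integer code
    min_code = -(1 << (total_bits - 1))      # smallest signed integer code
    code = round(x * scale)                  # round to nearest representable code
    code = max(min_code, min(max_code, code))  # saturate instead of wrapping
    return code / scale

# Example: a <16,6> format (10 fractional bits) represents roughly [-32, 32)
weights = [0.7213, -1.25, 40.0]
quantized = [quantize_fixed(w, total_bits=16, frac_bits=10) for w in weights]
```

Sweeping `total_bits`/`frac_bits` over a validation set is one way to trade accuracy against FPGA resource usage, which is the kind of precision exploration the abstract alludes to.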




Updated: 2021-02-09