Accelerating advection for atmospheric modelling on Xilinx and Intel FPGAs
arXiv - CS - Mathematical Software. Pub Date: 2021-07-28, DOI: arxiv-2107.13500
Nick Brown

Reconfigurable architectures, such as FPGAs, enable the execution of code at the electronics level, avoiding the assumptions imposed by the general purpose black-box micro-architectures of CPUs and GPUs. Such tailored execution can result in increased performance and power efficiency, and as the HPC community moves towards exascale an important question is the role such hardware technologies can play in future supercomputers. In this paper we explore the porting of the PW advection kernel, an important code component used in a variety of atmospheric simulations and accounting for around 40% of the runtime of the popular Met Office NERC Cloud model (MONC). Building upon previous work which ported this kernel to an older generation of Xilinx FPGA, we target the latest generation Xilinx Alveo U280 and Intel Stratix 10 FPGAs. We explore the development of a dataflow design which is performance portable between vendors, then describe implementation differences between the tool chains and compare kernel performance between the FPGA hardware. This is followed by a more general performance comparison, scaling up the number of kernels on the Xilinx Alveo and Intel Stratix 10, against a 24-core Xeon Platinum Cascade Lake CPU and an NVIDIA Tesla V100 GPU. When overlapping the transfer of data to and from the boards with compute, the FPGA solutions considerably outperform the CPU and, whilst falling short of the GPU in terms of performance, demonstrate power usage benefits, with the Alveo being especially power efficient. The result of this work is a comparison and a set of design techniques that apply to this specific atmospheric advection kernel on Xilinx and Intel FPGAs, and that are also of wider interest when looking to accelerate HPC codes on a variety of reconfigurable architectures.

Updated: 2021-07-29