NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling
arXiv - CS - Hardware Architecture, Pub Date: 2020-09-17, DOI: arxiv-2009.08241
Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gomez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal

Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through IBM CAPI2 (Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a 16-core POWER9 system by 4.2x and 8.3x when running two different compound stencil kernels. NERO reduces the energy consumption by 22x and 29x for the same two kernels over the POWER9 system with an energy efficiency of 1.5 GFLOPS/Watt and 17.3 GFLOPS/Watt. We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.
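
To make the terms "compound stencil" and "low arithmetic intensity" concrete, here is a minimal sketch in plain Python. It is an illustration only and not one of the kernels evaluated in the paper: the 5-point Laplacian, the function names, and the no-cache-reuse assumption in `flops_per_byte` are all choices made for this example.

```python
# Illustrative sketch: a generic compound stencil, NOT the paper's kernels.

def laplacian(grid):
    """Apply a 5-point Laplacian to the interior of a 2D grid (list of lists).

    Boundary cells of the output are left at 0.0.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each output point reads 5 inputs and performs 5 floating-point ops
            # (3 additions, 1 multiplication, 1 subtraction).
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j])
    return out


def compound_stencil(grid):
    """Chain two elementary stencils: the second consumes the first's output."""
    return laplacian(laplacian(grid))


def flops_per_byte(flops_per_point=5, loads=5, stores=1, bytes_per_value=4):
    """Rough arithmetic intensity of one Laplacian point update.

    Assumes single-precision values and no cache reuse (a pessimistic,
    hypothetical model chosen only to show the order of magnitude).
    """
    return flops_per_point / ((loads + stores) * bytes_per_value)


if __name__ == "__main__":
    field = [[float(i * i) for _ in range(5)] for i in range(5)]
    print(compound_stencil(field))
    print(flops_per_byte())
```

Under these assumptions each point update does far less than one floating-point operation per byte moved, which is why such kernels tend to be limited by memory bandwidth rather than by compute.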

Updated: 2020-09-18