Collaborative execution of fluid flow simulation using non-uniform decomposition on heterogeneous architectures
Journal of Parallel and Distributed Computing (IF 3.8) Pub Date: 2021-02-15, DOI: 10.1016/j.jpdc.2021.02.006
Gabriel Freytag, Matheus S. Serpa, João V.F. Lima, Paolo Rech, Philippe O.A. Navaux

The demand for computing power, along with the diversity of computational problems, has culminated in a variety of heterogeneous architectures. Among them, hybrid architectures combine different specialized hardware into a single chip, comprising a System-on-Chip (SoC). Since these architectures usually have limited resources, efficiently splitting data and tasks between the different hardware units is essential to improving performance. In this context, we explore the non-uniform decomposition of the data domain to improve fluid flow simulation performance on heterogeneous architectures. We evaluate two hybrid architectures: one comprising a general-purpose x86 CPU and a graphics processing unit (GPU) integrated into a single chip (AMD Kaveri SoC), and another comprising a general-purpose ARM CPU and a Field-Programmable Gate Array (FPGA) integrated into the same chip (Intel Arria 10 SoC). We investigate the effect of the data decomposition on the performance and energy efficiency of each platform's devices in a collaborative execution. Our case study is the well-known Lattice Boltzmann Method (LBM), to which we apply the technique, analyzing the performance and energy efficiency of five kernels on both devices of each platform. Our experimental results show that non-uniform partitioning improves the performance of LBM kernels by up to 11.40% and 15.15% on the AMD Kaveri and the Intel Arria 10, respectively. The AMD Kaveri platform reaches up to 10.809 MLUPS with an energy efficiency of 142.881 MLUPKJ, while the Intel Arria 10 platform reaches up to 1.12 MLUPS and 82.272 MLUPKJ.
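The core idea of the non-uniform decomposition is that, instead of cutting the lattice evenly between the two devices of the SoC, the split point is shifted toward the faster device so that both finish an iteration at roughly the same time. The sketch below is a minimal illustration of that idea, not the authors' code: the `split_domain` helper and the 30% CPU share are hypothetical, the timing and energy figures are placeholders, and MLUPS/MLUPKJ are assumed to stand for million lattice updates per second and per kilojoule, respectively.

```python
# Minimal sketch of a non-uniform domain decomposition for an LBM lattice
# shared between a CPU and an on-chip accelerator (GPU or FPGA).
# All ratios and measurements below are illustrative assumptions.

def split_domain(ny, cpu_share):
    """Split `ny` lattice rows between CPU and accelerator.

    cpu_share is the fraction of rows assigned to the CPU; 0.5 would be the
    uniform split, anything else is a non-uniform decomposition.
    """
    cpu_rows = round(ny * cpu_share)
    return (0, cpu_rows), (cpu_rows, ny)  # [start, end) row ranges


def mlups(nx, ny, iterations, seconds):
    """Throughput: million lattice updates per second (assumed meaning of MLUPS)."""
    return nx * ny * iterations / seconds / 1e6


def mlupkj(nx, ny, iterations, joules):
    """Energy efficiency: million lattice updates per kilojoule (assumed meaning of MLUPKJ)."""
    return nx * ny * iterations / (joules / 1e3) / 1e6


if __name__ == "__main__":
    nx, ny, iters = 1024, 1024, 1000

    # Hypothetical tuning: the CPU takes 30% of the rows, the accelerator 70%.
    cpu_slice, acc_slice = split_domain(ny, cpu_share=0.30)
    print("CPU rows   :", cpu_slice)
    print("Accel rows :", acc_slice)

    # With placeholder timing/energy measurements, the two metrics quoted above:
    print("MLUPS  :", round(mlups(nx, ny, iters, seconds=97.0), 3))
    print("MLUPKJ :", round(mlupkj(nx, ny, iters, joules=7.3e3), 3))
```

In practice the share given to each device would be calibrated per kernel and per platform (e.g. from a short profiling run), which is what makes the decomposition non-uniform rather than a fixed 50/50 split.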



