DySHARQ: Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures
International Journal of Parallel Programming (IF 1.5), Pub Date: 2020-11-20, DOI: 10.1007/s10766-020-00687-7
Sven Rheindt, Sebastian Maier, Nora Pohle, Lars Nolte, Oliver Lenke, Florian Schmaus, Thomas Wild, Wolfgang Schröder-Preikschat, Andreas Herkersdorf

The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. However, this distribution introduces a data-to-task locality challenge, and inter-tile communication often imposes significant software overhead. We therefore proposed SHARQ, software-defined hardware-managed queues that enable efficient inter-tile communication by leveraging user-defined queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces an optional handler task, which is scheduled by hardware on demand. Queue management, intra- and inter-tile data transfer, and handler task invocation are handled entirely by hardware. Only rare tasks, such as dynamic queue creation at run-time, are performed in software. DySHARQ, an extension of SHARQ, enables dynamic and concurrent queue memory management and queue length adjustments so that queues can adapt to changes in application and resource requirements. The DySHARQ hardware monitors the queue memory requirements at run-time and conditionally schedules a software-defined memory management task. It further optimizes the hardware-software interaction for local queue operations. We integrated DySHARQ into the MPI library used by the NAS benchmarks. The evaluation shows a reduction in execution time of up to 43% (compared to software) for the communication-intensive IS kernel in a 4×4 tile design on an FPGA platform with a total of 80 LEON3 cores. The dynamic memory management reduces the memory footprint by 3.75× in a 2×2 design.
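
To make the division of labour described in the abstract more concrete, the following is a minimal software-only sketch, not the authors' hardware design or API. It models a queue with arbitrarily sized elements, an optional handler that is invoked on demand, and a memory management step that is triggered conditionally when occupancy crosses a threshold. The class name `ModelQueue`, the 75% threshold, and the capacity-doubling policy are all assumptions made purely for illustration; the abstract does not specify any of them, and only queue growth is modelled here.

```python
from collections import deque
from typing import Callable, Deque, Optional


class ModelQueue:
    """Toy software model of a hardware-managed queue (all names hypothetical)."""

    def __init__(self, capacity: int,
                 handler: Optional[Callable[[bytes], None]] = None,
                 grow_threshold: float = 0.75) -> None:
        self.capacity = capacity              # number of element slots
        self.handler = handler                # optional handler task
        self.grow_threshold = grow_threshold  # assumed trigger level
        self.items: Deque[bytes] = deque()
        self.resize_events = 0

    def _manage_memory(self) -> None:
        # Stand-in for the software-defined memory management task.
        # Doubling is an assumed policy, not taken from the paper.
        self.capacity *= 2
        self.resize_events += 1

    def enqueue(self, element: bytes) -> None:
        # Stand-in for run-time monitoring: the management task is
        # scheduled only when occupancy would exceed the threshold.
        if len(self.items) + 1 > self.grow_threshold * self.capacity:
            self._manage_memory()
        self.items.append(element)

    def drain(self) -> int:
        # Stand-in for on-demand handler scheduling: pass every queued
        # element to the handler and report how many were processed.
        processed = 0
        while self.items and self.handler is not None:
            self.handler(self.items.popleft())
            processed += 1
        return processed


received = []
q = ModelQueue(capacity=4, handler=received.append)
for payload in (b"a", b"bb", b"cccc", b"d" * 10):  # arbitrarily sized elements
    q.enqueue(payload)
processed = q.drain()
```

In this model the common path (`enqueue`, `drain`) never calls the management step unless the threshold is crossed, mirroring the idea that only rare operations fall back to software.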

Updated: 2020-11-20