Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs
arXiv - CS - Hardware Architecture. Pub Date: 2021-09-14, DOI: arxiv-2109.06382
Joseph Zuckerman, Davide Giri, Jihye Kwon, Paolo Mantovani, Luca P. Carloni

One of the most critical aspects of integrating loosely-coupled accelerators in heterogeneous SoC architectures is orchestrating their interactions with the memory hierarchy, especially in terms of navigating the various cache-coherence options: from accelerators accessing off-chip memory directly, bypassing the cache hierarchy, to accelerators having their own private cache. By running real-size applications on FPGA-based prototypes of many-accelerator multi-core SoCs, we show that the best cache-coherence mode for a given accelerator varies at runtime, depending on the accelerator's characteristics, the workload size, and the overall SoC status. Cohmeleon applies reinforcement learning to select the best coherence mode for each accelerator dynamically at runtime, as opposed to statically at design time. It makes these selections adaptively, by continuously observing the system and measuring its performance. Cohmeleon is accelerator-agnostic, architecture-independent, and it requires minimal hardware support. Cohmeleon is also transparent to application programmers and has a negligible software overhead. FPGA-based experiments show that our runtime approach offers, on average, a 38% speedup with a 66% reduction of off-chip memory accesses compared to state-of-the-art design-time approaches. Moreover, it can match runtime solutions that are manually tuned for the target architecture.
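The runtime selection loop described in the abstract (observe the system, pick a coherence mode, measure performance, adapt) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple tabular, epsilon-greedy learner, and the class name `ModeSelector`, the mode labels, the state encoding, and the scalar reward are all hypothetical placeholders chosen for illustration.

```python
import random
from collections import defaultdict


class ModeSelector:
    """Toy epsilon-greedy learner (an assumption, not the paper's algorithm)."""

    def __init__(self, modes, epsilon=0.1, alpha=0.5, seed=0):
        self.modes = list(modes)      # candidate coherence modes (hypothetical labels)
        self.epsilon = epsilon        # probability of exploring a random mode
        self.alpha = alpha            # learning rate for the running reward estimate
        self.rng = random.Random(seed)
        self.q = defaultdict(float)   # (state, mode) -> estimated reward

    def select(self, state):
        """Pick a mode for the observed state: explore sometimes, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.modes)
        return max(self.modes, key=lambda m: self.q[(state, m)])

    def update(self, state, mode, reward):
        """Move the estimate for (state, mode) toward the measured reward."""
        key = (state, mode)
        self.q[key] += self.alpha * (reward - self.q[key])
```

In this sketch, a state could be a tuple such as `("fft", "small")` combining an accelerator identifier with a workload-size bucket, and the reward could be any measured performance figure where higher is better. After each accelerator invocation, the caller passes the measured reward to `update`, so later calls to `select` favor the mode that has performed best in that state.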

Updated: 2021-09-15