DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures, International Journal of Parallel Programming


DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures
International Journal of Parallel Programming (IF 0.9), Pub Date: 2021-01-03, DOI: 10.1007/s10766-020-00688-6
Akshay Srivatsa, Mostafa Mansour, Sven Rheindt, Dirk Gabriel, Thomas Wild, Andreas Herkersdorf

Embedded system applications, with their inherently limited parallelism, rarely exploit all available processing resources in large DSM-based manycore architectures. From a cache coherence perspective, this provides an opportunity to move away from global coherence spanning all tiles, an approach that does not scale well. Therefore, we favor a region-based cache coherence (RBCC) approach that enables coherence among a selectable cluster of tiles in accordance with application requirements. We present the design and hardware implementation of a flexibly configurable coherency region manager (CRM) that enables RBCC. We introduce two novel features that enhance RBCC, namely runtime coherency region re-configuration and RBCC-malloc(), which dynamically tailor coherence to the application working sets that are actually shared. Further, we propose, implement and evaluate additional CRM functions, such as a non-intrusive barrier synchronization mechanism and a false sharing resolution strategy, for our DSM-based manycore architecture. We have synthesized the CRM on an FPGA prototype for a 64-core system and observe a 38% reduction in BRAM utilization compared to a global coherence directory for regions with up to 32 cores. Experiments using a video streaming application reveal a speed-up of up to 42% compared to an alternative message-passing-based implementation. We also evaluate the benefits of runtime coherency region re-configuration using two scenarios and present a formal analysis of when a re-configuration is beneficial.


Updated: 2021-01-03
