Power- and Cache-Aware Task Mapping with Dynamic Power Budgeting for Many-Cores, IEEE Transactions on Computers


IEEE Transactions on Computers (IF 3.6), Pub Date: 2020-01-01, DOI: 10.1109/tc.2019.2935446
Martin Rapp, Mark Sagi, Anuj Pathania, Andreas Herkersdorf, Jorg Henkel

Two factors primarily affect the performance of multi-threaded tasks on many-core processors with logically-shared and physically-distributed Last-Level Cache (LLC): the LLC latencies of threads running on different cores and the per-core power budgets that aim to guarantee thermally safe operation. Two knobs affect these factors: First, the mapping of threads to cores affects both the LLC latencies and the power budgets. Second, dynamic power budgeting refines the power budgets during task execution. A mapping that spatially distributes threads across the many-core increases the power budgets, but unfortunately also increases the LLC latencies. Contrarily, mapping all threads near the center of the many-core minimizes the LLC latencies, but unfortunately also decreases the power budgets. Consequently, both metrics cannot be simultaneously optimal, which leads to a Pareto-optimization for task mapping that has formerly not been exploited. Dynamic power budgeting reallocates the power budgets according to the tasks’ execution phases. This results in a dynamically changing non-uniform power budget, which further increases the performance. We are the first to present a run-time algorithm PCGov combining task-agnostic task mapping and task-aware dynamic power budgeting for many-cores with shared distributed LLC. PCGov yields up to 21 percent lower response time and 13 percent lower energy consumption compared to the state-of-the-art, with a low overhead of less than 0.5 percent.
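The trade-off between LLC latency and power budget described in the abstract can be illustrated with a small toy model. The sketch below is not PCGov and is not taken from the paper. It assumes a 2D mesh with one LLC bank per tile and uniform bank access, so a thread's LLC latency is approximated by its average Manhattan distance to all tiles. The power budget is not computed from a thermal model here. Instead, a hypothetical "crowding" score (the worst-case sum of inverse distances to the other active cores) stands in for it, on the assumption that a lower score would allow a higher per-core budget. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product


def manhattan(a, b):
    """Hop count between two tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def avg_llc_distance(mapping, tiles):
    """Mean hops from each active core to every LLC bank.

    Assumption: one bank per tile, all banks accessed equally often.
    """
    total = sum(manhattan(core, bank) for core in mapping for bank in tiles)
    return Fraction(total, len(mapping) * len(tiles))


def crowding(mapping):
    """Toy thermal proxy (assumption, not a real thermal model).

    Worst-case sum of 1/distance from an active core to the other active
    cores. Lower crowding stands in for a higher per-core power budget.
    """
    return max(
        sum(Fraction(1, manhattan(core, other)) for other in mapping if other != core)
        for core in mapping
    )


def pareto_front(points):
    """Keep the points that are not dominated when both values are minimized."""
    front = []
    # Sorted by latency first, so a point survives only if it strictly
    # improves on the crowding of every lower-latency point kept so far.
    for latency, crowd in sorted(set(points)):
        if not front or crowd < front[-1][1]:
            front.append((latency, crowd))
    return front


def explore(width, height, threads):
    """Enumerate every mapping of `threads` cores on the mesh and return the front."""
    tiles = list(product(range(width), range(height)))
    points = [
        (avg_llc_distance(m, tiles), crowding(m))
        for m in combinations(tiles, threads)
    ]
    return pareto_front(points)


if __name__ == "__main__":
    for latency, crowd in explore(4, 4, 4):
        print(f"avg LLC hops = {float(latency):.3f}, crowding = {float(crowd):.3f}")
```

For four threads on a 4x4 mesh, the lowest-latency end of this front is the central 2x2 block (2.0 average hops, crowding 2.5), and every other point on the front trades more hops for less crowding. Exhaustive enumeration is only feasible for such tiny meshes. A run-time governor would need a far cheaper search, which this sketch does not attempt.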


Updated: 2020-01-01
