Power- and Cache-Aware Task Mapping with Dynamic Power Budgeting for Many-Cores
IEEE Transactions on Computers (IF 3.6), Pub Date: 2020-01-01, DOI: 10.1109/tc.2019.2935446
Martin Rapp, Mark Sagi, Anuj Pathania, Andreas Herkersdorf, Jörg Henkel

Two factors primarily affect the performance of multi-threaded tasks on many-core processors with logically-shared and physically-distributed Last-Level Cache (LLC): the LLC latencies of threads running on different cores and the per-core power budgets that aim to guarantee thermally safe operation. Two knobs affect these factors: First, the mapping of threads to cores affects both the LLC latencies and the power budgets. Second, dynamic power budgeting refines the power budgets during task execution. A mapping that spatially distributes threads across the many-core increases the power budgets, but unfortunately also increases the LLC latencies. Contrarily, mapping all threads near the center of the many-core minimizes the LLC latencies, but unfortunately also decreases the power budgets. Consequently, both metrics cannot be simultaneously optimal, which leads to a Pareto-optimization for task mapping that has formerly not been exploited. Dynamic power budgeting reallocates the power budgets according to the tasks’ execution phases. This results in a dynamically changing non-uniform power budget, which further increases the performance. We are the first to present a run-time algorithm PCGov combining task-agnostic task mapping and task-aware dynamic power budgeting for many-cores with shared distributed LLC. PCGov yields up to 21 percent lower response time and 13 percent lower energy consumption compared to the state-of-the-art, with a low overhead of less than 0.5 percent.
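
The trade-off described in the abstract can be illustrated with a small toy model. The sketch below is not PCGov and does not reproduce the paper's thermal or cache models. It assumes an n x n mesh with one LLC bank per tile, uses the total hop distance from active cores to all banks as a stand-in for LLC latency, and uses the total pairwise hop distance between active cores as a crude stand-in for the power budget. All function names (`llc_cost`, `spread`, `pareto_front`) are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Hop distance between two tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def llc_cost(mapping, n):
    """Latency proxy (assumption): total hop distance from every active core
    to every LLC bank, with one bank per tile. Lower is better."""
    banks = [(x, y) for x in range(n) for y in range(n)]
    return sum(manhattan(core, bank) for core in mapping for bank in banks)


def spread(mapping):
    """Power-budget proxy (assumption): total pairwise hop distance between
    active cores. Higher means the threads are more spread out."""
    return sum(manhattan(a, b) for a, b in combinations(mapping, 2))


def pareto_front(n, threads):
    """Enumerate all mappings of `threads` threads onto an n x n mesh and
    keep those not dominated in (lower llc_cost, higher spread)."""
    tiles = [(x, y) for x in range(n) for y in range(n)]
    # Keep one representative mapping per distinct objective pair.
    candidates = {}
    for mapping in combinations(tiles, threads):
        key = (llc_cost(mapping, n), spread(mapping))
        candidates.setdefault(key, mapping)
    front = []
    for (cost, spr), mapping in candidates.items():
        dominated = any(
            c <= cost and s >= spr and (c, s) != (cost, spr)
            for c, s in candidates
        )
        if not dominated:
            front.append((cost, spr, mapping))
    return sorted(front)


if __name__ == "__main__":
    for cost, spr, mapping in pareto_front(4, 4):
        print(cost, spr, mapping)
```

For a 4 x 4 mesh with four threads, the front runs from the four central tiles (lowest latency proxy, lowest spread) to the four corner tiles (highest spread, highest latency proxy). No single mapping is best on both axes, which is the situation the abstract describes.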

Updated: 2020-01-01