A Value-Oriented Job Scheduling Approach for Power-Constrained and Oversubscribed HPC Systems,IEEE Transactions on Parallel and Distributed Systems


IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-06-01, DOI: 10.1109/tpds.2020.2967373
Nirmal Kumbhare, Aniruddha Marathe, Ali Akoglu, Howard Jay Siegel, Ghaleb Abdulla, Salim Hariri

In this article, we investigate limitations of traditional value-based algorithms for a power-constrained HPC system and evaluate their impact on HPC productivity. We expose the trade-off between allocating the system-wide power budget uniformly and greedily under different system-wide power constraints in an oversubscribed system. We experimentally demonstrate that, under the tightest power constraint, the mean productivity of the greedy allocation is 38 percent higher than that of the uniform allocation whereas, under the intermediate power constraint, the uniform allocation has a mean productivity 6 percent higher than that of the greedy allocation. We then propose a new algorithm that adapts its behavior to deliver the combined benefits of the two allocation strategies. We design a methodology with online retraining capability to create application-specific power-execution time models for a class of HPC applications. These models are used to predict the execution time of an application on the available resources when the power-aware algorithms make scheduling decisions. We evaluate the proposed algorithm using emulation and simulation environments, and show that our adaptive strategy improves HPC resource utilization while delivering a mean productivity that is almost the same as that of the best-performing algorithm across various system-wide power constraints.
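To make the uniform-versus-greedy trade-off concrete, the following is a minimal toy sketch and not the algorithm from the paper. The `Job` fields, the `p_min`/`p_max` power bounds, the rule that a job is not started below `p_min`, and the example numbers are all assumptions introduced here for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    value: float  # hypothetical value earned if the job runs
    p_min: float  # assumed minimum power (W) needed to start the job
    p_max: float  # assumed power (W) beyond which the job gains nothing


def uniform_allocation(jobs, budget):
    """Give every job the same share of the budget, capped at p_max.

    Jobs whose share falls below their p_min are not started.
    """
    if not jobs:
        return {}
    share = budget / len(jobs)
    return {j.name: min(share, j.p_max) for j in jobs if share >= j.p_min}


def greedy_allocation(jobs, budget):
    """Serve jobs in descending value order, granting each as much power
    as it can use (up to p_max) until the budget runs out."""
    alloc = {}
    remaining = budget
    for j in sorted(jobs, key=lambda job: job.value, reverse=True):
        grant = min(j.p_max, remaining)
        if grant >= j.p_min:
            alloc[j.name] = grant
            remaining -= grant
    return alloc


# Hypothetical workload: three jobs with identical power bounds.
jobs = [
    Job("a", value=10.0, p_min=50.0, p_max=100.0),
    Job("b", value=6.0, p_min=50.0, p_max=100.0),
    Job("c", value=3.0, p_min=50.0, p_max=100.0),
]

for budget in (120.0, 240.0):
    print(budget, uniform_allocation(jobs, budget), greedy_allocation(jobs, budget))
```

In this toy setup, a 120 W budget leaves the uniform strategy unable to start any job (each share is 40 W, below `p_min`), while the greedy strategy still runs the most valuable job. At 240 W the uniform strategy runs all three jobs at reduced power, whereas the greedy strategy runs only two at full power. Which outcome yields more productivity depends on how execution time scales with power, which is what the application-specific models in the paper are meant to predict.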


Updated: 2020-06-01
