Adaptive and transparent task scheduling of GPU-powered clusters
Concurrency and Computation: Practice and Experience (IF 2) Pub Date: 2020-04-24, DOI: 10.1002/cpe.5793
Qingyu Ci, Hourong Li, Shuwei Yang, Jin Gao

GPGPU-powered supercomputers are vital for various science and engineering applications. On each cluster node, the GPU works as a coprocessor of the CPU, and a computing task runs alternately on the CPU and the GPU. Because of this characteristic, traditional task scheduling strategies tend to cause significant workload imbalance and GPU underutilization. We design an adaptive scheduling strategy to alleviate this imbalance and underutilization. Our strategy logically treats all GPUs in the cluster as a whole. Every cluster node maintains a local information table of all GPUs. Once a GPU call request is received, a node selects a GPU to run the task in an adaptive manner based on this table. In addition, our strategy does not rely on a global queue and thus avoids excessive internode communication overhead. Moreover, we encapsulate the strategy in an intermediate module between the cluster and its users, so the underlying details of task scheduling are transparent to users, which improves usability. We validate our strategy through experiments.
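
The abstract does not specify the selection criterion, so the following is only a minimal sketch of the idea: each node holds a local table describing every GPU in the cluster and, on receiving a GPU call request, adaptively picks a device from that table without consulting a global queue. All names here (GpuInfo, LocalGpuTable, select_gpu) and the load/memory heuristic are illustrative assumptions, not the authors' implementation.

```python
import time
from dataclasses import dataclass

@dataclass
class GpuInfo:
    node: str           # cluster node hosting the GPU
    device_id: int      # device index on that node
    pending_tasks: int  # tasks queued or running on this GPU
    mem_free_mb: int    # free device memory, in MB
    last_update: float  # timestamp of the last table refresh

class LocalGpuTable:
    """Per-node table describing every GPU in the cluster.

    Each node keeps its own copy and refreshes it from periodic
    status messages, so task placement needs no global queue.
    """
    def __init__(self, entries):
        self.entries = list(entries)

    def update(self, node, device_id, pending_tasks, mem_free_mb):
        # Refresh one entry when a status update arrives from a peer node.
        for e in self.entries:
            if e.node == node and e.device_id == device_id:
                e.pending_tasks = pending_tasks
                e.mem_free_mb = mem_free_mb
                e.last_update = time.time()

    def select_gpu(self, mem_required_mb):
        """Adaptively pick the least-loaded GPU that can fit the task."""
        candidates = [e for e in self.entries if e.mem_free_mb >= mem_required_mb]
        if not candidates:
            return None  # caller may wait or fall back to the CPU
        # Prefer lightly loaded GPUs; break ties by free memory.
        return min(candidates, key=lambda e: (e.pending_tasks, -e.mem_free_mb))

# Usage: on receiving a GPU call request, the node consults its local table.
table = LocalGpuTable([
    GpuInfo("node0", 0, pending_tasks=3, mem_free_mb=4096, last_update=time.time()),
    GpuInfo("node1", 0, pending_tasks=1, mem_free_mb=8192, last_update=time.time()),
])
target = table.select_gpu(mem_required_mb=2048)
print(target.node, target.device_id)  # -> node1 0
```

In this sketch, the intermediate module described in the abstract would sit in front of such a table, intercepting GPU call requests so that users never see which node or device was chosen.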

Updated: 2020-04-24