heSRPT: Parallel Scheduling to Minimize Mean Slowdown, arXiv - CS - Distributed, Parallel, and Cluster Computing


heSRPT: Parallel Scheduling to Minimize Mean Slowdown
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-11-18, DOI: arxiv-2011.09676
Benjamin Berg, Rein Vesilo, Mor Harchol-Balter

Modern data centers serve workloads that are capable of exploiting parallelism. When a job parallelizes across multiple servers, it completes more quickly, but jobs receive diminishing returns from each additional server. Because allocating multiple servers to a single job is inefficient, it is unclear how best to allocate a fixed number of servers among many parallelizable jobs. This paper provides the first optimal allocation policy for minimizing the mean slowdown of parallelizable jobs of known size when all jobs are present at time 0. Our policy provides a simple closed-form formula for the optimal allocations at every moment in time. Minimizing mean slowdown usually requires favoring short jobs over long ones (as in the SRPT policy). However, because parallelizable jobs have sublinear speedup functions, system efficiency is also an issue. System efficiency is maximized by giving equal allocations to all jobs, and thus competes with the goal of prioritizing small jobs. Our optimal policy, high-efficiency SRPT (heSRPT), balances these competing goals. heSRPT completes jobs in order of size, but maintains overall system efficiency by allocating some servers to each job at every moment in time. Our results also generalize to provide the optimal allocation policy with respect to mean flow time. Finally, we consider the online case, where jobs arrive to the system over time. While optimizing mean slowdown in the online setting is even more difficult, we find that heSRPT provides an excellent heuristic policy for the online setting. In fact, our simulations show that heSRPT significantly outperforms state-of-the-art allocation policies for parallelizable jobs.
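The tension the abstract describes, between prioritizing small jobs and keeping the system efficient, can be illustrated with a small sketch. It does not implement heSRPT, since the abstract does not state the closed-form allocation. It only compares the two extremes the abstract mentions: an equal split of servers, and an SRPT-like policy that gives every server to the smallest remaining job. The speedup function s(k) = k^p with p = 0.5, and all function names below, are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# Assumption: sublinear speedup s(k) = k**p with 0 < p < 1.
# The abstract only says speedup is sublinear; this form is illustrative.
def speedup(k: float, p: float = 0.5) -> float:
    return k ** p

def equal_split(remaining: List[float]) -> Dict[int, float]:
    """Give every unfinished job the same fraction of the servers."""
    active = [i for i, r in enumerate(remaining) if r > 0]
    return {i: 1.0 / len(active) for i in active}

def smallest_first(remaining: List[float]) -> Dict[int, float]:
    """SRPT-like extreme: all servers go to the smallest unfinished job."""
    active = [i for i, r in enumerate(remaining) if r > 0]
    return {min(active, key=lambda i: remaining[i]): 1.0}

def mean_slowdown(sizes: List[float], n_servers: float,
                  policy: Callable[[List[float]], Dict[int, float]],
                  p: float = 0.5) -> float:
    """Simulate all jobs present at time 0; slowdown = completion time / size.

    Allocations are recomputed only when a job completes, which is enough
    for the two policies above because they depend only on which jobs remain.
    """
    remaining = [float(x) for x in sizes]
    done = [0.0] * len(sizes)
    t = 0.0
    while any(r > 0 for r in remaining):
        shares = policy(remaining)
        rates = {i: speedup(f * n_servers, p) for i, f in shares.items() if f > 0}
        # Advance time to the next completion among the jobs being served.
        dt = min(remaining[i] / rate for i, rate in rates.items())
        t += dt
        for i, rate in rates.items():
            remaining[i] -= rate * dt
            if remaining[i] <= 1e-12:  # treat floating-point residue as finished
                remaining[i] = 0.0
                done[i] = t
    return sum(d / x for d, x in zip(done, sizes)) / len(sizes)

if __name__ == "__main__":
    jobs = [1.0, 4.0]
    print("equal split   :", mean_slowdown(jobs, 4, equal_split))
    print("smallest first:", mean_slowdown(jobs, 4, smallest_first))
```

With two jobs of sizes 1 and 4 on 4 servers, the smallest-first policy yields a mean slowdown of 0.5625 and the equal split yields roughly 0.629 under these assumptions. Neither extreme is claimed to be optimal; per the abstract, heSRPT sits between them by completing jobs in size order while still giving some servers to every job.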


Updated: 2020-11-20
