A scalable and low latency probe-based scheduler for data analytics frameworks
Parallel Computing (IF 2.0), Pub Date: 2021-02-17, DOI: 10.1016/j.parco.2021.102752
Mansour Khelghatdoust, Vincent Gramoli

Today’s data analytics frameworks divide jobs into many parallel tasks, each operating on a small partition of data, in order to execute jobs with low latency. Such frameworks often rely on probe-based distributed schedulers to reduce the associated overhead. Unfortunately, the existing solutions do not perform efficiently under workload fluctuations and heterogeneous job durations. This is due to a problem called Head-of-Line blocking, in which short tasks are enqueued at workers behind longer tasks. To overcome this problem, we propose Peacock (Khelghatdoust and Gramoli, 0000) [25], a new fully distributed probe-based scheduling method. Unlike the existing methods, Peacock introduces a novel probe rotation technique: workers form a ring overlay network and rotate probes using elastic queues of workers. It is augmented by a novel starvation-free probe reordering algorithm executed by workers. We evaluate Peacock against two existing state-of-the-art probe-based solutions through a trace-driven simulation of up to 20,000 workers and a distributed experiment of 100 workers in Apache Spark under Google, Cloudera, and Yahoo! traces. The performance results indicate that Peacock outperforms the state of the art in all cluster sizes and loads. Our distributed experiments confirm our simulation results.
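The Head-of-Line blocking problem named in the abstract, and the general idea of reordering without starving long tasks, can be illustrated with a small sketch. This is not Peacock's algorithm, whose details the abstract does not give. It is a hypothetical single-worker model in which all tasks are queued before the worker starts, and the names `Task`, `fifo_waits`, `enqueue_with_bound`, `reordered_waits`, and the `max_bypass` limit are assumptions made for illustration only.

```python
from dataclasses import dataclass, replace


@dataclass
class Task:
    name: str
    duration: float    # estimated run time, in seconds
    bypassed: int = 0  # times a later arrival was placed ahead of this task


def fifo_waits(tasks):
    """Waiting time of each task when the worker runs them in list order."""
    waits, clock = {}, 0.0
    for t in tasks:
        waits[t.name] = clock
        clock += t.duration
    return waits


def enqueue_with_bound(queue, task, max_bypass):
    """Place task ahead of longer queued tasks, but never ahead of a task
    that has already been bypassed max_bypass times (illustrative
    starvation bound, not the rule used by Peacock)."""
    pos = len(queue)
    while pos > 0:
        ahead = queue[pos - 1]
        if ahead.duration <= task.duration or ahead.bypassed >= max_bypass:
            break
        pos -= 1
    for t in queue[pos:]:
        t.bypassed += 1  # every task from pos onward was just overtaken
    queue.insert(pos, task)


def reordered_waits(arrivals, max_bypass):
    """Waiting times when arrivals are enqueued with bounded reordering."""
    queue = []
    for t in arrivals:
        # Copy each task so the caller's objects are not mutated.
        enqueue_with_bound(queue, replace(t, bypassed=0), max_bypass)
    return fifo_waits(queue)


if __name__ == "__main__":
    arrivals = [Task("long", 100.0), Task("s1", 1.0),
                Task("s2", 1.0), Task("s3", 1.0)]
    print("FIFO:     ", fifo_waits(arrivals))
    print("Reordered:", reordered_waits(arrivals, max_bypass=2))
```

Under strict arrival order, every short task waits at least 100 seconds behind the long one. With the bounded reordering, the first two short tasks run first, and the long task is then protected from further bypassing, so it cannot be postponed indefinitely.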




Updated: 2021-02-21