BAN-Storm: a Bandwidth-Aware Scheduling Mechanism for Stream Jobs
Journal of Grid Computing (IF 5.5), Pub Date: 2021-06-20, DOI: 10.1007/s10723-021-09567-x
Asif Muhammad , Muhammad Aleem

The essential components of a Big Data system are the processing frameworks and engines responsible for crunching the data. To cope with the growing computing demands of real-time Big Data applications, researchers have proposed several computing frameworks. The core of these frameworks, i.e., the scheduling mechanism for real-time stream processing, needs to account for several important aspects, such as resource awareness, heterogeneity of the computing resources, and load balancing. These aspects contribute significantly to the performance a framework attains, so ignoring any one of them may degrade performance. Most present stream processing frameworks do not consider inter-task communication patterns or the heterogeneity of the computing resources. As a result, heavily communicating tasks can be mapped onto different, costly remote nodes, which increases communication overhead and latency. In this work, we propose BAN-Storm, a stream scheduler that considers inter-task communication along with other important scheduling aspects, such as resource heterogeneity, when scheduling stream jobs. The core objective of the proposed scheduler is to improve performance (i.e., achieve higher throughput and lower latency) through a resource-aware mapping mechanism that accounts for both inter-task communication and each machine's computing power. BAN-Storm employs a two-phase mapping mechanism. In the first phase, tasks are grouped so that inter-group communication is low. In the second phase, for resource-aware mapping, the computing power of each node is calculated from its FLOPS, memory (RAM), and bandwidth, and the task groups are then assigned to nodes, mapping onto the more capable nodes first. BAN-Storm is implemented on Apache Storm, and the experimental evaluation uses two real application topologies.
The results are benchmarked against three state-of-the-art stream schedulers. The experiments show up to 30% higher throughput than the Apache Storm scheduler. Moreover, BAN-Storm provisions up to 33–66% fewer resources than default Storm.
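The abstract outlines the two-phase mapping but gives neither the grouping algorithm nor the exact capacity formula, so the Python sketch below is only an illustration under stated assumptions. Phase 1 swaps in a greedy heaviest-edge-first merge capped at a hypothetical `max_group_size`; phase 2 scores nodes with an assumed equal-weighted sum of max-normalized FLOPS, RAM, and bandwidth, then places larger task groups on higher-scoring nodes first. All names (`Node`, `max_tasks`, `group_tasks`, `node_power`, `map_groups`) are hypothetical and are not part of Apache Storm's scheduler API or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    gflops: float
    ram_gb: float
    bandwidth_mbps: float
    max_tasks: int  # hypothetical capacity: how many tasks this node may host


def group_tasks(tasks, comm, max_group_size):
    """Phase 1 (assumed greedy heuristic): merge the most heavily communicating
    task pairs first, so less traffic crosses group boundaries."""
    group_of = {t: frozenset([t]) for t in tasks}
    for (a, b), _rate in sorted(comm.items(), key=lambda kv: kv[1], reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga == gb or len(ga) + len(gb) > max_group_size:
            continue  # already together, or the merge would exceed the cap
        merged = ga | gb
        for t in merged:
            group_of[t] = merged
    # Collect distinct groups in task order for a deterministic result.
    seen, groups = set(), []
    for t in tasks:
        g = group_of[t]
        if g not in seen:
            seen.add(g)
            groups.append(sorted(g))
    return groups


def inter_group_traffic(groups, comm):
    """Total communication rate between tasks that ended up in different groups."""
    where = {t: i for i, g in enumerate(groups) for t in g}
    return sum(rate for (a, b), rate in comm.items() if where[a] != where[b])


def node_power(node, nodes, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed scoring: weighted sum of FLOPS, RAM and bandwidth, each
    normalized by the cluster maximum."""
    max_f = max(n.gflops for n in nodes)
    max_r = max(n.ram_gb for n in nodes)
    max_b = max(n.bandwidth_mbps for n in nodes)
    wf, wr, wb = weights
    return (wf * node.gflops / max_f
            + wr * node.ram_gb / max_r
            + wb * node.bandwidth_mbps / max_b)


def map_groups(groups, nodes):
    """Phase 2: assign larger groups first, trying the most capable node first."""
    ranked = sorted(nodes, key=lambda n: node_power(n, nodes), reverse=True)
    free = {n.name: n.max_tasks for n in ranked}
    placement = {}
    for g in sorted(groups, key=len, reverse=True):
        for n in ranked:
            if free[n.name] >= len(g):
                free[n.name] -= len(g)
                placement[tuple(g)] = n.name
                break
        else:
            raise RuntimeError(f"no node has room for group {g}")
    return placement


if __name__ == "__main__":
    tasks = ["spout", "split", "count", "report"]
    comm = {("spout", "split"): 100.0, ("split", "count"): 80.0,
            ("count", "report"): 5.0}
    nodes = [Node("small", 50, 8, 1000, max_tasks=2),
             Node("big", 200, 32, 10000, max_tasks=2)]
    groups = group_tasks(tasks, comm, max_group_size=2)
    print(groups)                             # [['split', 'spout'], ['count', 'report']]
    print(inter_group_traffic(groups, comm))  # 80.0
    print(map_groups(groups, nodes))          # {('split', 'spout'): 'big', ('count', 'report'): 'small'}
```

Merging the heaviest edges first is one simple way to keep chatty tasks co-located; an actual Storm scheduler would additionally have to map each group onto a supervisor's worker slots, which this sketch leaves out.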




Updated: 2021-06-20