A Fault-Tolerant Workflow Scheduling Algorithm for Grid with Near-Optimal Redundancy
Journal of Grid Computing (IF 3.6), Pub Date: 2020-08-25, DOI: 10.1007/s10723-020-09522-2
Alemeh Matani , Hamid Reza Naji , Hassan Motallebi

In scheduling workflows in a grid environment, concerns such as minimizing makespan and cost, meeting time and budget constraints, and coping with possible resource failures have motivated researchers to propose numerous scheduling algorithms. Several heuristic and meta-heuristic algorithms have been proposed to address these issues, but each of them usually considers only one or a few of these criteria. However, less attention has been paid to fault-tolerant workflow scheduling. Adding fault tolerance to a workflow scheduling algorithm inevitably increases makespan and cost: the resubmission technique may increase execution time to an unacceptable degree and possibly violate the deadline, while the replication technique increases execution cost. In this paper, we propose a fault-tolerant workflow scheduling algorithm with near-optimal time and cost overhead. The proposed approach has two novel aspects. First, we assume a stochastic workflow model with nondeterministic task parameters, use interval arithmetic to model task execution times, and propose a new scheduling algorithm in which task assignment decisions are made according to the performability fluctuations of the computational resources. Second, we employ an efficient combination of the resubmission and replication techniques to obtain the benefits of both, and propose an algorithm for reliable scheduling of scientific workflows with near-optimal additional time and cost. The proposed method achieves a significant increase in reliability, while the additional execution time and cost are almost negligible.
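To illustrate the interval-arithmetic idea only (this is not the authors' algorithm, which the abstract does not spell out), the Python sketch below represents each uncertain quantity as a closed interval [lo, hi], computes a task's finish-time interval with interval addition and interval maximum, and picks a resource with a hypothetical score (midpoint plus `risk_weight` times width) standing in for "performability fluctuation". The names `Interval`, `finish_time`, `pick_resource`, and `risk_weight`, and the scoring rule itself, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] of possible values, e.g. a task's run time."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    def __add__(self, other: "Interval") -> "Interval":
        # Interval sum: [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def maximum(self, other: "Interval") -> "Interval":
        # Interval max: max([a, b], [c, d]) = [max(a, c), max(b, d)]
        return Interval(max(self.lo, other.lo), max(self.hi, other.hi))

    def midpoint(self) -> float:
        return (self.lo + self.hi) / 2

    def width(self) -> float:
        return self.hi - self.lo


def finish_time(ready: Interval, available: Interval, run: Interval) -> Interval:
    # A task starts when its inputs are ready AND the resource is free.
    return ready.maximum(available) + run


def pick_resource(ready: Interval, resources: dict, risk_weight: float = 0.5):
    """Hypothetical rule: minimize midpoint + risk_weight * width of the
    finish-time interval, penalizing resources whose performance fluctuates
    widely. Returns (resource name, finish-time interval)."""
    def score(name: str) -> float:
        available, run = resources[name]
        f = finish_time(ready, available, run)
        return f.midpoint() + risk_weight * f.width()

    best = min(resources, key=score)
    available, run = resources[best]
    return best, finish_time(ready, available, run)


ready = Interval(10, 12)  # the task's inputs arrive somewhere in [10, 12]
resources = {
    "r1": (Interval(0, 0), Interval(10, 40)),  # fast on average, erratic
    "r2": (Interval(5, 5), Interval(25, 28)),  # slightly slower, stable
}
print(pick_resource(ready, resources, risk_weight=0.0))  # ('r1', Interval(lo=20, hi=52))
print(pick_resource(ready, resources, risk_weight=0.5))  # ('r2', Interval(lo=35, hi=40))
```

With `risk_weight=0.0` the erratic resource `r1` wins on average finish time; a positive weight shifts the choice to the stable resource `r2`, whose finish-time interval is much narrower.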

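The time/cost trade-off between resubmission and replication described in the abstract can be made concrete with a simple model. The sketch below assumes each run fails independently with probability `p`, that resubmission reruns a failed task sequentially, and that replicas run in parallel and are all paid for in full; `choose` then picks, per task, the cheapest option that meets a reliability target within the task's slack before the deadline. This per-task rule and all names (`Option`, `resubmission`, `replication`, `choose`) are hypothetical; the paper's actual combination technique is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    name: str
    reliability: float    # probability that the task eventually succeeds
    worst_time: float     # worst-case time the task occupies on the schedule
    expected_cost: float  # expected monetary cost


def resubmission(t: float, cost: float, p: float, retries: int) -> Option:
    # Run once; on failure, rerun up to `retries` more times, one after another.
    n = retries + 1  # maximum number of attempts
    reliability = 1.0 - p ** n
    # Attempt i+1 happens only if the first i attempts failed,
    # so the expected number of attempts is the sum of p**i for i < n.
    expected_attempts = sum(p ** i for i in range(n))
    return Option(f"resubmit x{retries}", reliability, n * t, cost * expected_attempts)


def replication(t: float, cost: float, p: float, replicas: int) -> Option:
    # Run `replicas` copies in parallel on distinct resources with independent
    # failures; assumes every copy is paid for in full.
    reliability = 1.0 - p ** replicas
    return Option(f"replicate x{replicas}", reliability, t, cost * replicas)


def choose(t: float, cost: float, p: float, slack: float, target: float,
           max_k: int = 3) -> Optional[Option]:
    """Hypothetical per-task rule: the cheapest option that meets the
    reliability target and whose worst-case time fits within the slack."""
    options = [resubmission(t, cost, p, r) for r in range(max_k + 1)]
    options += [replication(t, cost, p, k) for k in range(2, max_k + 1)]
    feasible = [o for o in options
                if o.reliability >= target and o.worst_time <= slack]
    return min(feasible, key=lambda o: o.expected_cost, default=None)


# Task: 10 time units, cost 2 per run, 10% chance that a single run fails.
print(choose(10, 2, 0.1, slack=25, target=0.995).name)  # replicate x3
print(choose(10, 2, 0.1, slack=35, target=0.995).name)  # resubmit x2
```

A tight slack forces the more expensive replication, while a looser slack lets the cheaper resubmission reach the same reliability target, which is the time-versus-cost tension the abstract sets out to balance.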

Updated: 2020-08-25