Combining replication and checkpointing redundancies for reducing resiliency overhead
ETRI Journal (IF 1.4), Pub Date: 2020-04-15, DOI: 10.4218/etrij.2018-0684
Hassan Motallebi

We herein propose a heuristic redundancy selection algorithm that combines resubmission, replication, and checkpointing redundancies to reduce the resiliency overhead in fault‐tolerant workflow scheduling. The appropriate combination of these redundancies for workflow tasks is obtained in two consecutive phases. First, to compute the replication vector (number of task replicas), we apportion the set of provisioned resources among concurrently executing tasks according to their needs. Subsequently, we obtain the optimal checkpointing interval for each task as a function of the number of replicas and characteristics of tasks and computational environment. We formulate the problem of obtaining the optimal checkpointing interval for replicated tasks in situations where checkpoint files can be exchanged among computational resources. The results of our simulation experiments, on both randomly generated workflow graphs and real‐world applications, demonstrated that both the proposed replication vector computation algorithm and the proposed checkpointing scheme reduced the resiliency overhead.
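
The two phases described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the apportionment rule or the interval formula, so the sketch substitutes a proportional largest-remainder split for the replication vector and Young's classical first-order approximation, sqrt(2 * C * M), for the checkpointing interval. The function names, the `needs` weights, and the parameters `checkpoint_cost` (C) and `mtbf` (M) are all hypothetical, and the effect of replica count and checkpoint exchange on the interval, which the paper formulates, is not modeled here.

```python
import math


def apportion_replicas(needs, total_resources):
    """Split resources among concurrent tasks in proportion to `needs`.

    Assumption: a largest-remainder split, with every task guaranteed
    one replica. This stands in for the paper's heuristic, which the
    abstract does not specify.
    """
    n = len(needs)
    if n == 0:
        return []
    if total_resources < n:
        raise ValueError("need at least one resource per task")
    if any(w <= 0 for w in needs):
        raise ValueError("needs must be positive")

    spare = total_resources - n              # resources beyond one per task
    total_need = sum(needs)
    quotas = [spare * w / total_need for w in needs]
    replicas = [1 + math.floor(q) for q in quotas]

    # Hand out what is left to the tasks with the largest fractional parts.
    leftover = total_resources - sum(replicas)
    by_fraction = sorted(
        range(n), key=lambda i: quotas[i] - math.floor(quotas[i]), reverse=True
    )
    for i in by_fraction[:leftover]:
        replicas[i] += 1
    return replicas


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order checkpoint interval: sqrt(2 * C * M).

    Assumption: a single unreplicated task with exponential failures;
    the paper's replica-aware formulation is not reproduced.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


if __name__ == "__main__":
    print(apportion_replicas([1, 2, 3], 9))
    print(young_interval(checkpoint_cost=2.0, mtbf=100.0))
```

In this sketch the two phases are independent; in the paper, the interval for each task is instead obtained as a function of its replica count and of the task and environment characteristics.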

Updated: 2020-04-15