Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform
The Journal of Supercomputing (IF 2.5), Pub Date: 2021-03-10, DOI: 10.1007/s11227-020-03541-2
Zain Ulabedin, Babar Nazir

Scientific workflow applications comprise large numbers of tasks and datasets that must be processed in a systematic manner. Such applications benefit from cloud computing platforms, which offer access to virtually limitless resources provisioned elastically and on demand. Running data-intensive scientific workflows across geographically distributed data centres, however, entails massive data transfers, which affect both the overall execution time and the monetary cost of the workflows. Existing workflow-scheduling efforts concentrate on reducing makespan and budget; little attention has been paid to task and dataset dependencies. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of an initial data placement stage, which clusters and distributes datasets based on their dependencies, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated datasets. To reduce runtime dataset movement, we use inter-data-centre task replication and dataset replication to ensure dataset availability. Simulation results with four workflow applications show that our strategy efficiently reduces data movement and executes all chosen workflows within user-specified budget and deadline. R-PCP incurs 44.93% and 31.37% less data movement than random and adaptive data-aware scheduling (ADAS) techniques, respectively, and consumes 26.48% less energy than ADAS.
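The paper itself specifies the full placement and scheduling algorithm; as a rough, hypothetical illustration of the idea behind the initial data placement stage (clustering datasets by how often tasks use them together, then assigning one cluster per data centre), a greedy sketch might look like the following. All function names, the dependency-counting scheme, and the merge heuristic here are illustrative assumptions, not the authors' actual method.

```python
from itertools import combinations

def dependency_matrix(tasks):
    """Count, for each pair of datasets, how many tasks use both.

    `tasks` maps a task name to the set of dataset names it reads/writes.
    Pairs that never co-occur are simply absent from the result.
    """
    dep = {}
    for datasets in tasks.values():
        for a, b in combinations(sorted(datasets), 2):
            dep[(a, b)] = dep.get((a, b), 0) + 1
    return dep

def place_datasets(tasks, n_centres):
    """Greedy initial placement: merge the most dependent dataset
    clusters first, until only `n_centres` clusters remain."""
    # Start with one singleton cluster per dataset.
    clusters = {d: {d} for ds in tasks.values() for d in ds}
    dep = dependency_matrix(tasks)
    for (a, b), _ in sorted(dep.items(), key=lambda kv: -kv[1]):
        ca, cb = clusters[a], clusters[b]
        if ca is not cb and len({id(c) for c in clusters.values()}) > n_centres:
            ca |= cb                 # merge the two clusters in place
            for d in cb:
                clusters[d] = ca     # repoint members of the absorbed cluster
    # Collect the distinct clusters: one per data centre.
    seen, placement = set(), []
    for c in clusters.values():
        if id(c) not in seen:
            seen.add(id(c))
            placement.append(sorted(c))
    return placement
```

Co-locating datasets that the same tasks consume means those tasks can be scheduled with data locality, so fewer datasets cross data-centre boundaries at runtime; the paper's R-PCP stage then handles deadline/budget-aware task scheduling and replication on top of such a placement.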




Updated: 2021-03-10