A Data and Task Co-scheduling Algorithm for Scientific Cloud Workflows
IEEE Transactions on Cloud Computing (IF 5.3), Pub Date: 2020-04-01, DOI: 10.1109/tcc.2015.2511745
Kefeng Deng, Kaijun Ren, Min Zhu, Junqiang Song

Cloud computing has emerged as a promising infrastructure for cost-efficient workflow execution because it provisions resources on demand in a pay-as-you-go manner. Because scientific workflows require access to community-wide resources, they usually need to run in collaborative cloud environments composed of multiple datacenters. Although such environments facilitate scientific collaboration, moving input and intermediate datasets across geographically distributed datacenters may cause intolerable latency that hinders the efficient execution of large-scale, data-intensive scientific workflows. To address this problem, this article proposes a novel multi-level K-cut graph partitioning algorithm that minimizes the volume of data transferred across datacenters while satisfying load-balancing and fixed-data constraints. The algorithm first contracts the fixed input datasets located in the same datacenter together with their consuming tasks, and then coarsens the contracted graph level by level to a predefined scale. Next, a K-cut algorithm partitions the resulting graph into K parts such that the cut size is minimized. Finally, the partitioned graph is projected back onto the original workflow graph, with the load-balancing constraint maintained throughout the projection. We evaluate the algorithm on three real-world workflow applications, and the results demonstrate that it outperforms other state-of-the-art algorithms.
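
The optimization problem described in the abstract can be illustrated on a toy workflow. The sketch below is not the authors' multi-level algorithm: it replaces the contraction, coarsening, K-cut, and projection stages with an exhaustive search that is only practical for a handful of nodes. All names (`cut_volume`, `is_balanced`, `best_partition`), the example graph, and the balance rule (each datacenter's task load may not exceed `(1 + tolerance)` times the average) are assumptions made for illustration.

```python
from itertools import product

def cut_volume(edges, assignment):
    """Total data volume on edges whose endpoints are in different datacenters."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])

def is_balanced(assignment, loads, k, tolerance):
    """Assumed balance rule: no datacenter exceeds (1 + tolerance) * average load."""
    per_dc = [0.0] * k
    for node, dc in assignment.items():
        per_dc[dc] += loads.get(node, 0.0)  # datasets carry no compute load
    limit = (1 + tolerance) * sum(loads.values()) / k
    return all(load <= limit for load in per_dc)

def best_partition(nodes, edges, loads, fixed, k, tolerance=0.0):
    """Brute-force stand-in for the K-cut step; tiny graphs only."""
    free = [n for n in nodes if n not in fixed]
    best, best_cost = None, float("inf")
    for choice in product(range(k), repeat=len(free)):
        assignment = dict(fixed)               # fixed datasets never move
        assignment.update(zip(free, choice))   # try one placement of the rest
        if not is_balanced(assignment, loads, k, tolerance):
            continue
        cost = cut_volume(edges, assignment)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost

# Hypothetical workflow: two fixed datasets (d1, d2) and four tasks.
nodes = ["d1", "d2", "t1", "t2", "t3", "t4"]
edges = {                # (producer, consumer): data volume moved along the edge
    ("d1", "t1"): 10,
    ("d2", "t2"): 10,
    ("t1", "t3"): 5,
    ("t2", "t3"): 1,
    ("t3", "t4"): 2,
}
loads = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}   # compute load per task
fixed = {"d1": 0, "d2": 1}                     # dataset -> datacenter it is pinned to

best, cost = best_partition(nodes, edges, loads, fixed, k=2)
print(cost)  # 3
print(best)
```

In this example the search keeps `t1` next to `d1` and `t2` next to `d2`, so neither heavy input edge is cut. The strict balance limit then forces `t4` away from `t3`, which leaves a cross-datacenter volume of 3. A multi-level method pursues the same objective but shrinks the graph first, so that the partitioning step works on far fewer nodes.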

Updated: 2020-04-01
