Workflow scheduling based on deep reinforcement learning in the cloud environment, Journal of Ambient Intelligence and Humanized Computing


Journal of Ambient Intelligence and Humanized Computing, Pub Date: 2021-01-09, DOI: 10.1007/s12652-020-02884-1
Tingting Dong, Fei Xue, Chuangbai Xiao, Jiangjiang Zhang

As a convenient and economical computing model, cloud computing promotes the development of intelligence. Workflow scheduling is a significant problem for the further development of cloud computing. In this work, an Actor-Critic architecture is used to solve this problem, minimizing task execution time under task precedence constraints. The approach resembles list-based heuristic algorithms, which consist of a task prioritizing phase and a task allocation phase; here, however, the results of the two phases interact with each other. In the task prioritizing phase, given a workflow represented as a data communication time matrix and a task computation time matrix, an improved Pointer network predicts a distribution over task permutations. A heuristic algorithm based on HEFT then performs the task allocation and yields the task execution time. Using the negative task execution time as the reward signal, the model parameters of the first phase are optimized by a policy gradient method. Simulation experiments evaluate task execution time, and the results show that workflow scheduling by deep reinforcement learning is more effective than four other single-objective heuristic algorithms.
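The link between the two phases can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified, non-insertion earliest-finish-time rule in the spirit of HEFT, assumes that a positive entry comm[p][t] marks a precedence edge from task p to task t, and uses hypothetical names (eft_makespan, reward, COMP, COMM) and made-up matrix values. Given a task permutation, such as one sampled from the prioritizing network, it allocates tasks to processors and returns the negative execution time that would serve as the reward.

```python
from math import inf
from typing import List, Sequence


def eft_makespan(order: Sequence[int],
                 comp: List[List[float]],
                 comm: List[List[float]]) -> float:
    """Allocate tasks in `order`, each to the processor with the earliest finish time.

    comp[t][q]: computation time of task t on processor q.
    comm[p][t]: communication time from task p to task t; a positive value
                is ASSUMED to mark a precedence edge p -> t.
    Returns the finish time of the last task (the makespan).
    """
    n_tasks, n_procs = len(comp), len(comp[0])
    proc_free = [0.0] * n_procs      # time at which each processor becomes idle
    finish = [None] * n_tasks        # finish time of each scheduled task
    placed = [None] * n_tasks        # processor chosen for each scheduled task

    for t in order:
        preds = [p for p in range(n_tasks) if comm[p][t] > 0]
        if any(finish[p] is None for p in preds):
            raise ValueError(f"task {t} is ordered before one of its predecessors")

        best_eft, best_proc = inf, -1
        for q in range(n_procs):
            # Data from a predecessor on the same processor costs no transfer time.
            ready = max((finish[p] + (0.0 if placed[p] == q else comm[p][t])
                         for p in preds), default=0.0)
            eft = max(ready, proc_free[q]) + comp[t][q]
            if eft < best_eft:
                best_eft, best_proc = eft, q

        finish[t], placed[t] = best_eft, best_proc
        proc_free[best_proc] = best_eft

    return max(finish)


def reward(order, comp, comm) -> float:
    """Negative execution time, used as the reward for the sampled permutation."""
    return -eft_makespan(order, comp, comm)


# Hypothetical 4-task, 2-processor workflow: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
COMP = [[2, 3], [3, 2], [4, 4], [2, 1]]
COMM = [[0, 1, 2, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]

if __name__ == "__main__":
    for perm in ([0, 1, 2, 3], [0, 2, 1, 3]):
        print(perm, reward(perm, COMP, COMM))
```

In this toy workflow the two valid permutations lead to different execution times, which is the sense in which the prioritizing and allocation phases interact: the allocation result scores the permutation, and a policy gradient method can then raise the probability of permutations with higher reward.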


Updated: 2021-01-10
