A Q-learning approach for the autoscaling of scientific workflows in the Cloud
Future Generation Computer Systems (IF 6.2). Pub Date: 2021-09-17. DOI: 10.1016/j.future.2021.09.007
Yisel Garí, David A. Monge, Cristian Mateos

Autoscaling strategies aim to exploit the elasticity, resource heterogeneity and varied pricing options of a Cloud infrastructure to improve efficiency in the execution of resource-hungry applications such as scientific workflows. Scientific workflows represent a special type of Cloud application with task dependencies, high-performance computational requirements and fluctuating workloads. Hence, the amount and type of resources needed during workflow execution change dynamically over time. The well-known autoscaling problem comprises (i) scaling decisions, for adjusting the computing capacity of a virtualized infrastructure to meet the current demand of the application, and (ii) task scheduling decisions, for assigning tasks to specific acquired Cloud resources for execution. Both are highly complex sub-problems, made even more challenging by the uncertainty inherent to the Cloud. Reinforcement Learning (RL) provides a solid framework for decision-making problems in stochastic environments, and therefore offers a promising perspective for designing Cloud autoscaling strategies based on an online learning process. In this work, we propose a novel formulation of the Cloud infrastructure-scaling problem as a Markov Decision Process and use the Q-learning algorithm to learn scaling policies, demonstrating that considering the specific characteristics of workflow applications when making autoscaling decisions can lead to more efficient workflow executions. Thus, our RL-based scaling strategy exploits the information available about workflow dependency structures. Simulations performed on four well-known workflows demonstrate significant gains (25%–55%) of our proposal in comparison with a similar state-of-the-art proposal.
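To make the Q-learning formulation concrete, the following is a minimal illustrative sketch of tabular Q-learning applied to a toy scaling decision. The state space, transition dynamics, and reward function here are hypothetical stand-ins for exposition only; they are not the MDP formulation used in the paper.

```python
import random
from collections import defaultdict

# Scaling actions: release a VM, keep the fleet unchanged, acquire a VM.
ACTIONS = [-1, 0, +1]

def toy_reward(num_vms, pending_tasks):
    """Hypothetical reward: penalize mismatch between capacity and demand."""
    return -abs(num_vms - pending_tasks)

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        num_vms, pending = 1, rng.randint(0, 5)
        for _step in range(20):
            state = (num_vms, pending)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            # Toy environment transition: bounded fleet size, random demand drift.
            num_vms = max(0, min(5, num_vms + action))
            pending = max(0, min(5, pending + rng.choice([-1, 0, 1])))
            reward = toy_reward(num_vms, pending)
            next_state = (num_vms, pending)
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning temporal-difference update.
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)]
            )
    return Q
```

The learned table induces a greedy scaling policy: in each state, pick the action with the highest Q-value. The paper's contribution lies in what goes into the state, e.g. information about the workflow's dependency structure, which this toy example omits.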




Updated: 2021-09-23