Admit or preserve? Addressing server failures in cloud computing task management
Queueing Systems (IF 1.2), Pub Date: 2019-07-17, DOI: 10.1007/s11134-019-09624-z
Nadav Lavi, Hanoch Levy

Cloud computing task management plays a critical role in the efficient operation of cloud resources, i.e., the servers. It handles critical and complicated decisions, and must cope with both the inherently dynamic nature of cloud computing systems and the additional complexity that comes from their scale (tens of thousands of servers). Because servers may fail, task management must make both task admission and task preservation decisions. Both kinds of decision require considering future system trajectories and the interplay between preservation and admission. In this paper we study the combined problem of task admission and preservation in the dynamic environment of cloud computing systems by analyzing a queueing system based on a Markov decision process (MDP). We show that the optimal operational policy is of a double switching curve type. At face value, extracting the optimal policy is rather complicated, yet our analysis reveals that it can be reduced to a single rule, since the rules can effectively be decoupled. Based on this result, we propose two heuristic approaches that approximate the optimal rule for the system settings most relevant to cloud computing. Our results provide a simple policy scheme for the combined admission and preservation problem that can be applied in complex cloud computing environments, and eliminate the need for sophisticated real-time control mechanisms.
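The abstract does not give the paper's model, so the sketch below is only a loose illustration of how an MDP formulation of a queue can produce a simple structured rule. It is a toy, one-dimensional admission problem (a single server, no failures, no preservation decision), not the authors' system, and the function name `solve_admission_mdp` and every parameter value are assumptions made for illustration. Value iteration is run on a discrete-time (uniformized) queue, and the resulting policy turns out to be a single threshold on queue length, which is the one-dimensional analogue of a switching curve.

```python
from typing import List, Tuple


def solve_admission_mdp(
    capacity: int = 30,          # hypothetical buffer size
    arrival_prob: float = 0.6,   # probability the next event is an arrival
    service_prob: float = 0.4,   # probability the next event is a service completion
    reward: float = 5.0,         # hypothetical reward for admitting one task
    holding_cost: float = 1.0,   # hypothetical cost per task in system per step
    discount: float = 0.95,
    iterations: int = 2000,
) -> Tuple[List[float], List[bool]]:
    """Value iteration for a toy admit/reject queue (illustrative only)."""
    values = [0.0] * (capacity + 1)
    for _ in range(iterations):
        new_values = []
        for x in range(capacity + 1):
            reject = values[x]
            if x < capacity:
                # On an arrival, pick the better of admitting and rejecting.
                on_arrival = max(reward + values[x + 1], reject)
            else:
                # A full buffer forces rejection.
                on_arrival = reject
            # A service completion in an empty system is a dummy self-transition.
            on_departure = values[max(x - 1, 0)]
            new_values.append(
                -holding_cost * x
                + discount * (arrival_prob * on_arrival + service_prob * on_departure)
            )
        values = new_values
    # policy[x] is True when admitting in state x is at least as good as rejecting.
    policy = [
        x < capacity and reward + values[x + 1] >= values[x]
        for x in range(capacity + 1)
    ]
    return values, policy


if __name__ == "__main__":
    _, policy = solve_admission_mdp()
    print("admit while queue length is below", policy.index(False))
```

In this toy model the value function stays nonincreasing and concave under each iteration, which is why the admit region is an initial segment of the state space. The paper's two-decision problem is richer, and nothing here should be read as reproducing its double switching curve result or its heuristics.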

Updated: 2019-07-17