Intelligent colocation of server workloads
Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2021-02-15, DOI: 10.1016/j.jpdc.2021.02.010
Felippe Vieira Zacarias, Vinicius Petrucci, Rajiv Nishtala, Paul Carpenter, Daniel Mossé

Many server applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized. It is hard for developers and runtime systems to ensure that all critical resources are fully exploited by a single application, so an attractive technique for increasing server system utilization is to colocate multiple applications on the same server. When applications share critical resources, however, contention on shared resources may lead to reduced application performance.

In this paper, we show that server efficiency can be improved by first modeling the expected performance degradation of colocated applications based on measured hardware performance counters, and then exploiting the model to determine an optimized mix of colocated applications. This paper presents a new intelligent resource manager and makes the following contributions: (1) a new machine learning model to predict the performance degradation of colocated applications based on hardware counters and (2) an intelligent scheduling scheme deployed on an existing resource manager to enable application co-scheduling with minimum performance degradation. Our results show that our approach achieves performance improvements of 7% (avg) and 12% (max) compared to the standard policy commonly used by existing job managers.
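
The two-step idea in the abstract (predict degradation, then choose the mix) can be illustrated with a minimal sketch. This is not the authors' model or scheduler: the counter names, the profile values, and the `predict_degradation` formula are all hypothetical stand-ins, and the exhaustive search is only practical for a handful of applications.

```python
# Hypothetical counter profiles, measured while each application runs alone.
# Feature names and values are illustrative assumptions, not data from the paper.
PROFILES = {
    "app_a": {"llc_miss_rate": 0.30, "mem_bw_util": 0.70},
    "app_b": {"llc_miss_rate": 0.05, "mem_bw_util": 0.10},
    "app_c": {"llc_miss_rate": 0.25, "mem_bw_util": 0.60},
    "app_d": {"llc_miss_rate": 0.10, "mem_bw_util": 0.20},
}


def predict_degradation(a, b):
    """Stand-in for a trained model of slowdown when a and b share a server.

    Assumption: contention on a resource grows with the product of the two
    applications' demand for it. A real predictor would be learned from
    measured colocation runs.
    """
    return sum(a[key] * b[key] for key in a)


def best_pairing(names, profiles):
    """Return (total predicted degradation, pairs) minimizing the total.

    Tries every way to split `names` into pairs. Assumes an even number of
    applications; the cost is infinite if one application is left over.
    """
    if not names:
        return 0.0, []
    first, rest = names[0], names[1:]
    best = (float("inf"), [])
    for i, partner in enumerate(rest):
        cost = predict_degradation(profiles[first], profiles[partner])
        sub_cost, sub_pairs = best_pairing(rest[:i] + rest[i + 1:], profiles)
        if cost + sub_cost < best[0]:
            best = (cost + sub_cost, [(first, partner)] + sub_pairs)
    return best


if __name__ == "__main__":
    total, pairs = best_pairing(sorted(PROFILES), PROFILES)
    print(pairs, round(total, 4))
```

With these illustrative numbers, the search places each bandwidth-heavy application next to a light one rather than colocating the two heavy ones. A greedy choice of the single cheapest pair first would do worse here, because it would leave the two heavy applications to share a server.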




Updated: 2021-02-15