A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2020-01-01, DOI: 10.1007/s11390-020-9732-x
Sa Wang, Yan-Hai Zhu, Shan-Pei Chen, Tian-Ze Wu, Wen-Jie Li, Xu-Sheng Zhan, Hai-Yang Ding, Wei-Song Shi, Yun-Gang Bao

Resource efficiency and application QoS have long been major concerns of datacenter operators, yet the two remain irreconcilable. High resource utilization increases the risk of resource contention between co-located workloads, which causes latency-critical (LC) applications to suffer unpredictable, and even unacceptable, performance. Much prior work has been devoted to effective mechanisms that protect the QoS of LC applications while improving resource efficiency. In this paper, we propose MAGI, a resource management runtime that leverages neural networks to monitor and pinpoint the root cause of performance interference, and adjusts the resource shares of the corresponding applications to ensure the QoS of LC applications. MAGI is a practice in the Alibaba datacenter of using neural networks to provide on-demand resource adjustment for applications. The experimental results show that MAGI reduces the performance degradation of an LC application by up to 87.3% when it is co-located with antagonist applications.
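
The monitor, pinpoint, and adjust cycle described in the abstract can be illustrated with a minimal Python sketch. This is not MAGI's implementation: the abstract gives no details of the model, the metrics, or the adjustment policy. The `App` fields, the `control_step` function, the step size, and `placeholder_score` (a hand-written stand-in for a trained neural network) are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class App:
    """A co-located (non-LC) application. All fields are hypothetical."""
    name: str
    cpu_share: float      # fraction of CPU currently granted
    llc_miss_rate: float  # assumed interference signal, in [0, 1]


def placeholder_score(app: App) -> float:
    # Stand-in for a trained neural network that would estimate how much
    # this application contributes to the LC application's slowdown.
    return app.llc_miss_rate * app.cpu_share


def control_step(
    lc_latency_ms: float,
    qos_target_ms: float,
    antagonists: Dict[str, App],
    score: Callable[[App], float] = placeholder_score,
    step: float = 0.1,
    min_share: float = 0.05,
) -> Optional[str]:
    """Run one monitor -> pinpoint -> adjust iteration.

    Returns the name of the throttled application, or None if the LC
    application already meets its QoS target (or nothing can be throttled).
    """
    # Monitor: do nothing while the LC application meets its target.
    if lc_latency_ms <= qos_target_ms or not antagonists:
        return None
    # Pinpoint: pick the application the model blames most.
    culprit = max(antagonists.values(), key=score)
    # Adjust: shrink its share, but never below a floor.
    culprit.cpu_share = max(min_share, culprit.cpu_share - step)
    return culprit.name
```

For example, with two hypothetical batch jobs, a call such as `control_step(12.0, 10.0, apps)` would throttle whichever job the scoring function ranks highest, while a call with latency at or below the target leaves all shares untouched.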

Updated: 2020-01-01