Making Neighbors Quiet: An Approach to Detect Virtual Resource Contention
IEEE Transactions on Services Computing (IF 8.1), Pub Date: 2020-09-01, DOI: 10.1109/tsc.2017.2720742
Joel Vallone, Robert Birke, Lydia Y. Chen

It is imperative for public cloud providers to guarantee performance targets for tenants' virtual machines (VMs) while respecting strict business confidentiality, e.g., having no information about the applications or their performance. A large body of related work addresses the challenge of detecting performance interference by leveraging clients' quality-of-service (QoS) metrics, e.g., latency, and additional profiling servers. In this paper, we take the perspective of the cloud provider and propose a general black-box approach that detects different types of resource contention by throttling neighboring VMs. Specifically, we design a three-phase detection algorithm that includes: (i) an alarm phase to identify statistical outliers using control charts; (ii) a passive clustering phase to match the current sample to historical behaviors; and (iii) an active throttling phase to distinguish contention from application phase changes via throttling. The algorithm is specifically designed for scenarios where multiple co-located VMs request detection analysis simultaneously. We implement and evaluate the proposed three-phase algorithm on four latency-sensitive applications, i.e., Wikimedia and three benchmarks from Cloudsuite. Our extensive experimental results show that we can reach an average detection accuracy above 90 percent while limiting the performance degradation experienced by offender workloads to short learning phases.
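
To make the alarm phase more concrete, the following is a minimal sketch of a control-chart outlier check. The abstract does not say which chart type, metric, or thresholds the authors use, so everything here is an assumption for illustration: a Shewhart-style chart with limits at the mean plus or minus `k` standard deviations, a hypothetical per-VM metric, and the made-up names `control_limits` and `raises_alarm`.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits as mean -/+ k sample standard deviations.

    Assumption: a Shewhart-style chart; the paper's actual chart may differ.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def raises_alarm(sample, baseline, k=3.0):
    """Flag a sample as a statistical outlier if it falls outside the limits."""
    lower, upper = control_limits(baseline, k)
    return sample < lower or sample > upper


# Hypothetical history of a resource metric observed for one VM.
baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]

in_control = raises_alarm(10.3, baseline)   # close to the historical mean
out_of_control = raises_alarm(25.0, baseline)  # far outside the limits
```

In a pipeline like the one described, an alarm of this kind would only trigger the later phases; on its own it cannot tell contention apart from a change in the application's own behavior.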

Updated: 2020-09-01