Analytics of Longitudinal System Monitoring Data for Performance Prediction
arXiv - CS - Performance, Pub Date: 2020-07-07, DOI: arxiv-2007.03451
Ian J. Costello, Abhinav Bhatele

In recent years, several HPC facilities have started continuous monitoring of their systems and jobs to collect performance-related data for understanding performance and operational efficiency. Such data can be used to optimize the performance of individual jobs and the overall system by creating data-driven models that can predict the performance of pending jobs. In this paper, we model the performance of representative control jobs using longitudinal system-wide monitoring data to explore the causes of performance variability. Using machine learning, we are able to predict the performance of unseen jobs before they are executed based on the current system state. We analyze these prediction models in great detail to identify the features that are dominant predictors of performance. We demonstrate that such models can be application-agnostic and can be used for predicting performance of applications that are not included in training.

Updated: 2020-07-08