Modeling I/O performance variability in high-performance computing systems using mixture distributions
Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2020-02-08, DOI: 10.1016/j.jpdc.2020.01.005
Li Xu, Yueyao Wang, Thomas Lux, Tyler Chang, Jon Bernard, Bo Li, Yili Hong, Kirk Cameron, Layne Watson

Performance variability is an important factor in high-performance computing (HPC) systems. HPC performance variability is often complex because its sources interact and are distributed throughout the system stack. For example, the performance variability of I/O throughput can be affected by factors such as CPU frequency, the number of I/O threads, file size, and record size. In this paper, we focus on the I/O throughput variability across multiple executions of a benchmark program. For a given system configuration, the distribution of throughputs from run to run is of interest. We conduct large-scale experiments and collect a massive amount of data to study the distribution of I/O throughput under tens of thousands of system configurations. Although normality is often assumed in the literature, our statistical analysis reveals that the performance variability is not normally distributed under most system configurations. Instead, multimodal distributions are common for many system configurations. We propose the use of mixture distributions to describe the multimodal behavior. Various underlying parametric distributions, such as the normal, gamma, and Weibull, are considered. We apply an expectation maximization (EM) algorithm for parameter estimation and use the Bayesian information criterion (BIC) for parametric model selection. We also illustrate how to use the estimated mixture distribution to calculate the number of runs needed for future experiments on variability analysis. The paper provides a useful tool set for studying the behavior of performance variability.
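
The EM-plus-BIC workflow described in the abstract can be illustrated with a minimal sketch. It is restricted to normal components (the paper also considers gamma and Weibull) and runs on synthetic bimodal "throughput" values rather than the authors' data. The function names `fit_normal_mixture` and `bic`, the quantile-based initialisation, and the variance floor are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def _log_joint(x, weights, means, sds):
    """Log of (weight * normal density) for every point/component pair."""
    z = (x[:, None] - means) / sds
    return np.log(weights) - np.log(sds) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2


def fit_normal_mixture(x, k, n_iter=500, tol=1e-8):
    """EM for a k-component univariate normal mixture (illustrative only).

    Returns (weights, means, sds, log_likelihood).
    """
    x = np.asarray(x, dtype=float)
    # Assumed initialisation: means at evenly spaced quantiles, equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    sds = np.full(k, x.std() / k + 1e-12)
    weights = np.full(k, 1.0 / k)
    var_floor = 1e-6 * x.var() + 1e-12  # assumed guard against collapse
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        log_p = _log_joint(x, weights, means, sds)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        loglik = log_norm.sum()
        if loglik - prev < tol:
            break
        prev = loglik
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: responsibility-weighted updates of weights, means, and sds.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        sds = np.sqrt(np.maximum(var, var_floor))
    # Recompute so the returned log-likelihood matches the returned parameters.
    loglik = np.logaddexp.reduce(_log_joint(x, weights, means, sds), axis=1).sum()
    return weights, means, sds, loglik


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * np.log(n)


# Synthetic bimodal sample standing in for repeated throughput measurements.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(100, 5, 300), rng.normal(140, 5, 200)])

# Fit mixtures with 1 to 4 components and keep the one with the lowest BIC.
scores = {k: bic(fit_normal_mixture(sample, k)[3], k, sample.size)
          for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print("BIC by number of components:", scores)
print("Selected number of components:", best_k)
```

Lower BIC is better: the `p log n` term penalises each extra component, so a component is kept only if it improves the log-likelihood enough to pay for its three additional parameters.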


Updated: 2020-02-10