Improved probabilistic I/O scheduling for limited-size Burst-Buffers deployed HPC
Parallel Computing (IF 2.0), Pub Date: 2020-10-25, DOI: 10.1016/j.parco.2020.102708
Benbo Zha , Hong Shen

The I/O bottleneck is a critical problem in current High Performance Computing (HPC) systems, one that hinders their performance scalability. Techniques such as I/O scheduling and burst buffering have been proposed to accelerate data exchange between the compute and storage components of HPC platforms. Probabilistic I/O scheduling, a Markov-chain-based hybrid method that combines these two techniques, controls data transmission according to the overall load state of the Burst-Buffers system in order to mitigate the I/O congestion caused by unpredictable concurrent I/O bursts. However, this method requires a large amount of computation to make online scheduling decisions, which wastes computing resources and lowers scheduling efficiency. In this paper, we first introduce the architecture of a Burst-Buffers deployed HPC platform, the probabilistic execution model of applications, and the basic probabilistic I/O scheduling method, with a proof of its efficiency based on the Markov-chain framework. Then, as a first improvement, we propose a modularization technique that reduces repeated computation by isolating the heuristic application selection module from the original method and reusing the application ranking result to adjust the I/O scheduling. Next, as a second improvement, we propose a thresholding technique that reduces the number of data transfers on the burst buffers by taking into account the write amplification characteristic of the underlying storage devices. Finally, we conduct extensive simulation experiments showing that our proposed I/O scheduling methods outperform existing I/O scheduling methods that neither incorporate burst-buffer states nor consider the characteristics of storage devices.
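
The abstract does not spell out the thresholding rule, so the following is only a toy sketch of the general idea, not the paper's algorithm. It assumes that each write to the burst buffer costs one transfer regardless of size, and that requests can be held back until a hypothetical `threshold` amount of data has accumulated. The function name `count_transfers` and the request sizes are illustrative assumptions.

```python
def count_transfers(request_sizes, threshold):
    """Count buffer writes when requests are batched until `threshold` units accumulate.

    Toy model (an assumption, not the paper's method): every write costs one
    transfer, and a write drains all data that is pending at that moment.
    """
    transfers = 0
    pending = 0
    for size in request_sizes:
        pending += size
        if pending >= threshold:
            transfers += 1  # one batched write drains everything pending
            pending = 0
    if pending > 0:
        transfers += 1  # final flush of whatever is left over
    return transfers


# Hypothetical request sizes in arbitrary units.
requests = [1, 2, 1, 4, 1, 1, 3, 2]
naive = count_transfers(requests, threshold=1)    # every request is written at once
batched = count_transfers(requests, threshold=4)  # small requests are coalesced
print(naive, batched)  # prints "8 4"
```

In this toy model a larger threshold means fewer, larger writes, which is the direction of the trade-off the abstract attributes to write amplification. The cost is that data waits longer before it reaches the buffer.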




Updated: 2020-11-13