An empirical study of I/O separation for burst buffers in HPC systems
Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2020-11-01, DOI: 10.1016/j.jpdc.2020.10.007
Donghun Koo, Jaehwan Lee, Jialin Liu, Eun-Kyu Byun, Jae-Hyuck Kwak, Glenn K. Lockwood, Soonwook Hwang, Katie Antypas, Kesheng Wu, Hyeonsang Eom

To meet the exascale I/O requirements of High-Performance Computing (HPC), a new I/O subsystem, the Burst Buffer, based on solid state drives (SSDs), has been developed. However, diverse HPC workloads and bursty I/O patterns cause severe data fragmentation, which requires costly garbage collection (GC) and increases the number of bytes written to the SSD. To address this data fragmentation challenge, a new multi-stream feature has been developed for SSDs. In this work, we develop an I/O separation scheme called BIOS that leverages this multi-stream feature to group I/O streams by user ID. We propose a stream-aware scheduling policy based on burst buffer pools in the workload manager, and integrate BIOS with the workload manager to optimize the I/O separation scheme in the burst buffer. We evaluate the proposed framework with burst buffer I/O traces from the Cori supercomputer, covering a diverse set of applications. Experimental results show that BIOS improves performance by 1.44x on average and reduces the Write Amplification Factor (WAF) by up to 1.20x. These results demonstrate the potential benefits of the I/O separation scheme for solid state storage systems.
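
The two ideas the abstract relies on, grouping writes into streams by user ID and measuring write amplification, can be illustrated with a minimal sketch. This is not the BIOS implementation: the modulo mapping, the function names, and the sample numbers are all assumptions made for illustration. The only definition taken as given is the standard one, WAF = bytes written to flash / bytes written by the host.

```python
from collections import defaultdict


def assign_stream(user_id: int, num_streams: int) -> int:
    """Map a user ID to a stream ID in [1, num_streams].

    Hypothetical policy (simple modulo); the paper's actual mapping
    is not described in the abstract.
    """
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    return user_id % num_streams + 1


def group_writes_by_stream(writes, num_streams):
    """Sum the bytes written per stream.

    writes: iterable of (user_id, nbytes) pairs.
    Returns a dict {stream_id: total_bytes}.
    """
    per_stream = defaultdict(int)
    for user_id, nbytes in writes:
        per_stream[assign_stream(user_id, num_streams)] += nbytes
    return dict(per_stream)


def write_amplification_factor(host_bytes: int, flash_bytes: int) -> float:
    """WAF = bytes physically written to flash / bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


if __name__ == "__main__":
    # Made-up sample trace: (user_id, bytes written)
    trace = [(1001, 4096), (1002, 8192), (1001, 4096)]
    print(group_writes_by_stream(trace, num_streams=4))
    print(write_amplification_factor(host_bytes=100, flash_bytes=150))
```

Writes from the same user land in the same stream, so data with a similar lifetime tends to share erase blocks; a lower WAF then indicates that garbage collection had to relocate less valid data.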


Updated: 2020-11-16