Lessons Learned from Optimizing the Sunway Storage System for Higher Application I/O Performance
Journal of Computer Science and Technology (IF 1.2), Pub Date: 2020-01-01, DOI: 10.1007/s11390-020-9798-5
Qi Chen, Kang Chen, Zuo-Ning Chen, Wei Xue, Xu Ji, Bin Yang

Applications on high-performance computers rarely reach the peak bandwidth of the storage system because of I/O interference, storage resource misallocation, and long, complex I/O paths. We performed several studies to bridge this gap in the Sunway storage system, which serves the Sunway TaihuLight supercomputer. To locate these issues and the connections between them, we developed an end-to-end performance monitoring and diagnosis tool that exposes the I/O behavior of both applications and the system. With the help of this tool, we were able to identify the root causes of these performance barriers at the I/O forwarding layer and the parallel file system (PFS) layer. An application-aware I/O forwarding allocation framework addresses I/O interference and resource misallocation at the I/O forwarding layer. A performance-aware data placement mechanism mitigates the impact of I/O interference and performance variation among storage devices in the PFS. Together, these measures gave applications much better I/O performance. During the process, we also proposed a lightweight storage stack to shorten the I/O path of applications with an N-N I/O pattern. This paper summarizes these studies and presents the lessons learned from the process.

Updated: 2020-01-01