Why do Users Need to Take Care of Their HPC Applications Efficiency?
Lobachevskii Journal of Mathematics, Pub Date: 2020-10-21, DOI: 10.1134/s1995080220080132
D. A. Nikitenko, P. A. Shvets, V. V. Voevodin

Abstract

High-performance computing plays a very important role in modern scientific research. Since all scientists want to solve their problems faster, it is very important to speed up these computations. To this end, new algorithms are being developed, new HPC systems are appearing, and so on. However, relatively little attention is paid to the efficiency of high-performance computations, which often leaves a vast amount of supercomputer resources idle. It is vital to change this situation; in particular, users need to be shown the importance and necessity of optimizing their applications. One of the main steps in this direction is to help users detect performance issues in their programs, analyze how critical these issues are and what their root causes are, and eliminate them in order to improve application performance. In this article we describe the research being performed at Lomonosov Moscow State University aimed at solving this problem. In particular, we analyze the results of a survey of supercomputer center users, which shows their opinion on efficiency analysis. We also share our vision of the HPC center workflow requirements needed to support efficiency analysis of both the system and its applications. Finally, we describe a software tool under development that allows any supercomputer user to obtain and analyze versatile statistics on the performance of their HPC jobs, helping them detect possible root causes of performance degradation.
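The abstract does not say how the tool detects performance issues or assigns them a level of criticality. Purely as an illustration of the general idea, the sketch below assumes a simple rule-based check over per-job average monitoring values. The metric names (`cpu_user_pct`, `gpu_util_pct`, `mem_used_pct`), the thresholds, and the severity labels are all hypothetical choices made for this example, not the authors' method or the tool's actual rules.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical per-job averages; metric names are illustrative assumptions only.
@dataclass
class JobStats:
    job_id: str
    cpu_user_pct: float            # average CPU user load, percent
    gpu_util_pct: Optional[float]  # None if the job did not request GPUs
    mem_used_pct: float            # average memory usage, percent of node memory

@dataclass
class Issue:
    job_id: str
    description: str
    severity: str  # "high", "medium" or "low"

# Used to list the most critical issues first.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}

def detect_issues(job: JobStats) -> List[Issue]:
    """Flag possible performance issues using hypothetical thresholds."""
    issues = []
    if job.cpu_user_pct < 10:
        issues.append(Issue(job.job_id, "CPU cores are almost idle", "high"))
    elif job.cpu_user_pct < 50:
        issues.append(Issue(job.job_id, "low CPU utilization", "medium"))
    if job.gpu_util_pct is not None and job.gpu_util_pct < 20:
        issues.append(Issue(job.job_id, "allocated GPUs are underused", "high"))
    if job.mem_used_pct < 5:
        issues.append(Issue(job.job_id, "very little memory used; fewer nodes may suffice", "low"))
    # Stable sort keeps detection order among issues of equal severity.
    return sorted(issues, key=lambda issue: SEVERITY_RANK[issue.severity])

if __name__ == "__main__":
    jobs = [
        JobStats("1001", cpu_user_pct=95.0, gpu_util_pct=None, mem_used_pct=40.0),
        JobStats("1002", cpu_user_pct=8.0, gpu_util_pct=15.0, mem_used_pct=3.0),
    ]
    for job in jobs:
        for issue in detect_issues(job):
            print(issue.job_id, issue.severity, issue.description)
```

In this sketch the first job produces no warnings, while the second is reported with two high-severity issues (idle CPUs, underused GPUs) and one low-severity one. A real tool would need thresholds tuned to the specific system and would combine such flags with deeper root-cause analysis.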




Updated: 2020-10-30