Profiles of upcoming HPC Applications and their Impact on Reservation Strategies
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-05-01, DOI: 10.1109/tpds.2020.3039728
Ana Gainaru , Brice Goglin , Valentin Honore , Guillaume Pallez

With the expected convergence between HPC, BigData and AI, new applications with different profiles are coming to HPC infrastructures. We aim to better understand the features and needs of these applications in order to run them efficiently on HPC platforms. The approach is bottom-up: we thoroughly study an emerging application, Spatially Localized Atlas Network Tiles (SLANT, originating from the neuroscience community), to understand its behavior. Based on these observations, we derive a generic yet simple application model (namely, a linear sequence of stochastic jobs). We expect this model to be representative of a large set of upcoming applications from emerging fields that are starting to require the computational power of HPC clusters without fitting the typical behavior of large-scale traditional applications. In a second step, we show how this generic model can be used in a scheduling framework. Specifically, we consider the problem of making reservations (both time and memory) for an execution on an HPC platform based on the application's expected resource requirements. We derive solutions using the model provided by the first step of this work. We experimentally show the robustness of the model, even when very few data points or another application are used to generate it, and we show performance gains with regard to standard and more recent approaches used in the neuroscience community.
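To make the reservation problem concrete, the following is a minimal Python sketch and not the authors' method. It assumes a single job whose walltime is random, a user who submits increasingly long time reservations until one is long enough, and a simple cost model in which every reservation that is tried is paid in full. The function names, the doubling strategy and the log-normal walltime distribution are all hypothetical choices made for illustration; memory reservations and the multi-job sequence from the abstract are left out.

```python
import random


def reservation_cost(walltime, reservations):
    """Total reserved time paid for one run of a job.

    Assumed cost model (illustrative only): reservations are tried in
    order, each attempted reservation is paid in full, and the job
    succeeds in the first reservation that is at least `walltime` long.
    """
    total = 0.0
    for r in reservations:
        total += r
        if walltime <= r:
            return total
    raise ValueError("no reservation is long enough for this walltime")


def expected_cost(samples, reservations):
    """Average cost of a reservation sequence over observed walltimes."""
    return sum(reservation_cost(w, reservations) for w in samples) / len(samples)


def doubling_strategy(first, upper):
    """Hypothetical strategy: double the reservation until it covers `upper`."""
    res = [first]
    while res[-1] < upper:
        res.append(res[-1] * 2)
    return res


if __name__ == "__main__":
    random.seed(0)
    # Assumption: walltimes follow a log-normal law; in practice these
    # would be measured execution times of the application.
    samples = [random.lognormvariate(0.0, 0.5) for _ in range(1000)]
    upper = max(samples)
    print("single reservation:", expected_cost(samples, [upper]))
    print("doubling strategy: ", expected_cost(samples, doubling_strategy(1.0, upper)))
```

Comparing the two printed averages shows the trade-off such strategies balance: one conservative reservation never fails but over-reserves for most runs, while a sequence of shorter reservations pays for its failed attempts.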

Updated: 2021-05-01