LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2021-06-09 , DOI: arxiv-2106.05108
Jonas H. Müller Korndörfer, Ahmed Eleliemy, Ali Mohammed, Florina M. Ciorba

Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that arises at multiple levels. OpenMP is the most widely used standard for expressing and exploiting the ever-increasing node-level parallelism. The scheduling options in OpenMP are insufficient to address the load imbalance that arises during the execution of multithreaded applications. These limited options also hinder research on novel scheduling techniques, which require comparison with others from the literature. This work introduces LB4OMP, an open-source dynamic load balancing library that implements successful scheduling algorithms from the literature. LB4OMP is a research infrastructure designed to spur and support present and future scheduling research, for the benefit of multithreaded application performance. Through an extensive performance analysis campaign, we assess the effectiveness and demystify the performance of all loop scheduling techniques in the library. We show that, for numerous application-system pairs, the scheduling techniques in LB4OMP outperform the scheduling options in OpenMP. Node-level load balancing using LB4OMP leads to reduced cross-node load imbalance and to improved MPI+OpenMP application performance, which is critical for Exascale computing.

Updated: 2021-06-10