Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential, International Journal of Parallel Programming


International Journal of Parallel Programming (IF 0.9), Pub Date: 2019-08-08, DOI: 10.1007/s10766-019-00640-3
Re’em Harel, Idan Mosseri, Harel Levin, Lee-or Alon, Matan Rusanovsky, Gal Oren

Parallelization schemes are essential for exploiting the full benefits of multi-core architectures, which have become widespread in recent years, especially for scientific applications. In shared-memory architectures, the most common parallelization API is OpenMP. However, introducing correct and optimal OpenMP parallelization into an application is not always simple, owing to common pitfalls in parallel shared-memory management and to architecture heterogeneity. To ease this process, many automatic parallelization compilers have been created. In this paper we focus on three source-to-source compilers, AutoPar, Par4All and Cetus, which were found to be the most suitable for the task. We point out their strengths and weaknesses, analyze their performance, inspect their capabilities and suggest new paths for enhancement. We analyze and compare the compilers’ performance over several exemplary test cases, each of which exposes different pitfalls, and suggest several new ways to overcome these pitfalls that yield excellent results in practice. Moreover, we note that all of these source-to-source parallelization compilers operate within the limits of OpenMP 2.5, an outdated version of the API that is no longer well suited to today's complex heterogeneous architectures. We therefore suggest a path to exploit the new features of OpenMP 4.5, which provides new directives to fully utilize heterogeneous architectures, specifically those with strong collaboration between CPUs and GPGPUs, and thereby outperforms previous results by an order of magnitude.
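To make the contrast between the two OpenMP generations concrete, here is a minimal sketch in C. It is not taken from the paper and is not the output of AutoPar, Par4All or Cetus. The SAXPY-style kernel and the function names `saxpy_cpu` and `saxpy_offload` are hypothetical, chosen only for illustration. The first function uses the kind of `parallel for` work-sharing directive available in OpenMP 2.5. The second uses the `target` offloading directives introduced in later versions and available in OpenMP 4.5.

```c
/* Illustrative only: kernel and function names are assumptions,
 * not code from the paper or from any of the three compilers. */

/* OpenMP 2.5-style: share the loop iterations among host CPU threads. */
void saxpy_cpu(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* OpenMP 4.5-style: request offloading of the same loop to an
 * accelerator. The map clauses describe the data movement: x is
 * copied to the device, y is copied there and back. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both functions compute the same result. When the code is built without OpenMP support the pragmas are ignored and the loops run serially, and when no accelerator is available an OpenMP 4.5 runtime normally falls back to running the `target` region on the host.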


Updated: 2019-08-08
