AutoParallel: Automatic parallelisation and distributed execution of affine loop nests in Python, The International Journal of High Performance Computing Applications

The International Journal of High Performance Computing Applications (IF 3.5), Pub Date: 2020-07-14, DOI: 10.1177/1094342020937050
Cristian Ramon-Cortes¹, Ramon Amela¹, Jorge Ejarque¹, Philippe Clauss², Rosa M. Badia¹

Recent improvements in programming languages and models have focused on simplicity and abstraction, leading Python to the top of the list of programming languages. However, there is still room for improvement in sparing users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelisation of affine loop nests and executes them in parallel on a distributed computing infrastructure. It is based on sequential programming and requires a single annotation (in the form of a Python decorator), so that anyone with intermediate-level programming skills can scale up an application to hundreds of cores. The evaluation demonstrates that AutoParallel goes one step further in easing the development of distributed applications. On the one hand, the programmability evaluation highlights the benefits of using a single Python decorator instead of manually annotating each task and its parameters or, even worse, having to develop the parallel code explicitly (e.g., using OpenMP or MPI). On the other hand, the performance evaluation demonstrates that AutoParallel is capable of automatically generating task-based workflows from sequential Python code while achieving the same performance as manually taskified versions of established state-of-the-art algorithms (i.e., Cholesky, LU, and QR decompositions). Finally, AutoParallel can also automatically build data blocks to increase task granularity, freeing the user from creating the data chunks and re-designing the algorithm. For advanced users, we believe that this feature can be useful as a baseline for designing blocked algorithms.
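
To make the idea of a task-based parallelisation of an affine loop nest more concrete, the Python sketch below parallelises a blocked matrix multiplication by hand. It is not AutoParallel's implementation and does not use its API. All helper names (make_blocks, unblock, multiply_add, sequential_blocked_matmul, parallel_blocked_matmul) are hypothetical, a local thread pool stands in for the distributed runtime, and the parallel schedule was derived manually rather than by automatic analysis. make_blocks plays the role of the data-block construction mentioned in the abstract, and each multiply_add call corresponds to one task. In the approach the abstract describes, the user would write only something like sequential_blocked_matmul plus the single decorator. The sketch spells out what such a transformation has to establish.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]
Grid = List[List[Matrix]]  # grid[i][j] is the (i, j) block

# All helper names below are hypothetical, for illustration only;
# they are not part of the AutoParallel or PyCOMPSs API.


def make_blocks(m: Matrix, bs: int) -> Grid:
    """Split an n x n matrix into square bs x bs blocks (assumes n % bs == 0)."""
    nb = len(m) // bs
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(nb)]
            for i in range(nb)]


def unblock(grid: Grid) -> Matrix:
    """Reassemble a grid of blocks into a single n x n matrix."""
    out: Matrix = []
    for block_row in grid:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out


def multiply_add(c: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Task body: return a new block c + a @ b (pure function, no shared state)."""
    bs = len(c)
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(bs))
             for j in range(bs)]
            for i in range(bs)]


def sequential_blocked_matmul(a: Grid, b: Grid, c: Grid) -> Grid:
    """The affine loop nest: bounds and block subscripts are affine in i, j, k, nb."""
    nb = len(c)
    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                c[i][j] = multiply_add(c[i][j], a[i][k], b[k][j])
    return c


def parallel_blocked_matmul(a: Grid, b: Grid, c: Grid, workers: int = 4) -> Grid:
    """Hand-derived parallel schedule of the same nest.

    The only dependence is on c[i][j] across successive k, so interchanging the
    nest to (k, i, j) is legal and all (i, j) tasks of one k step are independent.
    Each c[i][j] is still accumulated in k order, so results match the
    sequential version exactly.
    """
    nb = len(c)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(nb):  # k carries the dependence: run steps one after another
            futures = {(i, j): pool.submit(multiply_add, c[i][j], a[i][k], b[k][j])
                       for i in range(nb) for j in range(nb)}
            for (i, j), fut in futures.items():  # wait for the whole step
                c[i][j] = fut.result()
    return c


if __name__ == "__main__":
    n, bs = 4, 2
    a = [[float(i * n + j) for j in range(n)] for i in range(n)]
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    zero = [[0.0] * n for _ in range(n)]
    seq = unblock(sequential_blocked_matmul(make_blocks(a, bs), make_blocks(eye, bs), make_blocks(zero, bs)))
    par = unblock(parallel_blocked_matmul(make_blocks(a, bs), make_blocks(eye, bs), make_blocks(zero, bs)))
    print(seq == par == a)  # prints True: A @ I == A
```

The step-wide wait is a simplification that keeps the sketch short. A task-based runtime would typically track dependencies per data block, letting tasks from different k steps overlap when their blocks allow it. Under CPython's GIL the thread pool illustrates the task structure rather than delivering real speedup.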

Updated: 2020-07-14
