Towards Higher Performance and Robust Compilation for CGRA Modulo Scheduling, IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems ( IF 5.6 ) Pub Date : 2020-04-21 , DOI: 10.1109/tpds.2020.2989149
Zhongyuan Zhao , Weiguang Sheng , Qin Wang , Wenzhi Yin , Pengfei Ye , Jinchao Li , Zhigang Mao

Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising solution for accelerating computation-intensive tasks because they offer a good trade-off between energy efficiency and flexibility. One challenging research topic is how to deploy loops onto CGRAs effectively within acceptable compilation time. Modulo scheduling (MS) has been shown to be efficient at deploying loops onto CGRAs. Existing CGRA MS algorithms still struggle to map loops with higher performance within acceptable compilation time, especially when mapping large and irregular loops onto CGRAs with limited computational and routing resources. This is mainly due to underutilization of the buffer resources available on the CGRA, unawareness of critical mapping constraints, and time-consuming methods for solving temporal and spatial mapping. This article focuses on improving the performance and compilation robustness of the modulo scheduling mapping algorithm for CGRAs. We decompose the CGRA MS problem into temporal and spatial mapping problems and reorganize the processes inside these two problems. For the temporal mapping problem, we provide a comprehensive and systematic mapping flow that includes a powerful buffer allocation algorithm and efficient algorithms for solving interconnection and computational constraints. For the spatial mapping problem, we develop a fast and stable spatial mapping algorithm with backtracking and reordering mechanisms. Our MS mapping algorithm is able to map loops onto CGRAs with higher performance and shorter compilation time. Experimental results show that, given the same compilation time budget, our mapping algorithm achieves a higher compilation success rate. Among the successfully compiled loops, our approach improves performance by 5.4 to 14.2 percent and takes 24x to 1099x less compilation time on average compared with state-of-the-art CGRA mapping algorithms.
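
For readers unfamiliar with modulo scheduling, the following is a minimal sketch of the resource side of the temporal mapping problem. It is not the algorithm from the paper: it only illustrates the standard resource-constrained lower bound on the initiation interval (II) and a check that no modulo time slot uses more processing elements (PEs) than the array provides. The function names `res_mii` and `modulo_slots_ok`, the example schedule, and the assumption that every PE can execute every operation are hypothetical simplifications.

```python
from collections import Counter
from math import ceil
from typing import Mapping


def res_mii(num_ops: int, num_pes: int) -> int:
    """Resource-constrained lower bound on the initiation interval.

    Assumes (hypothetically) that every PE can execute any operation,
    so the bound is just operations divided by PEs, rounded up.
    """
    return ceil(num_ops / num_pes)


def modulo_slots_ok(schedule: Mapping[str, int], ii: int, num_pes: int) -> bool:
    """Check that no modulo time slot needs more PEs than are available.

    `schedule` maps an operation name to its start cycle. Operations whose
    start cycles are equal modulo `ii` run in the same slot of the steady
    state, so they compete for the same set of PEs.
    """
    usage = Counter(cycle % ii for cycle in schedule.values())
    return all(count <= num_pes for count in usage.values())


# Illustrative loop body with six operations (made-up data).
example_schedule = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 3}

lower_bound = res_mii(len(example_schedule), num_pes=4)
fits_4_pes = modulo_slots_ok(example_schedule, ii=2, num_pes=4)
fits_2_pes = modulo_slots_ok(example_schedule, ii=2, num_pes=2)
```

In this toy case, each of the two modulo slots holds three operations, so the schedule fits a 4-PE array at II = 2 but not a 2-PE array. A real CGRA mapper must additionally respect data dependences, routing, and buffer capacity, which this sketch ignores.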

Updated: 2020-04-21