Coarse-Grained Parallel Routing with Recursive Partitioning for FPGAs
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-04-01, DOI: 10.1109/tpds.2020.3035787
Minghua Shen , Guojie Luo , Nong Xiao

Routing is a very time-consuming stage in the FPGA design flow and significantly hinders productivity. This article proposes CPRS, a coarse-grained parallel routing scheme for a distributed computing environment. First, we partition the entire routing region to guide the assignment of nets for parallel processing. The partitioning is recursive: at each level, the region is split into two subregions, which yields three subsets of nets. The first subset consists of potentially dependent nets, which are distributed across different subregions. The remaining two subsets consist of potentially independent nets, each confined to its own subregion. Second, we route the nets of the first subset serially and process the remaining two subsets in parallel. The parallel processing is coarse-grained and is implemented with the MPI parallel programming model. Finally, we explore optimizations of both the partitioning and the parallel processing to further improve the overall speedup of parallel routing. In addition, we use MPI messages to synchronize intermediate results between cores during parallel routing so that a feasible solution is obtained. Experiments on a set of commonly used benchmarks demonstrate the effectiveness of CPRS. Notably, CPRS achieves about 18× speedup on average using 32 processor cores, with a minor loss of quality, compared with the VTR 7.0 serial router. This is about a 1.6× improvement over the state-of-the-art parallel router.
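The recursive partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `Net`, `Node`, and `partition` are hypothetical. Two details are assumptions made for illustration only, since the abstract does not specify them: a net is classified by the bounding box of its terminals, and each region is cut at the midpoint of its longer side. MPI communication and the routing itself are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), inclusive


@dataclass(frozen=True)
class Net:
    name: str
    bbox: Region  # bounding box of the net's terminals (assumed criterion)


@dataclass
class Node:
    region: Region
    serial_nets: List[Net]   # nets routed serially at this level
    children: List["Node"]   # two subregions that can be processed in parallel


def partition(region: Region, nets: List[Net], depth: int) -> Node:
    """Recursively split a region and sort its nets into three subsets."""
    xmin, ymin, xmax, ymax = region
    if depth == 0 or len(nets) <= 1:
        # Leaf: whatever is left is routed by a single worker.
        return Node(region, list(nets), [])

    # Assumed heuristic: cut the longer side at its midpoint.
    if xmax - xmin >= ymax - ymin:
        cut = (xmin + xmax) // 2
        low_region = (xmin, ymin, cut, ymax)
        high_region = (cut + 1, ymin, xmax, ymax)
        lo_idx, hi_idx = 0, 2  # bbox indices of xmin and xmax
    else:
        cut = (ymin + ymax) // 2
        low_region = (xmin, ymin, xmax, cut)
        high_region = (xmin, cut + 1, xmax, ymax)
        lo_idx, hi_idx = 1, 3  # bbox indices of ymin and ymax

    crossing, low, high = [], [], []
    for net in nets:
        if net.bbox[hi_idx] <= cut:
            low.append(net)        # entirely inside the first subregion
        elif net.bbox[lo_idx] > cut:
            high.append(net)       # entirely inside the second subregion
        else:
            crossing.append(net)   # spans the cut: potentially dependent

    return Node(
        region,
        crossing,
        [partition(low_region, low, depth - 1),
         partition(high_region, high, depth - 1)],
    )


if __name__ == "__main__":
    nets = [
        Net("a", (0, 0, 2, 2)),
        Net("b", (5, 5, 7, 7)),
        Net("c", (2, 2, 5, 5)),
    ]
    root = partition((0, 0, 7, 7), nets, depth=1)
    print([n.name for n in root.serial_nets])
    print([[n.name for n in child.serial_nets] for child in root.children])
```

In this sketch, a scheduler would route `serial_nets` of a node first and then hand the two children to separate workers, which mirrors the serial-then-parallel order the abstract describes.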

Updated: 2021-04-01