Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs
Journal of Signal Processing Systems (IF 1.6), Pub Date: 2021-02-13, DOI: 10.1007/s11265-020-01633-z
Umar Ibrahim Minhas, Roger Woods, Georgios Karakonstantis

Whilst FPGAs have been deployed in cloud ecosystems, achieving high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime remains extremely challenging. This work addresses the problem by treating the FPGA resource as a service and by employing high-level multi-task processing, design space exploration and static offline partitioning, so that heterogeneous tasks can be mapped onto the FPGA more efficiently. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches as system design parameters are varied. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach provides on average 2.9× and 2.3× higher system throughput for compute-intensive and mixed-intensity tasks, respectively, but 0.2× lower throughput for memory-intensive tasks owing to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme that enhances the temporal utilization of resources under the proposed approach. Additional results for large queues of mixed-intensity (compute and memory) tasks show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
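The abstract does not specify the partitioning or scheduling algorithms, so the following is only a rough illustration of what space-shared execution on static partitions means. It is a minimal sketch, assuming a hypothetical device split into a fixed number of equal slots, tasks with known per-slot execution times, and first-come-first-served dispatch to whichever slot becomes free first. It ignores external memory bandwidth contention (the factor the abstract identifies as limiting memory-intensive tasks) and is not the authors' simulator or scheduling scheme; the function name `simulate_static_partitions` is hypothetical.

```python
import heapq
from typing import List, Tuple


def simulate_static_partitions(
    task_times: List[float], num_slots: int
) -> Tuple[float, List[Tuple[int, int, float, float]]]:
    """Toy model: dispatch tasks in FIFO order to the earliest-free static slot.

    Assumes every slot is identical and a task's execution time does not
    depend on what runs in the other slots (no shared-memory contention).
    Returns (makespan, schedule), where each schedule entry is
    (task_index, slot_index, start_time, finish_time).
    """
    if num_slots < 1:
        raise ValueError("num_slots must be >= 1")
    # Min-heap of (time the slot becomes free, slot index); all slots free at t=0.
    free_at = [(0.0, s) for s in range(num_slots)]
    heapq.heapify(free_at)

    schedule = []
    makespan = 0.0
    for i, t in enumerate(task_times):
        start, slot = heapq.heappop(free_at)  # earliest-available slot
        finish = start + t
        schedule.append((i, slot, start, finish))
        heapq.heappush(free_at, (finish, slot))
        makespan = max(makespan, finish)
    return makespan, schedule


if __name__ == "__main__":
    tasks = [4.0, 2.0, 3.0, 1.0]  # hypothetical execution times per slot
    for slots in (1, 2):
        span, _ = simulate_static_partitions(tasks, slots)
        print(f"{slots} slot(s): makespan={span}, throughput={len(tasks) / span:.2f} tasks/unit")
```

In this toy queue, two slots finish the four tasks at t = 5 rather than t = 10 with one slot. A real comparison would also have to account for a task typically running faster when it is given the whole device, which this sketch deliberately leaves out.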

Updated: 2021-02-15