An efficient dataflow accelerator for scientific applications
Future Generation Computer Systems ( IF 6.2 ) Pub Date : 2020-03-10 , DOI: 10.1016/j.future.2020.03.023
Xiaochun Ye , Xu Tan , Meng Wu , Yujing Feng , Da Wang , Hao Zhang , Songwen Pei , Dongrui Fan

Dataflow architectures have proven promising for high-performance computing. However, traditional dataflow architectures are inefficient for typical scientific applications such as stencil and FFT because their function units are poorly utilized. Exploiting the blocking and parallelism features of scientific applications, we design SPU, an efficient dataflow architecture for scientific applications. In SPU, dataflow graphs translated from the loop bodies of scientific applications are mapped to the Processing Element (PE) array. During execution, iterations enter the dataflow graph in a pipelined fashion, and three levels of parallelism are exploited to improve function-unit utilization: inner-graph parallelism, pipelining parallelism, and inter-graph parallelism. Experimental results show that SPU achieves an average energy efficiency of 25.97 GFLOPS/W in 40 nm technology, and that its floating-point function-unit utilization is on average 2.82x that of a typical dataflow architecture for typical scientific applications.
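
To make the utilization argument concrete, the following is a minimal toy model in Python, not SPU's actual mapping or scheduler. It assumes a hypothetical loop-body graph (`NODES`, `EDGES`), one graph node per PE, and unit latency per node; none of these details come from the paper. It only illustrates why letting iterations enter the graph in a pipelined fashion raises function-unit utilization compared with running one iteration at a time.

```python
from collections import defaultdict

# Hypothetical stencil-like loop body: three loads, two adds, one multiply, one store.
NODES = ["ld_a", "ld_b", "ld_c", "add1", "add2", "mul", "st"]
EDGES = [("ld_a", "add1"), ("ld_b", "add1"),
         ("add1", "add2"), ("ld_c", "add2"),
         ("add2", "mul"), ("mul", "st")]


def node_levels(nodes, edges):
    """Earliest cycle at which each node can fire, assuming unit latency."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    level = {}

    def visit(n):
        if n not in level:
            # A node fires one cycle after its latest predecessor; sources fire at cycle 0.
            level[n] = 1 + max((visit(p) for p in preds[n]), default=-1)
        return level[n]

    for n in nodes:
        visit(n)
    return level


def utilization(nodes, edges, iterations, pipelined):
    """Fraction of PE-cycles doing useful work (assumes one node per PE)."""
    depth = max(node_levels(nodes, edges).values()) + 1
    if pipelined:
        # A new iteration enters every cycle; the last one drains after `depth` cycles.
        total_cycles = depth + iterations - 1
    else:
        # Each iteration must finish before the next one enters the graph.
        total_cycles = depth * iterations
    busy_pe_cycles = len(nodes) * iterations
    return busy_pe_cycles / (len(nodes) * total_cycles)


if __name__ == "__main__":
    print("serial   :", utilization(NODES, EDGES, 100, pipelined=False))
    print("pipelined:", utilization(NODES, EDGES, 100, pipelined=True))
```

In this model the serial utilization is 1/depth regardless of iteration count, while the pipelined utilization approaches 1 as the number of iterations grows. Inter-graph parallelism and the real PE array constraints are deliberately left out of the sketch.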




Updated: 2020-03-10