Performance Optimization for Parallel Systems with Shared DWM via Retiming, Loop Scheduling, and Data Placement
Journal of Systems Architecture (IF 3.7), Pub Date: 2020-07-23, DOI: 10.1016/j.sysarc.2020.101842
Siyuan Gao, Shouzhen Gu, Rui Xu, Edwin Hsing-Mean Sha, Qingfeng Zhuge

Domain Wall Memory (DWM) is an ideal candidate for replacing traditional memories, especially in parallel systems, because it offers low leakage power, high density, and low access latency. However, because DWM has a tape-like architecture, shift operations have a major impact on performance. For data-intensive applications with many loops and arrays, increasing loop parallelism and choosing an appropriate loop schedule and data placement on DWM can significantly improve the performance of parallel systems. This paper explores optimizing the performance of parallel systems through retiming, loop scheduling, and data placement, particularly when the data are arrays. It proposes an Integer Linear Programming (ILP) formulation and a Scheduling While Placing (SWP) algorithm to generate optimal or near-optimal loop schedules and data placements with minimum execution time. Experimental results show that SWP and ILP effectively reduce execution time compared with the greedy List Scheduling First Access First Place (LF) algorithm. In addition, the paper proposes a Threshold Retiming Repetition (TRR) algorithm that combines retiming with SWP or ILP to improve performance. Experimental results show that SWP+TRR and ILP+TRR further reduce execution time compared with the results obtained without retiming.
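To illustrate why data placement matters on a tape-like memory, the following is a minimal sketch under simplifying assumptions: a single DWM tape with one access port, where the cost of an access is the distance the tape must shift from its current position. The function name `shift_cost`, the array names, and the access sequence are hypothetical and chosen only for illustration; this is not the paper's ILP formulation, SWP, or LF algorithm.

```python
def shift_cost(placement, accesses, port=0):
    """Count shifts needed to serve `accesses` in order.

    Assumed model (not from the paper): one tape, one port, and the
    cost of an access is |target offset - offset currently at the port|.
    """
    # Map each data item to its offset on the tape.
    pos = {name: i for i, name in enumerate(placement)}
    head = port   # offset currently aligned with the port
    total = 0
    for name in accesses:
        total += abs(pos[name] - head)  # shifts to bring `name` to the port
        head = pos[name]
    return total


# Hypothetical access order produced by some loop schedule.
accesses = ["A", "C", "A", "C", "B"]

naive = ["A", "B", "C"]   # placed in declaration order
tuned = ["A", "C", "B"]   # frequently alternating items placed adjacently

print(shift_cost(naive, accesses))  # 7
print(shift_cost(tuned, accesses))  # 4
```

In this toy model, the same access sequence costs 7 shifts under the first placement and 4 under the second, which is the kind of gap that scheduling and placement decisions aim to close.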




Updated: 2020-07-23