Compiler-Assisted Data Streaming for Regular Code Structures
IEEE Transactions on Computers (IF 3.7), Pub Date: 2021-03-01, DOI: 10.1109/tc.2020.2990302
Nuno Neves, Pedro Tomas, Nuno Roma

The performance of modern processors is often limited by execution stalls caused by long memory access latencies. Compile-time optimizations, deep cache hierarchies, and prefetching mechanisms already provide significant performance gains by overlapping memory accesses with computation. However, their throughput improvements are reaching a limit, so new solutions that effectively exploit memory access patterns are required. To this end, a new compiler-assisted data streaming method is proposed. It combines static analysis and code transformations with on-chip data streaming support, as a viable alternative to prefetching mechanisms for regular code structures. Static analysis identifies memory accesses and encodes them in a dedicated representation. A code transformation algorithm then detaches data indexation and address calculation from computation, allowing for a significant code reduction. An on-chip data stream controller, attached to the L1 data cache, autonomously generates memory accesses from the pattern representation and reorganizes the data transfers into streams, with the aid of stream buffers. When compared with state-of-the-art prefetchers, the proposed solution provides up to 26% code reduction, an IPC improvement of 2.4x, and an average performance improvement of 40%.
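
To make the idea of detaching address calculation from computation more concrete, here is a minimal software sketch. It is not the paper's representation or hardware design: the `AccessPattern` descriptor, its `(stride, count)` encoding, and the function names are all hypothetical, chosen only to illustrate how a regular access pattern could be described once and then turned into a stream of addresses that the compute loop simply consumes.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical descriptor for a regular (affine) access pattern."""
    base: int                           # starting element offset
    dims: Tuple[Tuple[int, int], ...]   # (stride, count) per loop level, outermost first


def generate_addresses(p: AccessPattern) -> Iterator[int]:
    """Yield element offsets in loop order (stands in for the stream controller)."""
    def walk(level: int, offset: int) -> Iterator[int]:
        if level == len(p.dims):
            yield offset
            return
        stride, count = p.dims[level]
        for i in range(count):
            yield from walk(level + 1, offset + i * stride)
    return walk(0, p.base)


def column_sums_indexed(data, rows, cols):
    """Baseline loop: index arithmetic is interleaved with the computation."""
    sums = [0] * cols
    for j in range(cols):
        for i in range(rows):
            sums[j] += data[i * cols + j]
    return sums


def column_sums_streamed(stream, rows, cols):
    """Transformed loop: consumes values in order, with no address arithmetic."""
    sums = [0] * cols
    for j in range(cols):
        for _ in range(rows):
            sums[j] += next(stream)
    return sums


rows, cols = 3, 4
data = list(range(rows * cols))
# Column-wise walk over a row-major array: outer stride 1, inner stride `cols`.
pattern = AccessPattern(base=0, dims=((1, cols), (cols, rows)))
stream = (data[a] for a in generate_addresses(pattern))
assert column_sums_streamed(stream, rows, cols) == column_sums_indexed(data, rows, cols)
```

In this sketch the transformed loop contains no index expressions at all, which loosely mirrors the code reduction the abstract describes; in the proposed method the address generation would be performed by the on-chip controller rather than by a Python generator.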

Updated: 2021-03-01