Off-chip prefetching based on Hidden Markov Model for non-volatile memory architectures.
PLOS ONE (IF 3.7). Pub date: 2021-09-14. DOI: 10.1371/journal.pone.0257047
Adrián Lamela¹, Óscar G Ossorio², Guillermo Vinuesa², Benjamín Sahelices¹

Non-volatile memory technology is now available in commodity hardware. It can serve as backing memory behind an external DRAM cache without any software modifications. However, the higher read and write latencies of non-volatile memory may exacerbate the memory wall problem. In this work we present a novel off-chip prefetching technique based on a Hidden Markov Model that specifically addresses the latency problem caused by the complexity of off-chip memory access patterns. First, we present a thorough analysis of off-chip memory access patterns in multicore processors to characterize their complexity. Based on this study, we propose a prefetching module located in the last-level cache (LLC) that uses two small tables and whose computational complexity is linear in the number of computing threads. Our Markov-based technique can track and cluster several simultaneous groups of memory accesses coming from multiple concurrent threads in a multicore processor. It quickly identifies complex address groups and triggers prefetches with very high accuracy. Our simulations show an improvement of up to 76% in the hit ratio of an off-chip DRAM cache on a multicore architecture over the conventional prefetching technique (G/DC). In addition, the overhead of prefetch requests (failed prefetches) is reduced by 48% in single-core simulations and by 83% in multicore simulations.
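For readers unfamiliar with the baseline: G/DC usually denotes a global history buffer (GHB) prefetcher with global delta correlation. The Python sketch below illustrates that idea under our own assumptions. It is not the paper's Hidden Markov Model technique, and the class name `GDCPrefetcher`, the `history_size` and `degree` parameters, and the cyclic replay of the matched delta pattern are hypothetical choices made for illustration. On each miss it records the address in a global FIFO, takes the last two address deltas as a correlation key, searches backwards for the most recent earlier occurrence of that delta pair, and prefetches by replaying the deltas that followed it.

```python
from collections import deque


class GDCPrefetcher:
    """Hypothetical sketch of a G/DC-style prefetcher: a global FIFO of miss
    addresses plus correlation on the last two address deltas."""

    def __init__(self, history_size=64, degree=4):
        self.history = deque(maxlen=history_size)  # global FIFO of miss addresses
        self.degree = degree                       # prefetches issued per trigger

    def on_miss(self, addr):
        """Record a miss address and return the list of addresses to prefetch."""
        self.history.append(addr)
        h = list(self.history)
        deltas = [b - a for a, b in zip(h, h[1:])]
        n = len(deltas)
        if n < 3:
            return []  # need the key pair plus at least one earlier pair
        key = (deltas[n - 2], deltas[n - 1])
        # Search backwards for the most recent earlier pair equal to the key.
        for i in range(n - 2, 0, -1):
            if (deltas[i - 1], deltas[i]) == key:
                period = (n - 1) - i  # length of the repeating delta pattern
                prefetches, nxt = [], addr
                for k in range(self.degree):
                    # Replay the deltas that followed the match, wrapping around
                    # (an assumption so a constant stride still yields `degree` hits).
                    nxt += deltas[i + 1 + (k % period)]
                    prefetches.append(nxt)
                return prefetches
        return []  # no earlier occurrence of the delta pair: no prefetch


if __name__ == "__main__":
    pf = GDCPrefetcher(degree=4)
    for a in (0, 64, 128):
        pf.on_miss(a)        # warm-up: too few deltas, no prefetch yet
    print(pf.on_miss(192))   # constant stride of 64 -> [256, 320, 384, 448]
```

Because this sketch keeps a single global miss stream, misses interleaved from several threads can break up each thread's delta pattern, which suggests why tracking and clustering per-thread access groups, as the abstract describes, may help on multicore workloads.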

Updated: 2021-09-14