PRO: A periodical reset optimized page migration scheme for hybrid memory system
Introduction
Industry and academic researchers have recognized that modern applications demand ever larger memory capacity to meet the performance requirements of embedded systems [1]. Main memory built entirely from DRAM has already hit energy and scalability limits, which has motivated the search for alternatives to DRAM as main memory [2]. Phase change memory (PCM) is considered a promising candidate for future main memory. PCM promises higher bit density and lower cost per bit than traditional DRAM. In addition, it offers access speed comparable to DRAM and is compatible with CMOS process technology [3]. Researchers have therefore proposed cost-effective hybrid main memory systems with two partitions, DRAM and PCM, where PCM is slower but larger than DRAM. Combining PCM with a relatively small amount of DRAM exploits both the high capacity of PCM and the low latency of DRAM [3], [4], [5], [6].
Prior research generally confirms that PCM is beneficial when used as an additional memory structure in a hybrid memory hierarchy [6]. Two hybrid memory topologies have been proposed in the literature, as shown in Fig. 1. In flat memory, the processor can directly access both DRAM and PCM. Data stored in PCM can be accessed via instructions and exchanged between DRAM and PCM directly [5]. In hierarchical memory, the processor accesses PCM via DRAM: PCM is accessed only when a DRAM miss occurs [1].
The hierarchical memory uses DRAM as an upper-level cache of the PCM main memory. The DRAM cache is hidden from the operating system, much like an L1 or L2 cache, and PCM is accessed only when a DRAM cache miss occurs. In the hierarchical memory architecture, the PCM main memory can still use an unmodified page replacement policy based on the LRU or CLOCK algorithm [7], [8], [9], [10], [11], [12], [13], [14]. However, since the DRAM cache is transparent to the operating system, all actions related to the DRAM cache must be implemented by extra hardware, which makes fully associative placement difficult to deploy in the hierarchical memory architecture. Moreover, data consistency between the DRAM cache and the PCM main memory must be maintained.
The flat memory organizes DRAM and PCM on the same layer and manages them together under a single physical address space with no additional hardware cost. In addition, fully associative placement is possible in the flat memory, so the DRAM and PCM space can be used more effectively. Because of these advantages, this work focuses on the flat memory.
However, like flash memory, PCM suffers from problems on write operations because of its physical properties. Table 1 shows the physical properties of DRAM and PCM [8]. Generally, the latency and energy consumption of write operations in PCM are larger than those of DRAM [4]. Furthermore, PCM suffers from an endurance problem: each cell can sustain only a limited number of writes before it wears out. These limitations may significantly impact the system, for example by shortening the lifetime of the memory system and restricting the usefulness of PCM in commercial systems. Therefore, the write traffic to PCM devices must be reduced.
To mitigate the endurance problem in hybrid memory systems, page replacement mechanisms that aim to avoid excessive write operations in PCM have dominated research in recent years [7], [8], [9], [10], [11], [12], [13], [14].
In this paper, we restudy the data access features of applications and conduct further research on inter-reference distance to exploit the locality of workloads. We find that employing inter-reference distance in the design of the memory controller is a better way to pick out hot write pages. The highlight of this paper is that a single global request counter represents the inter-reference distance information and dynamically predicts the locality of actively accessed pages for all blocks, instead of a write counter being set for each unit of memory. In our design, the inter-reference distance can be approximated by the number of accesses that the hybrid memory receives, which a global request counter can track. Such a simple design greatly reduces the complexity of the memory controller. Based on this, we propose an efficient page migration scheme for hybrid memory, the Periodical Reset Optimized Page Migration Scheme (PRO), which picks out hot write pages and reduces the number of writes to PCM with limited swap operations. As the simulation results in this paper show, the PCM write count of PRO is significantly lower than that of the other algorithms in most cases. The main contributions of this paper are summarized as follows:
1) We find that employing inter-reference distance in the design of the memory controller is a better and simpler way to pick out hot write pages.
2) We propose an efficient page replacement mechanism for hybrid memory systems, which reduces the number of writes to PCM with limited swap operations.
PRO reduces write operations by an average of 88.75% compared with the state of the art. At the same time, it improves the efficiency of each migration and effectively reduces the average memory access time.
The remainder of the paper is organized as follows. Section 2 introduces the related work and research motivation briefly. Section 3 presents the analysis of inter-reference distance in workloads. Section 4 describes the details of PRO. Section 5 provides the experimental setup and results with discussion on performance. Finally, we conclude this paper in Section 6.
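As an illustration only, the global-request-counter idea outlined above can be sketched in Python. The sketch is not the PRO implementation: the class name `PeriodicResetTracker`, the parameter `reset_period` (standing in for the reset interval D mentioned in the PRO migration mechanism section), and the rule that a page counts as a hot write page when it is written while its dirty bit is still set are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PeriodicResetTracker:
    """Hypothetical model of a global request counter with periodic dirty-bit reset."""

    reset_period: int                 # D: number of requests between resets (assumed name)
    request_count: int = 0            # global request counter shared by all pages
    dirty: set = field(default_factory=set)  # pages whose dirty bit is currently 1

    def access(self, page: int, is_write: bool) -> bool:
        """Record one memory request; return True if the page looks write-hot.

        Assumption: a write that finds the dirty bit already set means the page
        was written twice within the current reset window, i.e. with a short
        inter-reference distance.
        """
        self.request_count += 1
        hot = False
        if is_write:
            if page in self.dirty:
                hot = True
            else:
                self.dirty.add(page)
        # Every reset_period requests, clear all dirty bits at once.
        if self.request_count % self.reset_period == 0:
            self.dirty.clear()
        return hot


tracker = PeriodicResetTracker(reset_period=4)
trace = [(1, True), (2, False), (1, True), (3, True), (3, True), (3, True)]
flags = [tracker.access(page, is_write) for page, is_write in trace]
```

In this toy trace, page 1 is flagged on its second write because both writes fall inside one window, whereas the first two writes to page 3 straddle a reset and only the third is flagged. A single counter and one bit per page are the only state kept, which is the kind of simplicity the text attributes to the design.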
Related work and motivation
Recently, the academic community has extensively explored page replacement policies for hybrid memory that improve the write performance and endurance of PCM in hybrid memory systems. Most of these policies are based on the CLOCK and LRU algorithms.
RaPP [9] is the most notable algorithm for hybrid memory systems. RaPP takes the recency and frequency of pages into account and ranks pages according to access frequency. However, combining read and write
Analysis of inter-reference distance
In this section, we analyze the properties of the memory references and propose a simple method to pick out hot write pages.
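To make the quantity under analysis concrete, the following Python sketch computes write inter-reference distances from a trace. It is a hedged illustration, not the authors' tooling: the function name, the `(page, is_write)` trace format, and the definition of distance as the difference in global request index between two consecutive writes to the same page are assumptions, chosen to match the description of a global request counter in the Introduction.

```python
from collections import defaultdict


def write_inter_reference_distances(trace):
    """Return {page: [distance, ...]} for consecutive writes to the same page.

    trace is an iterable of (page, is_write) tuples in request order.
    Distance is measured in requests of any kind seen by main memory
    (an assumed definition for this example).
    """
    last_write_index = {}            # page -> request index of its previous write
    distances = defaultdict(list)
    for request_index, (page, is_write) in enumerate(trace):
        if not is_write:
            continue                 # reads advance the index but are not recorded
        if page in last_write_index:
            distances[page].append(request_index - last_write_index[page])
        last_write_index[page] = request_index
    return dict(distances)


example_trace = [(1, True), (2, False), (1, True), (2, True), (1, True), (2, True)]
example_distances = write_inter_reference_distances(example_trace)
```

A histogram of these distances over a benchmark trace would show whether most rewrites fall within a bounded range, which is the property a distance threshold relies on.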
For our study, we captured virtual memory access traces with a modified gem5 simulator [16]. We filtered out the memory references captured directly from the caches and gathered only the memory references observed at the level of main memory. The memory access traces were obtained from 9 typical benchmarks in MiBench [17] and MediaBench [18], which are widely
PRO migration mechanism
In this section, we propose a conceptually new migration scheme called PRO (Periodical Reset Optimized Page Migration Scheme), which minimizes write operations in PCM with limited swap operations. PRO keeps a global request counter that counts the accesses the memory controller receives. Every D memory accesses, the global request counter triggers a reset that clears the dirty bits of the pages in the swap candidate group to 0. With the conclusion obtained
Experimental setup
In this section, we evaluate the performance of the proposed scheme with the gem5-nvmain simulator, a hybrid memory system simulator that can accurately characterize the performance of combined PCM and DRAM architectures [31]. Syscall Emulation (SE) mode with an out-of-order CPU is adopted in the gem5-nvmain simulator to provide the system running environment. Detailed simulation configurations for main memory in this experimental environment are listed in Table 2. The benchmarks
Conclusion
In this work, we exploit the data access features of applications and conduct further research on inter-reference distance. We find that most pages are revisited within a certain inter-reference distance range, so a bounded inter-reference distance can be used to pick out hot pages. Based on this observation, we introduce an efficient page replacement scheme called PRO (Periodical Reset Optimized Page Migration Scheme) for hybrid memory
Declaration of Competing Interest
We would like to submit the manuscript entitled “PRO: A Periodical Reset Optimized Page Migration Scheme For Hybrid Memory System”, which we wish to be considered for publication in “Journal of Systems Architecture”. No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously, and not under
Acknowledgment
This work was supported by a grant from the National Natural Science Foundation of China (NSFC, no. 61504032).
Na Niu received the M.S. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2016. She is currently working toward her Ph.D. at the Microelectronics Center, Harbin Institute of Technology. Her research covers hybrid memory design, very large-scale integration design, and systems on chips (SoC).
References (31)
- et al., Refinery swap: an efficient swap mechanism for hybrid DRAM–NVM systems, Future Gen. Comput. Syst. (2017)
- et al., Adaptive-classification CLOCK: page replacement policy based on read/write access pattern for hybrid DRAM and PCM main memory, Microprocess. Microsyst. (2018)
- et al., GreedyDual* web caching algorithm: exploiting the two sources of temporal locality in web request streams, Comput. Commun. (2001)
- et al., Scalable high performance main memory system using phase-change memory technology, ACM SIGARCH Comput. Archit. News (2009)
- et al., Architecting phase change memory as a scalable DRAM alternative, ACM SIGARCH Comput. Archit. News (2009)
- et al., A durable and energy efficient main memory using phase change memory technology, ACM SIGARCH Comput. Archit. News (2009)
- et al., Phase change memory technology, J. Vacuum Sci. Technol. B Nanotechnol. Microelectron. (2010)
- et al., PDRAM: a hybrid PRAM and DRAM main memory system, 2009 46th ACM/IEEE Design Automation Conference (2009)
- W.H. Radke, M. Murray, M.R. Furuhjelm, J. Geldman, Hybrid memory management, 2011. US Patent...
- et al., Page replacement algorithms for NAND flash memory storages, International Conference on Computational Science and Its Applications (2007)
- Efficient warranty-aware wear leveling for embedded systems with PCM main memory, IEEE Trans. Very Large Scale Integr. VLSI Syst.
- Page placement in hybrid memory systems, Proceedings of the International Conference on Supercomputing
- CLOCK-DWF: a write-history-aware page replacement algorithm for hybrid PCM and DRAM memory architectures, IEEE Trans. Comput.
- M-CLOCK: migration-optimized page replacement algorithm for hybrid DRAM and PCM memory architecture, Proceedings of the 30th Annual ACM Symposium on Applied Computing
- WIRD: an efficiency migration scheme in hybrid DRAM and PCM main memory for image processing applications, IEEE Access
Fangfa Fu received the M.S. and Ph.D. degrees in microelectronics and solid-state electronics from Harbin Institute of Technology, Harbin, China, in 2007 and 2012, respectively. Since 2012, he has been giving lectures with the Microelectronics Center, Harbin Institute of Technology. His research interests include the system on chips (SoC), networks on chips (NoC), very large-scale integration design and digital signal processing.
Bing Yang received the M.S. and Ph.D. degrees in microelectronics and solid-state electronics from Harbin Institute of Technology, Harbin, China, in 2002 and 2009, respectively. His research interests include the system on chips (SoC), very large-scale integration design and computer architecture.
Jiacai Yuan received the B.S. degree from Harbin Institute of Technology, Harbin, China, in 2018. He is currently working toward his M.S. degree at the Microelectronics Center, Harbin Institute of Technology. His research covers hybrid memory design, very large-scale integration design, and systems on chips (SoC).
Fengchang Lai received the B.S. degree from Harbin Institute of Technology, Harbin, China, in 1984. He is currently a Professor in the Microelectronics Center, Harbin Institute of Technology. His research interests are very large-scale integration design and SoC.
Chengxin Zhao received his M.S. degree in Electronic Engineering from the Royal Institute of Technology, Sweden, and his Ph.D. degree in Electronic Engineering from the University of Oslo, Norway. He is now a Professor at the Institute of Modern Physics, Chinese Academy of Sciences. His research covers ASICs and electronics for nuclear experiments and medical applications.
Jinxiang Wang received the B.S. and M.S. degrees in semiconductor physics and the Ph.D. degree in communication and information engineering from Harbin Institute of Technology, Harbin, China, in 1990, 1993, and 1999, respectively. He is currently a Professor with the Microelectronics Center, Harbin Institute of Technology. His research interests are very large-scale integration design, wireless communication, SoC, and NoC.