PRO: A periodical reset optimized page migration scheme for hybrid memory system

https://doi.org/10.1016/j.sysarc.2020.101786

Abstract

Due to its attractive characteristics, phase change memory (PCM) has emerged as a promising candidate for the main memory of future embedded systems. However, its limited write endurance has restricted its practical application. Page replacement mechanisms for hybrid memory have been widely studied in recent years. Most existing schemes do not detect the locality regularities of memory accesses and do not take migration efficiency into consideration. In this paper, we rethink data access features and further study inter-reference distance to exploit the locality regularities of workloads in embedded systems. We then propose a novel page migration scheme, the Periodical Reset Optimized Page Migration Scheme (PRO), which is a hardware-software coordinated algorithm. PRO greatly reduces the number of write operations in PCM while requiring only a limited number of swap operations. Compared with typical migration schemes, it reduces write operations by 88.75% on average and decreases average access time by 11.47% on average.

Introduction

Industry and academic researchers have recognized that modern applications demand increasingly large memory capacity to meet the performance requirements of embedded systems [1]. Main memory built entirely from DRAM has already hit energy and scalability limits, which has motivated the search for alternatives to replace DRAM as main memory [2]. Phase change memory (PCM) has been considered a promising candidate for future main memory. PCM promises higher bit density and lower cost per bit than traditional DRAM. In addition, it offers access speed comparable to DRAM and is compatible with CMOS process technology [3]. Researchers have therefore built cost-effective hybrid main memory systems with two partitions, DRAM and PCM, where PCM is slower but larger than DRAM. Combining PCM with a relatively small amount of DRAM exploits both the high capacity of PCM and the low latency of DRAM [3], [4], [5], [6].

Prior research generally confirms that PCM can provide benefits when used as an additional memory structure in the hybrid memory hierarchy [6]. Two hybrid memory topologies have been proposed in the literature, as shown in Fig. 1. In flat memory, the processor can access DRAM and PCM directly; data stored in PCM can be accessed by instructions and exchanged between DRAM and PCM straightforwardly [5]. In hierarchical memory, the processor accesses PCM via DRAM, and PCM is accessed only when a DRAM miss occurs [1].

Hierarchical memory uses DRAM as an upper-level cache of the PCM main memory. Like an L1 or L2 cache, the DRAM cache is hidden from the operating system, and PCM is accessed only when a DRAM cache miss occurs. In the hierarchical memory architecture, the PCM main memory can still use an unmodified page replacement policy based on the LRU or CLOCK algorithm [7], [8], [9], [10], [11], [12], [13], [14]. However, since the DRAM cache is transparent to the operating system, all actions related to it must be implemented in extra hardware, which makes fully associative placement difficult to deploy in the hierarchical architecture. Moreover, data consistency between the DRAM cache and the PCM main memory must be maintained.
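Since the paragraph above notes that an LRU- or CLOCK-based policy can still manage the PCM main memory in the hierarchical organization, the following minimal Python sketch shows the textbook CLOCK (second-chance) algorithm for readers unfamiliar with it. It is a generic illustration, not the replacement policy of PRO or of any cited hybrid scheme, and the class name `ClockReplacer` and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class ClockReplacer:
    """Generic CLOCK (second-chance) page replacement over a fixed number of frames."""

    def __init__(self, num_frames: int) -> None:
        self.frames: List[Optional[int]] = [None] * num_frames  # page held by each frame
        self.ref_bits: List[bool] = [False] * num_frames        # reference bit per frame
        self.where: Dict[int, int] = {}                           # page -> frame index
        self.hand = 0                                             # clock hand position

    def access(self, page: int) -> Optional[int]:
        """Access a page; return the evicted page, or None if nothing was evicted."""
        if page in self.where:
            self.ref_bits[self.where[page]] = True  # hit: grant a second chance
            return None
        # Miss: advance the hand, clearing reference bits, until a free frame
        # or a frame whose reference bit is already clear is found.
        while True:
            victim = self.frames[self.hand]
            if victim is None or not self.ref_bits[self.hand]:
                break
            self.ref_bits[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.ref_bits[self.hand] = True
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
        return victim
```

For example, with three frames, accessing pages 1, 2, 3 and then 4 evicts page 1; if page 2 is then hit again, a later miss skips page 2 (clearing its bit) and evicts page 3 instead. Write-aware hybrid variants, such as the CLOCK-DWF scheme in the reference list, extend this basic loop with write-history information.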

Flat memory places DRAM and PCM on the same level and manages them together under a single physical address space with no additional hardware cost. In addition, fully associative placement is possible in flat memory, so the DRAM and PCM space can be used more effectively. Because of these advantages, this work focuses on flat memory.
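To illustrate what a single physical address space means for placement, the sketch below models a hypothetical flat layout in which the first `dram_frames` physical frames are DRAM and the remaining frames are PCM. This layout, the `FlatHybridMemory` class, and its methods are assumptions for illustration and are not taken from the paper. Because the operating system maps pages into one address space, any page can be placed in any frame, and moving a page between DRAM and PCM reduces to changing its mapping.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlatHybridMemory:
    """Hypothetical flat layout: frames [0, dram_frames) are DRAM, the rest are PCM."""
    dram_frames: int
    pcm_frames: int
    page_table: Dict[int, int] = field(default_factory=dict)  # virtual page -> physical frame

    def device_of_frame(self, frame: int) -> str:
        """Which device backs a physical frame in this assumed layout."""
        if 0 <= frame < self.dram_frames:
            return "DRAM"
        if self.dram_frames <= frame < self.dram_frames + self.pcm_frames:
            return "PCM"
        raise ValueError(f"frame {frame} is outside physical memory")

    def device_of(self, vpage: int) -> str:
        """Which device currently backs a mapped virtual page."""
        return self.device_of_frame(self.page_table[vpage])

    def swap(self, vpage_a: int, vpage_b: int) -> None:
        """Exchange the frames of two pages, e.g. DRAM <-> PCM (data copy not modeled)."""
        pt = self.page_table
        pt[vpage_a], pt[vpage_b] = pt[vpage_b], pt[vpage_a]
```

A real migration would also copy the page contents and invalidate stale translations; the sketch models only the mapping change.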

However, like flash memory, PCM suffers from problems with write operations because of its physical properties. Table 1 shows the physical properties of DRAM and PCM [8]. Generally, the latency and energy consumption of write operations in PCM are higher than those of DRAM [4]. Furthermore, PCM suffers from an endurance problem: each cell can be written only about 10^6 to 10^8 times, so frequently written cells can wear out after a limited number of writes. These limitations may significantly impact the system, for example by shortening the lifetime of the memory system and restricting the usefulness of PCM in commercial systems. Therefore, write traffic to PCM devices must be reduced.
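A back-of-the-envelope calculation shows why the 10^6 to 10^8 endurance figure makes write reduction necessary. The sketch below assumes, purely for illustration, that a single hot line receives a constant stream of writes with no wear leveling; the write rate of 1,000 writes per second and the function name `days_until_wearout` are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wearout(endurance_writes: float, writes_per_second: float) -> float:
    """Idealized time for one PCM line to reach its write-endurance limit.

    Assumes every write lands on the same line at a constant rate, with no
    wear leveling and no write reduction; real lifetimes depend on both.
    """
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    return endurance_writes / writes_per_second / SECONDS_PER_DAY


# Hypothetical hot line written 1,000 times per second.
for endurance in (1e6, 1e8):
    print(f"endurance {endurance:.0e}: {days_until_wearout(endurance, 1_000):.3f} days")
```

Under this idealized model, lifetime is inversely proportional to the write rate, so removing a fraction r of the writes to a line extends its time to wear-out by a factor of 1/(1 - r).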

To mitigate the endurance problem in hybrid memory systems, page replacement mechanisms for hybrid memory that aim to avoid excessive write operations in PCM have been studied extensively in recent years [7], [8], [9], [10], [11], [12], [13], [14].

In this paper, we re-examine the data access features of applications and further study inter-reference distance to exploit the locality regularities of workloads. We find that employing inter-reference distance in the design of the memory controller can be a better way to pick out hot write pages. The key idea of this paper is to use only a single global request counter, rather than a write counter for each unit of memory, to represent inter-reference distance information and dynamically predict the locality of actively accessed pages across all blocks. In our design, the inter-reference distance is approximated by the number of accesses the hybrid memory receives, which a global request counter can track. This simple design greatly reduces the complexity of the memory controller. Based on this, we propose an efficient page migration scheme for hybrid memory, the Periodical Reset Optimized Page Migration Scheme (PRO), which picks out hot write pages and reduces the number of writes in PCM with limited swap operations. As the simulation results in this paper show, PRO incurs significantly fewer PCM writes than the other algorithms in most cases. The main contributions of this paper are summarized as follows:

1) We find that employing inter-reference distance in the design of the memory controller can be a better and simpler way to pick out hot write pages.

2) We propose an efficient page replacement mechanism for hybrid memory systems that reduces the number of writes in PCM with limited swap operations.

PRO reduces write operations by 88.75% on average compared with the state of the art. At the same time, it improves the efficiency of each migration and effectively reduces the average memory access time.
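The introduction argues that a single global request counter, instead of a write counter for each memory unit, greatly simplifies the memory controller. The short calculation below makes that storage argument concrete. The memory size, page size, and counter widths are hypothetical values chosen for illustration, not parameters from the paper, and the comparison ignores any other per-page metadata a real controller would still keep.

```python
KIB = 1 << 10
GIB = 1 << 30


def per_page_counter_bits(memory_bytes: int, page_bytes: int, counter_bits: int) -> int:
    """Counter storage, in bits, if every page keeps its own write counter."""
    if memory_bytes % page_bytes:
        raise ValueError("memory size must be a multiple of the page size")
    return (memory_bytes // page_bytes) * counter_bits


# Hypothetical configuration: 4 GiB of PCM, 4 KiB pages, 16-bit write counters.
per_page = per_page_counter_bits(4 * GIB, 4 * KIB, counter_bits=16)
GLOBAL_COUNTER_BITS = 32  # hypothetical width of one global request counter

print(f"per-page write counters: {per_page // 8 // KIB} KiB")  # 2048 KiB
print(f"one global counter:      {GLOBAL_COUNTER_BITS} bits")
```

Per-page counter state grows linearly with capacity (about 2 MiB here for roughly one million pages), whereas a global counter stays a single register whatever the memory size.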

The remainder of the paper is organized as follows. Section 2 introduces the related work and research motivation briefly. Section 3 presents the analysis of inter-reference distance in workloads. Section 4 describes the details of PRO. Section 5 provides the experimental setup and results with discussion on performance. Finally, we conclude this paper in Section 6.


Related work and motivation

Recently, researchers have extensively explored page replacement policies for hybrid memory that can enhance the write performance and endurance of PCM in hybrid memory systems. Most of these policies are based on the CLOCK and LRU algorithms.

RaPP [9] is the most notable algorithm for hybrid memory systems. RaPP takes both the recency and frequency of pages into account and ranks pages according to their access frequency. However, combining read and write …

Analysis of inter-reference distance

In this section, we analyze the properties of memory references and propose a simple method to pick out hot write pages.

For our study, we captured virtual memory access traces using a modified gem5 simulator [16]. We filtered out the memory references served directly by the caches and kept only those observed at the main memory level. The traces were obtained from 9 typical benchmarks from MiBench [17] and MediaBench [18], which are widely …
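The inter-reference distance analysis can be sketched as follows. Here the distance of a reference is measured, as the introduction describes, in the number of main-memory accesses since the previous reference to the same page, so the running index over the trace plays the role of the global request counter. The trace format (address and is-write pairs already filtered to main-memory references), the 4 KiB page size, the power-of-two bucketing, and the function names are assumptions; this excerpt does not give the paper's exact procedure.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PAGE_SHIFT = 12  # assumed 4 KiB pages


def inter_reference_distances(
    trace: Iterable[Tuple[int, bool]], writes_only: bool = False
) -> List[int]:
    """Distances, in main-memory accesses, between consecutive references to a page.

    `trace` yields (address, is_write) pairs already filtered to main memory.
    The enumerate() index acts as the global request counter: it advances on
    every access, reads and writes alike.
    """
    last_seen: Dict[int, int] = {}  # page number -> counter value at its last reference
    distances: List[int] = []
    for counter, (addr, is_write) in enumerate(trace):
        if writes_only and not is_write:
            continue  # reads still advance the counter but are not tracked
        page = addr >> PAGE_SHIFT
        if page in last_seen:
            distances.append(counter - last_seen[page])
        last_seen[page] = counter
    return distances


def log2_histogram(distances: Iterable[int]) -> Counter:
    """Bucket each distance d >= 1 by floor(log2(d)); bucket 3 holds 8..15."""
    return Counter(d.bit_length() - 1 for d in distances)


trace = [(0x1000, True), (0x2000, False), (0x1008, False), (0x3000, True), (0x1010, True)]
print(inter_reference_distances(trace))                    # [2, 2]
print(inter_reference_distances(trace, writes_only=True))  # [4]
```

Plotting `log2_histogram` of the write distances for each benchmark is one way to check whether most re-references fall inside a bounded distance range, which is the kind of regularity this section looks for.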

PRO migration mechanism

In this section, we propose a conceptually new migration scheme called PRO (Periodical Reset Optimized Page Migration Scheme), which minimizes write operations in PCM with limited swap operations. PRO keeps a global request counter that counts the number of accesses the memory controller receives. Every D memory accesses, the global request counter triggers a periodic reset that clears the dirty bits of the pages in the swap candidate group to 0. With the conclusion obtained …
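The two mechanisms stated above, a single global request counter and a periodic clearing of the swap candidate group's dirty bits every D accesses, can be sketched as below. The rest of the mechanism is truncated in this excerpt, so the hot-page rule used here (a page written again while its dirty bit is still set, i.e. written twice within one period) is a hypothetical assumption, as are the class and method names. The actual swap with a DRAM page is not modeled.

```python
from typing import Dict


class PeriodicResetTracker:
    """Sketch of PRO's periodic reset: one global request counter clears the dirty
    bits of the swap candidate group every `period_d` (D) memory accesses.

    ASSUMPTION: `on_access` flags a page as a hot write page when it is written
    again while its dirty bit is still set; the paper's exact criterion is not
    given in this excerpt.
    """

    def __init__(self, period_d: int) -> None:
        if period_d <= 0:
            raise ValueError("period_d must be positive")
        self.period_d = period_d
        self.global_requests = 0          # the single global request counter
        self.dirty: Dict[int, bool] = {}  # dirty bit of each page in the swap candidate group

    def add_candidate(self, page: int) -> None:
        """Put a page into the swap candidate group with a clear dirty bit."""
        self.dirty.setdefault(page, False)

    def on_access(self, page: int, is_write: bool) -> bool:
        """Count one memory access; return True if `page` looks like a hot write page."""
        hot = False
        if is_write and page in self.dirty:
            hot = self.dirty[page]  # True: already written since the last reset
            self.dirty[page] = True
        self.global_requests += 1
        if self.global_requests == self.period_d:
            self.global_requests = 0  # start a new period
            for p in self.dirty:
                self.dirty[p] = False  # periodic reset of the dirty bits
        return hot


tracker = PeriodicResetTracker(period_d=4)
for p in (1, 2):
    tracker.add_candidate(p)
accesses = [(1, True), (1, True), (2, False), (2, True), (2, True), (3, True)]
print([tracker.on_access(p, w) for p, w in accesses])
# [False, True, False, False, False, False]
```

In this sketch, page 1 is flagged because both writes fall within the same window of D = 4 accesses, while page 2's second write arrives after the reset and is not. Only one counter is needed regardless of the number of pages; the per-page state is a single dirty bit.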

Experimental setup

In this section, we evaluate the performance of the proposed scheme with gem5-nvmain, a hybrid memory system simulator that can accurately characterize the performance of combined PCM and DRAM architectures [31]. The simulator runs in Syscall Emulation (SE) mode with an out-of-order CPU, which provides the system execution environment for the benchmarks. Detailed simulation configurations for main memory are listed in Table 2. The benchmarks …

Conclusion

In this work, we explore the data access features of applications and further study inter-reference distance. We find that most pages are revisited within a certain range of inter-reference distances, so that range can be used to pick out hot accessed pages. Based on this observation, we introduce an efficient page replacement scheme called PRO (Periodical Reset Optimized Page Migration Scheme) for hybrid memory …

Declaration of Competing Interest

We would like to submit the manuscript entitled “PRO: A Periodical Reset Optimized Page Migration Scheme For Hybrid Memory System”, which we wish to be considered for publication in the “Journal of Systems Architecture”. No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously, and not under …

Acknowledgment

This work was supported by a grant from the National Natural Science Foundation of China (NSFC, no. 61504032).

Na Niu received the M.S. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2016. She is currently working toward her Ph.D. degree in the Microelectronics Center, Harbin Institute of Technology. Her research interests include hybrid memory design, very large-scale integration design, and systems on chips (SoC).

References (31)

  • S.-W. Cheng et al., Efficient warranty-aware wear leveling for embedded systems with PCM main memory, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2016)
  • L.E. Ramos et al., Page placement in hybrid memory systems, Proceedings of the International Conference on Supercomputing (2011)
  • S. Lee et al., CLOCK-DWF: a write-history-aware page replacement algorithm for hybrid PCM and DRAM memory architectures, IEEE Trans. Comput. (2013)
  • M. Lee et al., M-CLOCK: migration-optimized page replacement algorithm for hybrid DRAM and PCM memory architecture, Proceedings of the 30th Annual ACM Symposium on Applied Computing (2015)
  • N. Niu et al., WIRD: an efficiency migration scheme in hybrid DRAM and PCM main memory for image processing applications, IEEE Access (2019)

Fangfa Fu received the M.S. and Ph.D. degrees in microelectronics and solid-state electronics from Harbin Institute of Technology, Harbin, China, in 2007 and 2012, respectively. Since 2012, he has been a lecturer with the Microelectronics Center, Harbin Institute of Technology. His research interests include systems on chips (SoC), networks on chips (NoC), very large-scale integration design, and digital signal processing.

Bing Yang received the M.S. and Ph.D. degrees in microelectronics and solid-state electronics from Harbin Institute of Technology, Harbin, China, in 2002 and 2009, respectively. His research interests include systems on chips (SoC), very large-scale integration design, and computer architecture.

Jiacai Yuan received the B.S. degree from Harbin Institute of Technology, Harbin, China, in 2018. He is currently working toward his M.S. degree in the Microelectronics Center, Harbin Institute of Technology. His research interests include hybrid memory design, very large-scale integration design, and systems on chips (SoC).

Fengchang Lai received the B.S. degree from Harbin Institute of Technology, Harbin, China, in 1984. He is currently a Professor in the Microelectronics Center, Harbin Institute of Technology. His research interests include very large-scale integration design and SoC.

Chengxin Zhao received his M.S. degree in Electronic Engineering from the Royal Institute of Technology, Sweden, and his Ph.D. degree in Electronic Engineering from the University of Oslo, Norway. He is now a Professor at the Institute of Modern Physics, Chinese Academy of Sciences. His research focuses on ASIC and electronics for nuclear experiments and medical applications.

Jinxiang Wang received the B.S. and M.S. degrees in semiconductor physics and the Ph.D. degree in communication and information engineering from Harbin Institute of Technology, Harbin, China, in 1990, 1993, and 1999, respectively. He is currently a Professor with the Microelectronics Center, Harbin Institute of Technology. His research interests include very large-scale integration design, wireless communication, SoC, and NoC.
