A Study on Modeling and Optimization of Memory Systems

Liu, Jason; Espina, Pedro; Sun, Xian-He

doi:10.1007/s11390-021-0771-8

Regular Paper
Published: 30 January 2021

Volume 36, pages 71–89, (2021)
Journal of Computer Science and Technology

Jason Liu¹,
Pedro Espina¹ &
Xian-He Sun²

Abstract

Accesses Per Cycle (APC), Concurrent Average Memory Access Time (C-AMAT), and Layered Performance Matching (LPM) are three memory performance models that consider both data locality and memory access concurrency. The APC model measures the throughput of a memory architecture and therefore reflects the quality of service (QoS) of a memory system. The C-AMAT model provides a recursive expression for the memory access delay and can therefore be used to identify potential bottlenecks in a memory hierarchy. The LPM method transforms a global memory system optimization into localized optimizations at each memory layer by matching the data access demands of the applications with the underlying memory system design. These three models were proposed separately in prior work. This paper reexamines the three models under one coherent mathematical framework. More specifically, we present a new memory-centric view of data accesses. We divide the memory cycles at each memory layer into four distinct categories and use them to recursively define the memory access latency and concurrency along the memory hierarchy. This perspective offers new insights through a clear formulation of memory performance that considers both locality and concurrency. Consequently, the performance model can be easily understood and applied in engineering practice. As such, the memory-centric approach helps establish a unified mathematical foundation for model-driven performance analysis and optimization of contemporary and future memory systems.
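The abstract names the models without stating their formulas. As a minimal sketch, assuming the single-layer definitions published in the earlier C-AMAT and APC work (not the four-category cycle formulation this paper introduces, which is not reproduced here), the relationship between conventional AMAT, C-AMAT, and APC can be illustrated as follows. The function and parameter names are illustrative choices, and the numbers in the usage lines are made up for demonstration.

```python
def amat(hit_time, miss_rate, avg_miss_penalty):
    """Conventional AMAT = H + MR * AMP (locality only, no concurrency)."""
    return hit_time + miss_rate * avg_miss_penalty


def c_amat(hit_time, hit_concurrency, pure_miss_rate,
           pure_avg_miss_penalty, pure_miss_concurrency):
    """C-AMAT = H / C_H + pMR * pAMP / C_M, as assumed from the prior C-AMAT work.

    A "pure miss" is a miss with at least one cycle that is not overlapped
    by any hit activity; C_H and C_M are the average hit and pure-miss
    concurrency.
    """
    return (hit_time / hit_concurrency
            + pure_miss_rate * pure_avg_miss_penalty / pure_miss_concurrency)


def apc(c_amat_cycles):
    """APC is taken here as the reciprocal of C-AMAT (accesses per cycle)."""
    return 1.0 / c_amat_cycles


# Hypothetical numbers: with no concurrency, C-AMAT reduces to AMAT.
sequential = c_amat(2.0, 1.0, 0.1, 20.0, 1.0)
# Hypothetical numbers: overlapping hits and misses lowers the per-access delay.
concurrent = c_amat(2.0, 2.0, 0.05, 20.0, 2.0)
print(sequential, concurrent, apc(concurrent))
```

With all concurrency terms set to 1 and every miss treated as a pure miss, the C-AMAT expression collapses to the conventional AMAT, which is why it is usually described as an extension of AMAT rather than a replacement.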

Acknowledgments

The authors would like to thank the reviewers for their constructive comments and suggestions.

Author information

Authors and Affiliations

School of Computing and Information Sciences, Florida International University, Miami, FL, 33199, USA
Jason Liu & Pedro Espina
Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
Xian-He Sun

Corresponding author

Correspondence to Jason Liu.

Supplementary Information

ESM 1

(PDF 173 kb)

About this article

Cite this article

Liu, J., Espina, P. & Sun, XH. A Study on Modeling and Optimization of Memory Systems. J. Comput. Sci. Technol. 36, 71–89 (2021). https://doi.org/10.1007/s11390-021-0771-8

Received: 02 July 2020
Accepted: 19 November 2020
Published: 30 January 2021
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11390-021-0771-8
