Improving power-performance via hybrid cache for chip many cores based on neural network prediction technique

  • Technical Paper
Microsystem Technologies

Abstract

Recently, the growing need to run big data analytics applications, together with rising demand for effective tools for big data computing systems, has created a need for efficient platforms that combine high performance with manageable power consumption, such as chip multiprocessors (CMPs). At the same time, shrinking feature sizes and the need to pack ever more transistors into a single chip have created serious design challenges, because significant power is consumed at high area densities. We present a reconfigurable hybrid last level cache (LLC) that integrates emerging memory technologies such as STT-RAM with SRAM. The approach has two phases: off-time and on-time. In the off-time phase, a neural network (NN) is trained. In the on-time phase, the reconfigurable cache uses the trained NN to predict the latency demanded by the running application. Experimental results for a three-dimensional chip with 64 cores running the PARSEC benchmarks show that the proposed design achieves a 25% performance speedup and a 78.4% improvement in energy consumption compared with non-reconfigurable pure SRAM cache architectures.
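The abstract describes the control loop only at a high level: train an NN off-time, then use it on-time to predict the latency an application demands and reconfigure the hybrid LLC. The sketch below illustrates that two-phase idea under loudly stated assumptions. It is not the authors' network, feature set, or data. The one-hidden-layer MLP, the per-application features `miss_rate` and `write_ratio`, the four SRAM/STT-RAM way splits in `CONFIGS`, and the `synthetic_latency` model (a made-up stand-in for simulator measurements) are all hypothetical.

```python
import math
import random

# Hypothetical LLC configurations (SRAM ways, STT-RAM ways) in a 16-way bank.
CONFIGS = [(16, 0), (8, 8), (4, 12), (0, 16)]


def features(app, config):
    """Hypothetical NN input: application profile plus the STT-RAM fraction."""
    _sram, stt = config
    return [app["miss_rate"], app["write_ratio"], stt / 16.0]


def synthetic_latency(app, config):
    """Made-up stand-in for simulator measurements (normalized latency).

    More STT-RAM ways -> more capacity, so fewer misses, but slower writes.
    """
    s = config[1] / 16.0
    return 0.2 + app["miss_rate"] * (1.0 - 0.5 * s) + 0.8 * app["write_ratio"] * s


class TinyMLP:
    """One tanh hidden layer, linear output, trained by per-sample SGD on MSE."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, target, lr):
        h, y = self._forward(x)
        err = y - target  # gradient of 0.5 * err**2 w.r.t. the output
        for j, hj in enumerate(h):
            grad_h = err * self.w2[j] * (1.0 - hj * hj)  # uses w2 before its update
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err


def make_dataset(n_apps, seed=1):
    """Off-time data: random application profiles crossed with every config."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_apps):
        app = {"miss_rate": rng.random(), "write_ratio": rng.random()}
        for cfg in CONFIGS:
            data.append((features(app, cfg), synthetic_latency(app, cfg)))
    return data


def mean_loss(model, data):
    """Mean of 0.5 * squared error over the dataset."""
    return sum(0.5 * (model.predict(x) - y) ** 2 for x, y in data) / len(data)


def train_offline(data, epochs=100, lr=0.05, seed=0):
    """Off-time phase: fit the NN to measured (here: synthetic) latencies."""
    model = TinyMLP(n_in=3, n_hidden=8, seed=seed)
    rng = random.Random(seed)
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            model.train_step(x, y, lr)
    return model


def choose_config(model, app):
    """On-time phase: pick the configuration with the lowest predicted latency."""
    return min(CONFIGS, key=lambda cfg: model.predict(features(app, cfg)))
```

A typical use is `model = train_offline(make_dataset(100))` once, followed by `choose_config(model, {"miss_rate": 0.1, "write_ratio": 0.9})` whenever a new application profile is observed. Keeping training out of the on-time path means that only a few cheap forward passes are needed per reconfiguration decision. The actual features, network size, and reconfiguration granularity in the paper may differ.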

Author information

Correspondence to Furat Al-Obaidy.

About this article

Cite this article

Al-Obaidy, F., Asad, A. & Mohammadi, F.A. Improving power-performance via hybrid cache for chip many cores based on neural network prediction technique. Microsyst Technol 27, 2995–3006 (2021). https://doi.org/10.1007/s00542-020-05048-5

  • DOI: https://doi.org/10.1007/s00542-020-05048-5
