Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration

Journal of Electronic Testing

Abstract

In addition to wear-out failures and external particle strikes, the voltage scaling used to reduce power consumption in multiprocessors makes caches more vulnerable to cell failures. Because voltage reduction is indispensable for prolonging the battery life of handheld devices, fault tolerance techniques are essential to ensure fault-free execution at near-threshold voltage. Several fault tolerance techniques have been proposed, and remapping-based techniques have been found effective in single-core systems. This work proposes an analytical model of remapping-based fault tolerance techniques to evaluate the effectiveness of such schemes in multicore systems. Two metrics, Expected Miss Ratio in Multicore (EMRMC) and Expected Latency Ratio in Multicore (ELRMC), are introduced to characterize the behavior of remapping-based techniques. EMRMC and ELRMC are defined as functions of the probability of cell failure (Pfail), block size, number of cores and number of threads. The system is simulated in Multi2Sim 5.0, a multicore CPU-GPU simulator. The values of the metrics are analysed for different configuration parameters (probability of cell failure, number of cores, number of blocks, block size and number of threads) to frame system-configuration guidelines that deliver better performance under remapping-based fault tolerance. EMRMC is observed to be proportional to Pfail and block size, inversely proportional to the number of cores and threads, and unaffected by the number of blocks. In contrast, ELRMC is inversely proportional to Pfail and block size and proportional to the number of cores and threads. ELRMC is also observed to be independent of the number of cores and blocks. EMRMC is minimized for Pfail ≤ 10⁻⁴, block size ≤ 64 bytes, number of cores ≥ 4 and number of threads ≥ 2. ELRMC, on the other hand, is best for Pfail ≤ 10⁻⁴, block size ≥ 64 bytes, number of cores ≥ 4 and two threads.
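As a rough illustration of why Pfail and block size drive the miss-ratio trends above, the sketch below computes the probability that a cache block contains at least one faulty bit cell. It assumes, as a common first-order model in the low-voltage cache literature, that every bit cell fails independently with probability Pfail and that tag and ECC bits can be ignored. This is not the EMRMC formula from the paper, and the function name `block_fault_probability` is hypothetical.

```python
def block_fault_probability(p_fail: float, block_bytes: int) -> float:
    """Probability that a block has at least one faulty bit cell.

    Hypothetical illustration: assumes each bit cell fails independently
    with probability p_fail; tag and ECC bits are ignored.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must lie in [0, 1]")
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    bits = 8 * block_bytes
    # Complement of "every bit in the block is fault-free".
    return 1.0 - (1.0 - p_fail) ** bits


if __name__ == "__main__":
    # Sweep a few Pfail values and block sizes around those named in the abstract.
    for p_fail in (1e-5, 1e-4, 1e-3):
        for block_bytes in (32, 64, 128):
            p_block = block_fault_probability(p_fail, block_bytes)
            print(f"Pfail={p_fail:.0e}  block={block_bytes:>3} B  "
                  f"P(faulty block)={p_block:.4f}")
```

Under this assumption, a 64-byte block at Pfail = 10⁻⁴ is faulty with probability of roughly 5%, and the fraction grows with both Pfail and block size, which is consistent in direction with the reported EMRMC trend.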

Figs. 1–13

References

  1. Ansari A, et al. (2009) Enabling ultra low voltage system operation by tolerating on-chip cache failures. In: Proc. ISLPED

  2. Ansari A, et al. (2009) Enabling ultra low voltage system operation by tolerating on-chip cache failures. In: Proc. of the intl. symposium on low power electronics and design

  3. Ansari A, et al. (2011) Archipelago: a polymorphic cache design for enabling robust near-threshold operation. In: Proc. of the international symposium on high performance computer architecture (HPCA), pp 539–550

  4. BanaiyanMofrad A, et al. (2011) FFT-Cache: a flexible fault-tolerant cache architecture for ultra low voltage operation. In: Proc. CASES

  5. BanaiyanMofrad A, et al. (2013) REMEDIATE: a scalable fault-tolerant architecture for low-power NUCA cache in tiled CMPs. In: Proc. of the international green computing conference (IGCC)

  6. Banaiyanmofrad A, Homayoun H, Dutt N (2015) Using a flexible fault-tolerant cache to improve reliability for ultra low voltage operation. ACM Trans Embedded Comput Syst 14(2): Article 32

  7. Calhoun B, Chandrakasan A (2006) A 256 kb subthreshold SRAM in 65nm CMOS. In: Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp 480–48

  8. Chen C, Hsiao M (1984) Error-correcting codes for semiconductor memory applications: a state of the art review. IBM J R & D

  9. Choudhury A, Sikdar BK (2018) Modeling and analysis of redundancy based fault tolerance for permanent faults in chip multiprocessor cache. In: Proceedings of the 31st international conference on VLSI design, VLSID 2018, ISSN-2380-6923, pp 115–120

  10. Choudhury A, Sikdar BK (2017) CIFR: a complete in-place fault remapping strategy for CMP cache for dynamic reuse distance. In: Proc. of the 7th International conference on embedded computing and system design, ISED

  11. Duong N, et al. (2012) Improving cache management policies using dynamic reuse distances. In: Proceedings of the 45th Annual IEEE/ACM international symposium on microarchitecture (MICRO)

  12. Kim J, Hardavellas N, Mai K, Falsafi B, Hoe JC (2007) Multi-bit error tolerant caches using two-dimensional error coding. In: Proc. 39th annual IEEE/ACM international symposium on microarchitecture (MICRO 39), pp 15–25

  13. Koh C-K, et al. (2009) The salvage cache: a fault-tolerant cache architecture for next-generation memory technologies. In: Proc. of the international conference on computer design (ICCD)

  14. Kulkarni JP, Kim K, Roy K (2007) A 160 mV, fully differential, robust Schmitt trigger based sub-threshold SRAM. In: Proc. of the 2007 international symposium on low power electronics and design. ACM, New York, pp 171–176

  15. Ladas N, Sazeides Y, Desmet V (2010) Performance-effective operation below Vcc-min. In: Proc of the intl symposium on performance analysis of systems & software

  16. Moradi F, Wisland D, Aunet S, Mahmoodi H, Cao T (2008) 65nm sub-threshold 11T-SRAM for ultra low voltage applications. In: Intl. symposium on system-on-a-chip, pp 113–118

  17. Morita Y, Fujiwara H, Noguchi H, Iguchi Y, Nii K, Kawaguchi H, Yoshimoto M (2007) An area-conscious low-voltage-oriented 8T-SRAM design under DVS environment. In: IEEE symposium on VLSI circuits, pp 256–257

  18. Ozdemir S, Sinha D, Memik G, Adams J, Zhou H (2006) Yield-aware cache architectures. In: Proc. of the international symposium on microarchitecture

  19. Pour F, Hill MD (1993) Performance implications of tolerating cache faults. IEEE Trans Comput

  20. Sánchez D, Sazeides Y, Cebrián J, García JM, Aragón JL (2013) Modeling the impact of permanent faults in caches. ACM Trans Arch Code Optim 10(4): Article 29

  21. Sasan A, Homayoun H, Eltawil A, Kurdahi F (2009) A fault tolerant cache architecture for sub 500mv operation: resizable data composer cache (RDC-Cache). In: Proc. of international conference on compilers, architectures and synthesis for embedded systems (CASES)

  22. Skotnicki T, Hutchby J, King T-J, Wong H-S, Boeuf F (2005) The end of CMOS scaling: toward the introduction of new materials and structural changes to improve MOSFET performance. Circ Dev Mag IEEE 21(1):16

  23. Sohi G (1989) Cache memory organization to enhance the yield of high-performance VLSI processors. IEEE Trans. Computers 38(4):484–492

  24. Ubal R, Jang B, Mistry P, Schaa D, Kaeli D (2012) Multi2Sim: a simulation framework for CPU-GPU computing. In: Proc. of 21st international conference on parallel architectures and compilation techniques. Minneapolis

  25. Vergos HT, Nikolos D (1995) Performance recovery in direct-mapped faulty caches via the use of a very small fully associative spare cache. In: Proc. of the intl. computer performance and dependability symposium

  26. Wilkerson C, et al. (2008) Trading off cache capacity for reliability to enable low voltage operation. In: Proc. of international symposium on computer architecture (ISCA)

Author information

Correspondence to Avishek Choudhury.

Additional information

Responsible Editor: C.-W. Wu

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Choudhury, A., Sikdar, B.K. Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration. J Electron Test 36, 59–73 (2020). https://doi.org/10.1007/s10836-019-05852-6

  • DOI: https://doi.org/10.1007/s10836-019-05852-6
