Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration

Choudhury, Avishek; Sikdar, Biplab K.

doi:10.1007/s10836-019-05852-6

Published: 18 February 2020

Volume 36, pages 59–73, (2020)

Journal of Electronic Testing

Abstract

In addition to wear-out failures and external particle strikes, voltage scaling to reduce power consumption in multiprocessors makes caches more vulnerable to cell failures. Because voltage reduction is indispensable for prolonging the battery life of handheld devices, fault tolerance techniques are essential to ensure fault-free execution at near-threshold voltage. Several fault tolerance techniques have been proposed, and remapping based techniques have been found effective in single core systems. This work proposes an analytical model for remapping based fault tolerance techniques to evaluate the effectiveness of such schemes in multicore systems. Two metrics, Expected Miss Ratio in Multicore (EMR_MC) and Expected Latency Ratio in Multicore (ELR_MC), are introduced to characterize the behavior of remapping based techniques. EMR_MC and ELR_MC are defined as functions of the probability of cell failure (P_fail), block size, number of cores and number of threads. The system is simulated in Multi2sim 5.0, a multicore CPU-GPU simulator. The values of the metrics are analysed for different configuration parameters (probability of cell failure, number of cores, number of blocks, block size and number of threads) to frame guidelines for configuring a system that delivers better performance under remapping based fault tolerance. It is observed that EMR_MC is proportional to P_fail and block size, inversely proportional to the number of cores and threads, and not affected by the number of blocks. In contrast, ELR_MC is inversely proportional to P_fail and block size and proportional to the number of cores and threads. It is also observed that ELR_MC is independent of the number of cores and blocks. EMR_MC is best minimized for P_fail ≤ 1e-4, block size ≤ 64 bytes, number of cores ≥ 4 and number of threads ≥ 2. On the other hand, ELR_MC is best observed for P_fail ≤ 1e-4, block size ≥ 64 bytes, number of cores ≥ 4 and number of threads 2.
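
The dependence on P_fail and block size described above can be illustrated with a standard probability calculation. The sketch below is not the paper's EMR_MC or ELR_MC model, which the abstract does not spell out. It only computes the probability that a cache block contains at least one faulty cell, under the assumptions that cell failures are independent and identically distributed and that only data bits are counted (tag and status bits are ignored). The function name `block_fault_probability` is hypothetical.

```python
def block_fault_probability(p_fail: float, block_size_bytes: int) -> float:
    """Probability that a block holds at least one faulty cell.

    Assumption (not from the paper): each bit cell fails independently
    with probability p_fail, and only the data bits of the block count.
    """
    bits = block_size_bytes * 8
    # A block is fault free only if every one of its cells is fault free.
    return 1.0 - (1.0 - p_fail) ** bits


if __name__ == "__main__":
    # Sweep the two parameters the abstract names: P_fail and block size.
    for p_fail in (1e-5, 1e-4, 1e-3):
        for block_size in (32, 64, 128):
            p_block = block_fault_probability(p_fail, block_size)
            print(f"P_fail={p_fail:.0e}  block={block_size:>3} B  "
                  f"P(block faulty)={p_block:.4f}")
```

Under these assumptions the fraction of faulty blocks grows with both P_fail and block size, which is consistent with the direction of the EMR_MC trend reported in the abstract.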

Author information

Authors and Affiliations

New Alipore College, Kolkata, India
Avishek Choudhury
IIEST, Shibpur, Howrah, India
Biplab K. Sikdar

Corresponding author

Correspondence to Avishek Choudhury.

Additional information

Responsible Editor: C.-W. Wu

About this article

Cite this article

Choudhury, A., Sikdar, B.K. Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration. J Electron Test 36, 59–73 (2020). https://doi.org/10.1007/s10836-019-05852-6

Received: 11 August 2019
Accepted: 16 December 2019
Published: 18 February 2020
Issue Date: February 2020
DOI: https://doi.org/10.1007/s10836-019-05852-6
