A fast and accurate hybrid fault injection platform for transient and permanent faults

Design Automation for Embedded Systems

Abstract

Many ground-level and space systems require reliability testing before deployment, since they are increasingly susceptible to transient and permanent faults. Such a process must be accurate, controllable, generic, cheap, and fast. Although gate-level fault injection is often the most appropriate solution when accuracy and controllability are required, it is very time-consuming. To address this, this work proposes a hybrid fault injection framework that automatically switches between RTL and gate-level simulation modes. Using a complex 8-issue VLIW processor as a case study, we show that the injection process can be accelerated by more than \(10\times \) for transient faults and almost \(2\times \) for permanent faults over conventional injectors, while maintaining gate-level accuracy and controllability. The proposed framework is generic, so faults can be injected into any arbitrary circuit; we demonstrate this by also injecting faults into a neural network, achieving a speedup of more than \(30\times \).
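
To give a rough feel for why switching between RTL and gate-level simulation can pay off, the following is a back-of-the-envelope cost model, not the framework described in the abstract. It assumes a run spends some cycles in slow gate-level mode and the rest in faster RTL mode. The function name `hybrid_speedup`, the per-cycle costs, and the switching overhead are all hypothetical values chosen for illustration; none of them come from the article.

```python
def hybrid_speedup(total_cycles, gate_cycles,
                   rtl_cost=1.0, gate_cost=20.0, switch_cost=0.0):
    """Estimate the speedup of a hybrid run over a pure gate-level run.

    All parameters are illustrative assumptions:
      total_cycles -- simulated cycles in the whole run
      gate_cycles  -- cycles that must be simulated at gate level
      rtl_cost     -- relative wall-clock cost of one RTL cycle
      gate_cost    -- relative wall-clock cost of one gate-level cycle
      switch_cost  -- one-off overhead of moving state between the two modes
    """
    if not 0 <= gate_cycles <= total_cycles:
        raise ValueError("gate_cycles must lie in [0, total_cycles]")
    if rtl_cost <= 0 or gate_cost <= 0 or switch_cost < 0:
        raise ValueError("costs must be positive (switch_cost may be zero)")

    # Baseline: every cycle is simulated at gate level.
    pure_gate_time = total_cycles * gate_cost

    # Hybrid: only gate_cycles run at gate level, the rest at RTL.
    hybrid_time = ((total_cycles - gate_cycles) * rtl_cost
                   + gate_cycles * gate_cost
                   + switch_cost)
    return pure_gate_time / hybrid_time


if __name__ == "__main__":
    # A short gate-level window versus a long one, with made-up numbers.
    print(hybrid_speedup(total_cycles=1000, gate_cycles=100))
    print(hybrid_speedup(total_cycles=1000, gate_cycles=500))
```

Under this toy model, the achievable gain shrinks as the share of cycles that must stay at gate level grows, and it can never exceed the ratio `gate_cost / rtl_cost`.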

Acknowledgements

This study was financed in part by Pronex 16/0472-2 and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Corresponding author

Correspondence to Anderson L. Sartor.

About this article

Cite this article

Sartor, A.L., Becker, P.H.E. & Beck, A.C.S. A fast and accurate hybrid fault injection platform for transient and permanent faults. Des Autom Embed Syst 23, 3–19 (2019). https://doi.org/10.1007/s10617-018-9217-0

  • DOI: https://doi.org/10.1007/s10617-018-9217-0
