Sensitivity of computational fluid dynamics simulations against soft errors

  • Regular Paper
Computing

Abstract

The computational capability of the largest high-performance computing systems has increased more than 100-fold in the last 10 years and continues to grow substantially every year. This growth has been made possible mostly by multi-core technology, in addition to increases in CPU clock speed. Today there are systems with more than 100 thousand cores installed and available for simultaneous processing. Computational simulation tools always demand more computational resources than are available, especially for complex, large-scale flow problems. For these large-scale problems, the soft error tolerance of the simulation codes must also be taken into account, although it is not an issue in relatively small-scale problems because of the low probability of occurrence. In this study, we analyze the reaction of an incompressible flow solver to randomly generated soft errors at several levels of computation. Soft errors are injected into the final global assembly matrix of the solver through predetermined bit-flip operations. The behaviour of the computational fluid dynamics (CFD) solver is observed after the iterative matrix solver, flow convergence, and CFD iterations. The results show that the iterative solvers of CFD matrices are highly sensitive to customized soft errors, while the final solutions appear more robust to bit-flip operations. However, the solutions might still differ from the real physical results, depending on the bit-flip location and iteration number. Next-generation computing platforms and codes should therefore be designed to detect bit flips and to be resistant to them.
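The bit-flip injection described in the abstract can be illustrated with a minimal sketch. This is not the solver or the experimental setup used in the study: the function names `flip_bit` and `jacobi`, the 3×3 diagonally dominant test matrix, and the choice of plain Jacobi iteration as a stand-in for an iterative matrix solver are all assumptions made purely for illustration. The sketch flips a single bit of one matrix entry, stored as an IEEE 754 double, at a chosen iteration and lets the iteration continue with the corrupted matrix.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def jacobi(A, b, iterations, fault=None):
    """Run Jacobi sweeps on A x = b, optionally injecting one bit flip.

    `fault` is a hypothetical (iteration, row, col, bit) tuple: the entry
    A[row][col] is corrupted just before that sweep and stays corrupted.
    """
    A = [row[:] for row in A]  # work on a copy so the caller's matrix is untouched
    n = len(b)
    x = [0.0] * n
    for k in range(iterations):
        if fault is not None and fault[0] == k:
            _, i, j, bit = fault
            A[i][j] = flip_bit(A[i][j], bit)
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


# Illustrative system (an assumption, not data from the study); exact solution is [1, 1, 1].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]

clean = jacobi(A, b, 80)
low_bit = jacobi(A, b, 80, fault=(5, 1, 1, 0))    # lowest mantissa bit of A[1][1]
exp_bit = jacobi(A, b, 80, fault=(5, 1, 1, 52))   # lowest exponent bit: 4.0 becomes 2.0

print("no fault        :", clean)
print("mantissa bit 0  :", low_bit)
print("exponent bit 52 :", exp_bit)
```

In this toy setting, a flip in the lowest mantissa bit changes the entry by roughly one part in 10^16 and leaves the converged result essentially unchanged. A flip in the lowest exponent bit halves the diagonal entry, and the iteration still converges, but to the solution of a different linear system. This mirrors, in a very simplified form, the observation that the effect of a soft error depends on the bit-flip location.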

Funding

This research did not receive any specific grant from any funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding authors

Correspondence to E. Fatih Yetkin or Şenol Pişkin.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yetkin, E.F., Pişkin, Ş. Sensitivity of computational fluid dynamics simulations against soft errors. Computing 103, 2687–2709 (2021). https://doi.org/10.1007/s00607-021-00976-0

  • DOI: https://doi.org/10.1007/s00607-021-00976-0
