Anti-aging analysis for software reliability design modes in the context of single-event effect

Software Quality Journal

Abstract

Software aging is a risk associated with the continuous operation of software, so developing anti-aging techniques that offset or mitigate the aging phenomenon is both necessary and worthwhile. Although software aging and anti-aging techniques have received considerable attention, few studies have considered the single-event effect (SEE) as a cause of software aging in space environments. To address SEE-induced software aging, this study goes beyond classic software rejuvenation and explores the anti-aging effects and rules of software reliability design modes, including triple modular redundancy (TMR) and logical partitioning. Reliability and availability are used as aging indicators, and the anti-aging effects of the reliability design modes and the rejuvenation policy are analyzed quantitatively through probabilistic model checking. The simulation and theoretical results show that reliability design modes can alleviate software aging. However, the TMR mode is time-sensitive, and applying the rejuvenation policy eliminates this time sensitivity. Combining reliability design modes with the rejuvenation policy yields the best anti-aging effect. The analysis and discussion in this paper can offer software researchers useful insights for instantiating different software anti-aging techniques or new applications.
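The time sensitivity of TMR can be illustrated with the classic textbook reliability model of non-repairable TMR. The sketch below is not the paper's CTMC or model-checking model; it assumes independent modules that each fail at a constant rate equal to the SEE intensity λ and a perfect voter. The partitioned and scrubbed variants add further hypothetical simplifications: upsets spread evenly over n partitions, and scrubbing at interval SI that instantly and perfectly restores every module. Under these assumptions, TMR beats a single module only while module reliability exceeds 0.5, that is, for t < ln 2 / λ; with scrubbing at SI < ln 2 / λ, scrubbed TMR stays more reliable than a single module for every observation time T. All function names and numeric values are illustrative.

```python
import math


def two_of_three(r: float) -> float:
    """Probability that at least 2 of 3 independent modules (each with reliability r) work."""
    return 3 * r**2 - 2 * r**3


def simplex_reliability(lam: float, t: float) -> float:
    """Single module with constant failure rate lam (assumed equal to the SEE intensity)."""
    return math.exp(-lam * t)


def tmr_reliability(lam: float, t: float) -> float:
    """Classic non-repairable TMR with a perfect voter."""
    return two_of_three(simplex_reliability(lam, t))


def partitioned_tmr_reliability(lam: float, t: float, n: int) -> float:
    """Hypothetical partitioned TMR: each replica is split into n partitions with a voter
    per partition, and upsets are spread evenly (rate lam / n per partition copy).
    The system survives only if every partition keeps at least 2 of its 3 copies."""
    return two_of_three(simplex_reliability(lam / n, t)) ** n


def scrubbed_tmr_reliability(lam: float, T: float, si: float) -> float:
    """Hypothetical TMR with perfect, instantaneous scrubbing every si time units:
    each scrub restores all replicas, so scrub intervals are independent."""
    k, remainder = divmod(T, si)
    return tmr_reliability(lam, si) ** k * tmr_reliability(lam, remainder)


def tmr_crossover_time(lam: float) -> float:
    """Time at which TMR and a single module are equally reliable (module reliability 0.5)."""
    return math.log(2) / lam


if __name__ == "__main__":
    lam = 1e-3  # illustrative SEE intensity per hour (assumed value, not from the paper)
    si = 100.0  # illustrative scrub interval in hours (assumed value)
    print(f"TMR crossover time: {tmr_crossover_time(lam):.1f} h")
    for T in (100.0, 500.0, 1000.0, 2000.0):
        print(
            f"T={T:6.0f}  simplex={simplex_reliability(lam, T):.4f}  "
            f"TMR={tmr_reliability(lam, T):.4f}  "
            f"TMR(n=4)={partitioned_tmr_reliability(lam, T, 4):.4f}  "
            f"TMR+scrub={scrubbed_tmr_reliability(lam, T, si):.4f}"
        )
```

With these assumed values, plain TMR falls below the single module once T exceeds about 693 h, while the scrubbed variant remains above it at every listed T.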


Figs. 1–21 (images not reproduced in this text)


Abbreviations

CTMC: Continuous-time Markov chain

TMR: Triple modular redundancy

SEE: Single-event effect

λ: SEE intensity

n: Number of partitions

SI: Scrub interval

T: Observation time

succ: State in which the software runs normally and system performance is optimal

aging: State in which the software has aged and system performance has degraded

down: State in which the software has failed

None: The None mode refers to a software system that applies no reliability design mode

TMR: The TMR mode refers to a software system that uses TMR

TMR_Partition: The TMR_Partition mode refers to a software system that applies TMR and logical partitioning simultaneously

None_Rejuvenation: The None_Rejuvenation mode refers to a software system that uses rejuvenation alone

TMR_Rejuvenation: The TMR_Rejuvenation mode refers to a software system that uses TMR and rejuvenation

TMR_Rejuvenation_Partition: The TMR_Rejuvenation_Partition mode refers to a software system that uses TMR, rejuvenation, and logical partitioning simultaneously
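The states succ, aging, and down, together with a rejuvenation policy, can be illustrated with a small CTMC. The sketch below is a hypothetical three-state model, not the paper's model: the transition structure (succ → aging → down, repair from down back to succ, and an optional rejuvenation transition from aging back to succ), the class and function names, and all rates are assumptions; rejuvenation is treated as instantaneous, and availability counts both succ and aging as up. It computes the two aging indicators named in the abstract: steady-state availability in closed form, and reliability over an observation time T by uniformization, the kind of transient analysis a probabilistic model checker such as PRISM performs for time-bounded reachability on CTMCs.

```python
import math
from dataclasses import dataclass


@dataclass
class AgingRates:
    """Hypothetical transition rates (per hour) of a three-state aging CTMC."""
    aging: float               # succ -> aging (e.g. driven by SEE intensity)
    failure: float             # aging -> down
    repair: float              # down -> succ
    rejuvenation: float = 0.0  # aging -> succ; 0 means no rejuvenation policy


def steady_state(rates: AgingRates) -> dict:
    """Closed-form stationary distribution from the global balance equations."""
    aging_ratio = rates.aging / (rates.failure + rates.rejuvenation)  # pi_aging / pi_succ
    down_ratio = aging_ratio * rates.failure / rates.repair           # pi_down / pi_succ
    total = 1.0 + aging_ratio + down_ratio
    return {"succ": 1.0 / total, "aging": aging_ratio / total, "down": down_ratio / total}


def availability(rates: AgingRates) -> float:
    """Long-run probability of being up (succ, or degraded but running in aging)."""
    return 1.0 - steady_state(rates)["down"]


def reliability(rates: AgingRates, T: float, tol: float = 1e-12) -> float:
    """P(no visit to down within [0, T]) starting from succ, with down made absorbing.
    Computed by uniformization: Poisson-weighted powers of a uniformized DTMC."""
    a, f, r = rates.aging, rates.failure, rates.rejuvenation
    q = max(a, f + r)  # uniformization rate >= every exit rate of the transient states
    if q == 0.0 or T == 0.0:
        return 1.0
    if q * T > 700.0:
        raise ValueError("q*T too large for this simple Poisson-weight recursion")
    # Uniformized DTMC over (succ, aging, down); row i is the jump distribution from state i.
    P = ((1 - a / q, a / q, 0.0),
         (r / q, 1 - (f + r) / q, f / q),
         (0.0, 0.0, 1.0))
    v = [1.0, 0.0, 0.0]        # state distribution after k uniformized jumps
    weight = math.exp(-q * T)  # Poisson probability of exactly k jumps, starting at k = 0
    p_down, mass, k = 0.0, 0.0, 0
    while 1.0 - mass > tol and k < 10_000:
        p_down += weight * v[2]
        mass += weight
        v = [sum(v[i] * P[i][j] for i in range(3)) for j in range(3)]
        k += 1
        weight *= q * T / k
    return 1.0 - p_down
```

For example, `availability(AgingRates(aging=0.01, failure=0.005, repair=0.5, rejuvenation=0.05))` can be compared with the same rates and `rejuvenation=0.0`; under these assumed rates, the rejuvenation transition raises both availability and reliability.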


Funding

We thank the National Natural Science Foundation of China (Grant No. 61672080) and the National Aerospace Science Foundation of China (Grant No. 2016ZD51031) for their support. This work was also partially supported by ZFYY41402020502 and JSZL2017601B005.

Author information

Corresponding author

Correspondence to Shunkun Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shao, Q., Gou, X., Huang, T. et al. Anti-aging analysis for software reliability design modes in the context of single-event effect. Software Qual J 28, 221–243 (2020). https://doi.org/10.1007/s11219-019-09464-3


  • DOI: https://doi.org/10.1007/s11219-019-09464-3
