Abstract
Software aging is a risk inherent in the continuous operation of software, so anti-aging techniques that offset or mitigate it are of practical importance. Although software aging and anti-aging techniques have received considerable attention, few studies have examined single-event effects (SEEs) as a cause of software aging in the space environment. This study addresses software aging caused by SEEs. Beyond classic software rejuvenation, we explore the anti-aging effects and regularities of software reliability design modes, including triple modular redundancy (TMR) and logical partitioning. Reliability and availability serve as aging indicators, and the anti-aging effects of the reliability design modes and the rejuvenation policy are analyzed quantitatively through probabilistic model checking. The simulation and theoretical results show that the reliability design modes can alleviate software aging. The TMR mode, however, is time-sensitive, and applying the rejuvenation policy removes this time-sensitivity. Combining reliability design modes with the rejuvenation policy yields the best anti-aging effect. The analysis and discussion in this paper offer insights that software researchers can use to develop different software anti-aging solutions or new applications.
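The time-sensitivity of TMR mentioned in the abstract can be illustrated with the textbook 2-out-of-3 reliability formula. The sketch below is not the paper's model. It assumes three independent modules with a constant failure rate `lam` (a stand-in for the SEE intensity λ), a perfect voter, and no repair or scrubbing. All function names are hypothetical.

```python
import math


def simplex_reliability(lam: float, t: float) -> float:
    """Reliability at time t of one module with constant failure rate lam."""
    return math.exp(-lam * t)


def tmr_reliability(lam: float, t: float) -> float:
    """Reliability of 2-out-of-3 majority voting over three independent modules.

    Assumes a perfect voter and no repair: R_TMR = 3R^2 - 2R^3.
    """
    r = simplex_reliability(lam, t)
    return 3 * r**2 - 2 * r**3


def crossover_time(lam: float) -> float:
    """Time at which unrepaired TMR stops being more reliable than one module.

    3R^2 - 2R^3 > R holds exactly when 0.5 < R < 1, i.e. t < ln(2) / lam.
    """
    return math.log(2) / lam
```

Under these assumptions, TMR is more reliable than a single module only before `crossover_time(lam)` and less reliable afterwards. This gives one intuition for why restoring modules periodically, as a rejuvenation policy does, could matter for a TMR design.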
Abbreviations
- CTMC: Continuous-time Markov chain
- TMR: Triple modular redundancy
- SEE: Single-event effect
- λ: SEE intensity
- n: Number of partitions
- SI: Scrub interval
- T: Observation time
- succ: State succ indicates that the software runs normally and the system performance is optimal
- aging: State aging means software aging and system performance degradation
- down: State down means software failure
- None: The None mode described in this paper refers to a software system that does not apply any reliability design mode
- TMR: The TMR mode refers to the software system using TMR technology
- TMR_Partition: The TMR_Partition mode refers to a software system that applies TMR and logical partition technology simultaneously
- None_Rejuvenation: The None_Rejuvenation mode refers to the software system that uses rejuvenation technology alone
- TMR_Rejuvenation: The TMR_Rejuvenation mode refers to the software system that uses TMR and rejuvenation technology
- TMR_Rejuvenation_Partition: The TMR_Rejuvenation_Partition mode refers to the software system using three technologies simultaneously
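As a rough illustration of how availability can be derived from the succ, aging and down states defined above, the following sketch solves the steady-state balance equations of a small CTMC. The transition structure and all rate names (`see_rate`, `fail_rate`, `rejuv_rate`, `repair_rate`) are assumptions made for illustration only. They are not taken from the paper, which analyzes its models through probabilistic model checking.

```python
def steady_state(see_rate, fail_rate, rejuv_rate, repair_rate):
    """Steady-state probabilities (succ, aging, down) of an assumed 3-state CTMC.

    Assumed transitions (hypothetical, not the paper's model):
      succ  -> aging at see_rate
      aging -> down  at fail_rate
      aging -> succ  at rejuv_rate (rejuvenation)
      down  -> succ  at repair_rate
    """
    # Balance equations, expressed relative to the probability of succ:
    #   p_aging * (fail_rate + rejuv_rate) = p_succ * see_rate
    #   p_down  * repair_rate              = p_aging * fail_rate
    aging_ratio = see_rate / (fail_rate + rejuv_rate)
    down_ratio = aging_ratio * fail_rate / repair_rate
    total = 1.0 + aging_ratio + down_ratio
    return 1.0 / total, aging_ratio / total, down_ratio / total


def availability(see_rate, fail_rate, rejuv_rate, repair_rate):
    """Long-run fraction of time the system is not in the down state."""
    succ, aging, _down = steady_state(see_rate, fail_rate, rejuv_rate, repair_rate)
    return succ + aging
```

In this toy chain, raising `rejuv_rate` moves probability mass from aging back to succ before a failure can occur, so the computed availability increases. A scrub interval SI could be approximated by `rejuv_rate = 1 / SI`, which is itself a simplifying assumption because periodic scrubbing is not exponentially distributed.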
Funding
We thank the National Natural Science Foundation of China (Grant No. 61672080) and the National Aerospace Science Foundation of China (Grant No. 2016ZD51031) for their support. This work was also partially supported by ZFYY41402020502 and JSZL2017601B005.
Cite this article
Shao, Q., Gou, X., Huang, T. et al. Anti-aging analysis for software reliability design modes in the context of single-event effect. Software Qual J 28, 221–243 (2020). https://doi.org/10.1007/s11219-019-09464-3
DOI: https://doi.org/10.1007/s11219-019-09464-3