Reliability and availability prediction of embedded systems based on environment modeling and simulation

https://doi.org/10.1016/j.simpat.2020.102246

Abstract

Embedded system developers often need to test software even when the target hardware is inaccessible, which makes hardware-software interaction testing a challenging task. Researchers have tried to overcome this issue by developing an environment model that simulates the behavior of the actual hardware. In the existing literature, environment modeling has been used for embedded software testing but rarely for reliability and availability prediction. In this study, we use the environment model for reliability and availability prediction when the actual hardware is inaccessible. First, we model the hardware environment simulator by referring to the System Requirements Specification document. Then, we use the environment simulator to simulate different operational scenarios of the hardware. Based on the operational scenarios, random test cases are generated for testing the embedded software. Finally, we predict the reliability and availability of the system using the test results. This prediction approach covers four important aspects: 1) developing a method for system reliability and availability prediction using environment modeling and simulation, 2) considering software-related hardware and hardware-related software interaction failures, 3) applying the proposed reliability and availability model to a case study, and 4) validating the method by comparing the results with an existing approach.

Introduction

Embedded systems are used in many day-to-day applications, such as home appliances, communication, transportation, robotics, defense, and space. It is reported that over ninety-eight percent of existing electronic devices are equipped with embedded controllers [1]. An embedded system may fail not only due to hardware-specific or software-specific failures; hardware-software interaction failures may also cause the system to malfunction. For example, Iyer and Velardi [2] at Stanford University demonstrated the impact of hardware-related software interaction failures on the performance of a Multiple Virtual Storage/System Extensions (MVS/SP) operating system. They found that 35 percent of the software malfunctioning cases were attributable to hardware failures. On the other hand, bugs in the software may also lead a hardware device to failure [3]. For example, an investigation report suggested that a software glitch in a Boeing 737 Max flight led to the infamous accident of October 2018 that killed 189 people on board. It is now widely accepted that some hardware-software (HW-SW) faults are inter-dependent [4,5]. Therefore, system reliability and availability evaluation approaches need to consider HW-SW interaction testing along with hardware-specific and software-specific failures. However, in the early development phases of real-time embedded software, team members may not have access to the actual hardware components.

Some previous studies have tried to address the above issue by applying the Software-in-the-Loop Simulation (SILS) approach to HW-SW interaction testing [6,7]. In this approach, the software unit is executed in a simulated hardware environment. Later, Iqbal et al. [1] extended this technique to the testing of Real-Time Embedded Systems (RTES). They used the UML-MARTE profile for modeling the hardware environment and demonstrated the use of the SILS approach on some industrial applications. However, this line of work has been used only for testing real-time embedded software and has hardly been explored for reliability analysis.

As reported in our previous studies, two types of interaction failures play a significant role in system reliability analysis [8,9]: hardware-related software interaction failures and software-related hardware interaction failures. Some researchers have considered only hardware-related software interaction failures in developing their reliability and availability models, ignoring the possibility of software-related hardware interaction failures [5,10,11,12]. Another group of researchers has considered both hardware-related software and software-related hardware interaction failures [13,14,15,16]. However, they targeted only qualitative reliability analysis. We are interested in developing a quantitative reliability and availability prediction method using the concept of environment modeling and simulation. The scope of our work is limited to the reliability and availability analysis of real-time embedded systems when the actual hardware components are inaccessible.

The proposed method tests an executable embedded software in a simulated hardware environment and predicts reliability and availability based on the test results. First, we develop the domain model of the system using a UML Class diagram (with MARTE extensions for modeling real-time scenarios). Each system component (hardware and software) is represented by a Class in the Class diagram. The system components and their dependencies are identified by referring to the System Requirements and Specifications (SysRS) document. We also identify the hardware components' operation modes/states by referring to a standard reliability handbook or reliability data sources. In this phase, the software controller is considered an abstract component. We replace it with actual code at the time of code generation. Subsequently, we develop the hardware component behavior model using UML State-machine diagrams, constructing one state-machine diagram for each hardware component. The behavior model defines a component's characteristics in each state (normal, degraded, and failure). It also models the state transitions of the hardware component under different operating conditions. Then, we generate an environment simulator by converting the domain model and the hardware behavior models. The environment simulator acts as a driver program that invokes the software controller, provides hardware operational scenarios, and executes the controller during testing. In this regard, we have adopted the work presented by Iqbal et al. [1] to perform software-in-the-loop simulation (SILS). The SILS approach tests the software controller against simulated test cases. Finally, reliability and availability are predicted based on the test results. Reliability is evaluated as the ratio of the total number of systems that have survived till a given time to the total number of initial systems. Availability, on the other hand, is evaluated as the ratio of the total number of times the system is in the working state for a given time to the total number of simulation operations. Fig. 1 provides a schematic representation of our method.
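The hardware behavior model described above can be illustrated with a small state machine. This is only a minimal sketch under stated assumptions, not the authors' UML-generated simulator: the names `State`, `TRANSITIONS`, and `HardwareComponent`, the per-cycle transition probabilities, the 20% drift in the degraded state, and the stuck-at-zero output on failure are all hypothetical choices made for illustration.

```python
import random
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    FAILURE = "failure"


# Hypothetical per-cycle transition probabilities. A real model would take
# these from a reliability handbook or another reliability data source.
TRANSITIONS = {
    State.NORMAL: {State.NORMAL: 0.97, State.DEGRADED: 0.02, State.FAILURE: 0.01},
    State.DEGRADED: {State.DEGRADED: 0.90, State.NORMAL: 0.05, State.FAILURE: 0.05},
    State.FAILURE: {State.FAILURE: 1.0},  # assumed absorbing in this sketch
}


class HardwareComponent:
    """One simulated hardware component with three operational states."""

    def __init__(self, name, rng):
        self.name = name
        self.state = State.NORMAL
        self.rng = rng

    def step(self):
        """Advance the component by one operation cycle and return its state."""
        targets = list(TRANSITIONS[self.state])
        weights = [TRANSITIONS[self.state][t] for t in targets]
        self.state = self.rng.choices(targets, weights=weights, k=1)[0]
        return self.state

    def read(self, true_value):
        """Return the signal the component reports in its current state."""
        if self.state is State.NORMAL:
            return true_value
        if self.state is State.DEGRADED:
            # Assumed behavior: reading drifts by up to 20% in either direction.
            return true_value * self.rng.uniform(0.8, 1.2)
        return 0.0  # assumed stuck-at-zero output on failure
```

A component such as a throttle sensor would be created once per simulated system, stepped once per operation cycle, and read by the software controller through `read`.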

The proposed method predicts the reliability and availability of the RTES before the embedded software is integrated with the target hardware. It tests the embedded software using hardware environment modeling and predicts reliability and availability based on the test results. The existing literature has reported early testing of embedded systems using environment modeling [1,17,18,19,20,21,22]. However, these studies have not addressed reliability and availability prediction. We have extended the environment modeling based testing approach to reliability and availability prediction. Our method gives a quantitative reliability and availability prediction that considers both types of interaction failures (hardware-related software and software-related hardware), along with hardware-specific and software-specific failures. Most of the existing reliability and availability prediction approaches have not considered all possible types of hardware-software interaction failures [5,10,11,12] or have not targeted quantitative prediction [13,14,15,16]. The proposed method can also reduce the production cost significantly: if the RTES fails to meet the target reliability during post-integration testing because of faults in system components, considerable costs are incurred, whereas an early prediction can expose such shortfalls before integration. In this paper, we have applied the proposed method to an Aircraft Fuel Control System as a case study to demonstrate the transient reliability and steady-state availability evaluation process. The results obtained for the case study (reliability and availability) are compared with those of an existing method [23] to validate the proposed method.

Previous studies have reported early testing of safety-critical embedded software when the hardware is not accessible [1,17,18,19,20,21,22]. The hardware often becomes inaccessible because different teams develop the hardware and software parts in parallel. Their physical locations also may not be the same [1]. Researchers have overcome this issue by developing an environment model that simulates the behavior of the actual hardware [1,17,18,19,20,21,22]. They have introduced the Software-in-the-Loop Simulation (SILS) approach, where environment modeling of the hardware is used to test the embedded software [1,17]. Iqbal et al. [1] have described their experience of working with two organizations (WesternGeco AS, Norway and Tomra AS, Norway), where they performed early testing of the embedded software using SILS due to the inaccessibility of the actual hardware. They developed embedded software for a seismic acquisition system while working with WesternGeco. Later, they developed another embedded software for automated recycling machines while working with Tomra. In both cases, they tested the embedded software using SILS because the organizations required the software to be tested in a simulated environment before it was deployed on the actual hardware. Jeong [18] also performed early testing of safety-critical AUTOSAR (AUTomotive Open System ARchitecture) software used for the Electronic Control Unit (ECU) in the automotive industry, relying on the SILS approach with environment modeling. They mention that early testing reduces the risk of severe damage to the system-to-be-controlled by the embedded controller. Early testing may also reduce the cost of iterative development if the software's residual errors are identified earlier. Other published work [17,19,20,21,22] also used SILS based embedded software testing for similar reasons. However, these studies used the SILS approach for embedded software testing only and rarely for reliability and availability prediction. We have extended this line of work to predict the reliability and availability of the embedded system. In the proposed method, we perform SILS based testing of the embedded software with environment modeling for the hardware. Subsequently, the reliability and availability of the system are predicted using the test results.

We explain here some basic concepts that form the foundation of the proposed reliability and availability prediction method. These concepts are used throughout the paper. The explanations are general enough to cover the entire range of embedded systems.

Environment modeling. A newly developed system is often tested on the development platform. Developers use the environment modeling approach to provide a virtual operational environment in which the system executes. The environment modeling approach creates a simulator that invokes the system and provides different aspects of the operational environment during testing. In this paper, we use environment modeling of the hardware to test the software controller in a simulated hardware environment when the actual hardware is inaccessible. The hardware environment model generates an Environment Simulator (ES) program that performs four tasks without any human intervention: i) generates random test cases to execute the software controller, ii) invokes the software controller, iii) provides an operational environment of the hardware to execute the software controller, and iv) records the response of the software controller for the test cases.
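The four ES tasks listed above can be sketched as a single loop. The following is a hedged illustration, not the generated simulator from the paper: `software_controller`, `run_environment_simulator`, the input range of 0 to 100, the placeholder control law, and the 5% stuck-at-zero sensor fault are all assumptions introduced for this example.

```python
import random


def software_controller(sensor_value):
    """Hypothetical stand-in for the embedded controller under test."""
    return 2.0 * sensor_value  # placeholder control law, not from the paper


def run_environment_simulator(controller, n_cases, seed=0):
    """Run n_cases random test cases and return a log of the responses."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_cases):
        # i) generate a random test case (assumed input range 0..100)
        test_input = rng.uniform(0.0, 100.0)
        # iii) provide the hardware operational environment: the simulated
        # sensor is either healthy or stuck at zero (assumed 5% fault rate)
        sensor_value = 0.0 if rng.random() < 0.05 else test_input
        # ii) invoke the software controller
        response = controller(sensor_value)
        # iv) record the response for later analysis
        log.append((test_input, sensor_value, response))
    return log
```

Seeding the generator makes a run repeatable, so a failing test case can be replayed against a corrected controller.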

Embedded software testing using software-in-the-loop simulation (SILS). The SILS approach is often used during early testing of embedded software. In this approach, the tester creates a driver module and a stub module to perform SILS. The driver module invokes the embedded software and provides an operational environment of the Electronic Control Unit (ECU). The stub module provides the operational environment of the system-to-be-controlled by the embedded software and generates feedback signals for the driver module. The embedded software remains connected with the driver module and the stub module in a loop during testing. SILS is commonly used for the validation of embedded software.
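The driver/stub loop described above can be sketched as follows. This is a toy illustration under assumptions: the proportional `controller`, the `PlantStub` class with its additive plant response, and the `driver` function are hypothetical and do not come from the paper or from any SILS tool.

```python
def controller(setpoint, measured):
    """Hypothetical embedded software under test: proportional control."""
    return 0.5 * (setpoint - measured)


class PlantStub:
    """Stub module: stands in for the system-to-be-controlled."""

    def __init__(self):
        self.level = 0.0

    def apply(self, command):
        """Apply a command and return the feedback signal for the driver."""
        self.level += command  # assumed additive plant response
        return self.level


def driver(setpoint, steps):
    """Driver module: invokes the controller and closes the loop via the stub."""
    stub = PlantStub()
    feedback = stub.level
    trace = []
    for _ in range(steps):
        command = controller(setpoint, feedback)  # invoke the software
        feedback = stub.apply(command)            # stub returns feedback
        trace.append(feedback)
    return trace
```

With this control law the error halves on every iteration, so `driver(10.0, 20)` yields a trace that starts at 5.0 and approaches the setpoint of 10.0.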

Reliability and availability prediction using environment modeling. In this approach, the ES generates multiple instances (say N) of identical, independent systems. During each simulation iteration, the ES simulates a random input data set to execute the software controller and records the response. We consider each simulation iteration as an operation cycle of the system. The number of operation cycles should be large enough (say L) to cover the entire input domain. During the simulation process, the operational state of the software controller may change. If the software controller's response complies with the normal range specified in the SysRS document, the controller is considered to be in a working state. If the response falls outside the normal range, the controller is considered to be in a failure state. Based on the outcomes of the simulation process, reliability and availability are predicted. Reliability is defined as the ratio of the total number of systems that have survived (in the working state) till a given time to the total number of initial systems (N). During reliability prediction, we assume that recovery from a fault-tolerant state is possible but recovery from the absorbing state is not. During availability prediction, we likewise assume that all instances (N) of the system are executed for L operation cycles. Availability is defined as the ratio of the total number of times the system is in the working state for a given time to the total number of operations (L). Here, we assume that recovery is possible from the fault-tolerant states as well as the absorbing states.
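The two ratios defined above can be estimated with a small Monte Carlo run. The sketch below is an assumption-laden simplification, not the authors' model: it replaces the software controller and hardware simulator with a per-cycle failure probability `p_fail` and repair probability `p_repair` (both hypothetical parameters), omits the fault-tolerant state, and averages availability over all N instances.

```python
import random


def simulate(n_systems, n_cycles, p_fail, p_repair, seed=0):
    """Return (reliability_curve, availability) from a Monte Carlo run."""
    rng = random.Random(seed)
    survivors = [0] * n_cycles  # systems with no failure up to each cycle
    working_cycles = 0          # cycles in the working state, all systems
    for _ in range(n_systems):
        ever_failed = False     # reliability view: failure is absorbing
        up = True               # availability view: failure can be repaired
        for t in range(n_cycles):
            if up:
                if rng.random() < p_fail:
                    up = False
                    ever_failed = True
            elif rng.random() < p_repair:
                up = True
            if not ever_failed:
                survivors[t] += 1
            if up:
                working_cycles += 1
    # Reliability at cycle t: surviving systems / initial systems (N).
    reliability = [s / n_systems for s in survivors]
    # Availability: working cycles / total operation cycles (N * L).
    availability = working_cycles / (n_systems * n_cycles)
    return reliability, availability
```

A call such as `simulate(500, 100, 0.02, 0.3)` returns a non-increasing reliability curve over the 100 cycles together with a single availability estimate between 0 and 1.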

This paper is organized as follows: Section 2 reviews the existing work. Section 3 proposes the environment modeling approach and, based on it, develops a simulation-based system reliability and availability prediction method. Section 4 demonstrates the application of the proposed method to a case study. Section 5 validates the proposed method. Finally, Section 6 concludes this paper.


Literature review

We discuss the related literature in two sections: i) environment modeling and environment model-based testing of RTES, and ii) reliability and availability prediction of embedded systems duly considering hardware-software interactions.

Proposed reliability and availability prediction method

In our proposed method, we perform testing of a software controller in a simulated hardware environment and predict system reliability and availability based on test results. The applicability of the proposed method is limited to real-time embedded systems. At first, we specify the assumptions that have been considered to develop the method. Then, we propose the method for reliability and availability prediction of the embedded system. Finally, we apply the method to an aircraft fuel control

Case study

We have applied our proposed method to an aircraft fuel control system as a case study. The working principle of the aircraft fuel control system is described on the MathWorks website [49]. As depicted there, an aircraft fuel control system comprises an engine, fuel tank, pump, fuel rate controller (microcontroller), actuator, and four sensors. The sensors are a throttle sensor, an engine fan speed sensor, an exhaust gas oxygen (EGO) sensor, and a manifold absolute pressure (MAP) sensor.

Validation of the proposed method

We have selected a set of Markovian models [5,11,23], a set of FFIP-based models [13,14,15,16], and a set of SILS-based models [1,24,28] that perform quantitative reliability prediction, qualitative reliability analysis, availability prediction, or testing of the embedded system. We compare the above models with the proposed method based on their reliability concern (hardware-specific failure, software-specific failure, hardware-related software interaction failure, and software-related

Conclusions

An RTES development team may not have access to the actual hardware components during early testing of the system. It becomes a challenging task for the developers to test the hardware-software interaction failures and turns into a bottleneck in system reliability and availability prediction. The proposed method enables the RTES development team to evaluate the system's reliability and availability before the integration of the hardware-software parts. It replaces the actual hardware components

Acknowledgments

We thank the Indian Institute of Technology (IIT) Kharagpur, India, for funding this research. We gratefully acknowledge the support of all the research scholars, faculty members, and staff of the Subir Chowdhury School of Quality and Reliability, IIT Kharagpur, India. We also thank the anonymous reviewers for their constructive suggestions, which helped us enrich the manuscript significantly.

References (62)

  • S Sinha et al., Survey of combined hardware–software reliability prediction approaches from architectural and system failure viewpoint, Int. J. Syst. Assurance Eng. Manage. (2019)
  • A Costes et al., Reliability and availability models for maintained systems featuring hardware failures and design faults, IEEE Trans. Comput. (1978)
  • DS Roy et al., Reliability analysis of phasor measurement unit incorporating hardware and software interaction failures, IET Gen. Transmission Distrib. (2014)
  • U Sumita et al., Analysis of software availability/reliability under the influence of hardware failures, IEEE Trans. Softw. Eng. (1986)
  • X Diao et al., Fault propagation and effects analysis for designing an online monitoring system for the secondary loop of the nuclear power plant portion of a hybrid energy system, Nucl. Technol. (2018)
  • DC Jensen et al., Modeling the propagation of failures in software driven hardware systems to enable risk-informed design, ASME Int. Mech. Eng. Congress Exposition (2008)
  • I Tumer et al., Integrated design-stage failure analysis of software-driven hardware systems, IEEE Trans. Comput. (2010)
  • MZ Iqbal et al., Environment modeling with UML/MARTE to support black-box system testing for real-time embedded systems: methodology and industrial case studies
  • S Jeong et al., Software-in-the-loop simulation for early-stage testing of AUTOSAR software component
  • S Jeong et al., An automated testing method for AUTOSAR software components based on SiL simulation
  • S Montenegro et al., Simulation-based testing of embedded software in space applications, Embedded Systems–Modeling, Technology, and Applications (2006)
  • H Shokry et al., Model-based Verification of Embedded Software (2009)
  • S Werner et al., Software-in-the-loop simulation of embedded control applications based on virtual platforms
  • R Zeng et al., Dependability analysis of control center networks in smart grid using stochastic Petri nets, IEEE Trans. Parallel Distrib. Syst. (2012)
  • A David et al., Timed testing under partial observability
  • F Deng et al., Design of high confidence embedded software hardware-in-loop simulation test platform based on hierarchical model
  • A Hessel et al., Testing Real-Time Systems Using UPPAAL, Formal Methods and Testing (2008)
  • M Krichen et al., Conformance testing for real-time systems, Formal Methods Syst. Des. (2009)
  • KG Larsen et al., Online testing of real-time systems using UPPAAL, International Workshop on Formal Approaches to Software Testing (2004)
  • KG Larsen et al., Testing real-time embedded software using UPPAAL-TRON: an industrial case study
  • F Lindlar et al., Integrating model-based testing with evolutionary functional testing
