Abstract

The phase-type distribution (also known as PH distribution) has mathematical properties of denseness and closure in calculation and is, therefore, widely used in shock model constructions describing occurrence time of a shock or its damage. However, in the case of samples with only interval data, modeling with PH distribution will cause decoupling issues in parameter estimation. Aiming at this problem, an approximate parameter estimation method based on building PH distribution with dynamic order is proposed. Firstly, the shock model established by PH distribution and the likelihood function under samples with only interval data are briefly introduced. Then, the principle and steps of the method are introduced in detail, and the derivation processes of some related formulas are also given. Finally, the performance of the algorithm is illustrated by a case with three different types of distributions.

1. Introduction

The shock model is usually used in the study of a system’s reliability. The model was first proposed by Esary et al. [1] to describe time intervals between adjacent shocks and its damage by random distributions, as shown in Figure 1. When the cumulative damage (blue line) caused by the shock sequence (green lines) exceeds a certain threshold (red line), the system fails.

In the shock model, the occurrence time of shocks is usually expressed as a Poisson process [2, 3]; that is, the time interval of adjacent shocks satisfies the exponential distribution, and the one shock damage is also assumed to satisfy the same cluster. Of course, other forms of distributions [46], such as gamma distribution, log-normal distribution, and Weibull distribution, also have been used in constructing shock models. While these distribution clusters enhance the model’s adaptability, it also increases the difficulties in estimating their parameters.

Observed sample data are required to obtain the model’s parameters. When facing point sample data, their form of log-likelihood function is usually a sum form of its distribution with the logarithmic operator in the front, that is to say, the parameters are easy to be decoupled and methods of MLE (maximum likelihood estimation), including various types of Newton iterative methods [7], can be used for estimating parameters and obtaining a result with a high accuracy. When the sample data are interval data, the likelihood function often contains a large number of convolution operations since the data are incomplete. At this time, exponential distribution is often assumed to simplify the calculation, while it may cause the model to be inconsistent with the actual situation. Of course, some Monte Carlo methods [8] have also been proposed to solve the computational problems caused by convolution, but the estimation accuracy of the methods is usually not controllable.

Phase-type distribution (known as PH distribution) is a kind of probability distribution clusters based on continuous-time Markov process [9]. Suppose a continuous-time Markov process with states, wherein the states are transient states, the state is an absorbing state, and represents the state of the random process at moment . Then, the infinitesimal generator matrix of the Markov process can be expressed as , wherein is a -dimensional invertible matrix and is a -dimensional column vector, satisfying , where is a column vector and full elements of 1 with corresponding dimensional.

Based on the above Markov process, a -order continuous PH distribution can be defined. A random variable is distributed as an absorption time taken to transition from any initial state to the absorbing state , wherein represents a row vector of probability distribution of each initial phase, is called the phase-type generator, and represents a column vector of absorption rate in which the respective phase transfers to the absorbing state.

The cumulative distribution function (CDF) of the PH distribution is

The probability density function (PDF) of the PH distribution is

The PH distribution is dense in the positive abscissa axis; that is, it can fit any type of probability distributions with the random variable located on the interval . In addition, the PH distribution also has the feature of closure in convolution operations, which greatly simplifies the calculation. Due to the above properties, it is widely used in shock modeling. Neuts and Bhattacharjee [10] first used PH distribution in shock model construction. Ochoa et al. [11, 12] used PH distribution to characterize time intervals of adjacent shocks and one shock damage to analyze reliability of systems. These works are constantly expanding the adaptability of shock models, but it can also be seen that the PH distribution is currently mainly used in shock model constructions under the samples with point data or partial interval data.

There are three general parameter estimation methods for PH distribution [13]: maximum likelihood method, Monte Carlo method, and EM method. Among them, the maximum likelihood method, including various types of Newton iterative methods, is often used to solve some specific types of simple PH distributions; the Monte Carlo method is mainly used for specific forms of PH distribution with point data; the expectation maximization method (also known as the EM method) is the most widely used method for parameter estimation; however, it is mainly used in PH distribution with point data [14] or partial interval data [15]. If only interval data can be observed, such as only censored data and interval failure data can be obtained, the model constructed with PH distribution will face the decoupling problem when performing parameter estimation. At this time, the methods described previously cannot be used directly.

Aiming at this problem, an approximate method for parameter estimation of the shock model constructed with PH distribution is proposed. By giving an estimated error and constructing PH distribution with dynamic order, the parameters of the shock model with only interval data samples can be estimated. The paper is organized as follows: in Section 2, the shock model constructed by PH distribution and the likelihood function with only interval data samples are briefly introduced. Section 3 reconstructs the process statistics based on the framework of the traditional EM method [14], explains the approximation principle in detail, derives the approximate estimated expression of the model parameters, and also gives the total steps of the entire algorithm. Section 4 tests the performance of the algorithm through a case with three different types of distributions; Section 5 summarizes the method proposed in this paper and briefly introduces the follow-up work.

2. Shock Model and Likelihood Function

2.1. Shock Model Based on PH Distribution

Some assumptions:(1)The failure threshold of a component is , and the censored time is .(2)The shock damage caused by the th shock is an iid random variable, and the corresponding PDF is , , and the CDF is , . The cumulative damage caused by the number of shock is , let , and the corresponding CDF is , and the PDF is .(3)The occurrence time of the th shock is , let , and the corresponding CDF is , , and the PDF is , . The time interval is also an iid random variable, and the corresponding PDF is , , and the CDF is , .(4)The occurrence time of a shock is independent of its damage and also independent of the cumulative damage that has been caused by shock sequence.

Let , , represent the probability that the occurrence time of the th shock exceed for the first time. Based on the previous assumptions, the reliability at moment iswhere represents the number of shocks that the component receives at the moment and represents the left limit of the threshold.

The following describes the specific forms of the previous parameters by PH distributions:(1)Assume that the one shock damage satisfies order PH distribution , wherein represents the probability distribution vector of initial phases, which satisfies , and represents the phase-type generator. Let represent the column vector of absorption rate. Then, the CDF of is , . The corresponding PDF is , .(2)The cumulative damage caused the number of shock satisfies the PH distribution , wherein represents the distribution probability vector of initial phases and represents the phase-type generator, in which represents the transfer rate matrix at the occurrence of a shock and the angular scale, respectively, represents the number of rows and columns of a combined vector or matrix. The corresponding column vector of absorption rate is . Then, the CDF of is , . The corresponding PDF is , . Let present the probability that the cumulative damage is no less than x after the nth shock for the first time. Then, , , , where represents a column vector that the th to the th elements are 1 and others are 0, with corresponding order.

The distribution of the adjacent shock time interval and shock time is ( order) and . The assumption forms of these two distributions are similar to the shock damage distributions described previously.

2.2. Likelihood Function of Interval Data

Assume that the samples are independent of each other and the data can only be obtained at the censored time . The information of observed sample data is as follows:(1)The total number of the samples is (2)The failure mode set of the samples is , wherein represents the censored sample and represents the failure sample (3)The number of shocks received by the samples is

Let represent all parameters to be estimated, where represents the parameter set of the distribution of the one shock damage and represents the parameter set of the distribution of the time interval. Based on the assumption that the occurrence time of the shock is independent of its damage, the likelihood function can be expressed separately as shown in Table 1.

In summary, the overall likelihood function based on the sample data is

It can be seen in Table 1 that parameters in equations with the integral cannot be decoupled easily with the general methods proposed in Section 1. Combined with the assumed model above, observation data, and likelihood function proposed in this section, a specific method on estimating parameters is introduced as follows.

3. Principles, Steps, and Derivation of the Method

It is not difficult to see from Table 1 that forms of the likelihood functions are essentially the same, so only the estimating process of parameters is described in detail here, which can also be used for estimating . According to the main steps of the EM method [14], it is necessary to construct a complete data likelihood function.

3.1. Process Statistics Construction

Four variables and an indicator function are defined in the following:(1) represents the phase of the sample after the th phase transition in the th shock damage period;(2) represents the damage caused by phase ;(3) represents the number of phase transitions of the sample during the th shock damage period, including the times when the damage period changes;(4) represents the number of additional shocks that may cause failure if the sample is censored with shock number . It can be seen that is a random variable and represents its specific sample value;(5)Indicator function represents whether the phase of the sample is after the th phase transition in the th shock damage period.

With the above variables and function, four process statistics can be constructed as follows:(1)The set of statistics represents whether the initial phase of the sample is during the th shock damage period. The values of are shown in Table 2.

(2)The set of statistics represents the damage caused by the sample in phase during the th shock damage period. The values of are shown in Table 3.
(3)The set of statistics represents the total number of phase transition times of the sample when the phase is transferred from to during the th shock damage period, which does not include the times of the phase transition when the damage period changes. The values of are shown in Table 4.
(4)The set of statistics represents whether the phase of the sample is transferred from to the absorbing state during the th shock damage period. The values of are shown in Table 5.

Let represent a set of process statistics of all samples, where represents the set of process statistics of the sample .

3.2. Parameter Estimation Expression

With the above process statistics, the likelihood function under the complete data can be expressed as follows:wherein, the complete data likelihood function of the censored sample is

The complete data likelihood function of the failure sample is

A log-likelihood function can be obtained from equation (5):

According to the EM method, the posterior distribution of process statistics is

Assuming that the parameter values have been obtained by previous iteration, the expectation of posterior distribution of process statistics in equation (8) by equation (9) iswhere the expectation of the log-likelihood function with the censored sample data is

And, the expectation of the log-likelihood function with the failure sample data is

The parameters to be estimated satisfy the following constraints:(1)The initial phase probability satisfies (2)For any phase,

With the MLE method, the parameters can be expressed in the following equations:

The general steps of the EM method areStep E: solving the expectation sets of posterior distribution of the process statistics proposed in Section 3.1:Step M: bringing the result obtained in step E into equation (13) to obtain an iteration value of parameters .

It is not difficult to see that the set of process statistics constructed in Section 3.2 is infinite, and expectations of the posterior distribution of the process statistics cannot be obtained at this time, so an approximate method is introduced in the following sections.

3.3. Approximate Principle

Assume that the sample is censored after the th shock, and the probability of failure after another th shock is .

Then, the survival probability of the sample after the th shock can be expressed as

Let , then it is not difficult to see that for any determined PH distribution and failure threshold, is constantly satisfied.

That is, given any positive real number , there must exist at least one such that for any , is constantly satisfied.

Now, assume that is sufficiently small and satisfies and , then equation (15) can be approximated by a function of finite summation:where represents a column vector that the element from to is 1 and the others are 0, with corresponding order.

Based on the above description, the parameters can be approximated as

3.4. Expectations of Process Statistics with Posterior Distribution

After approximation, the expectations of process statistics with posterior distribution can be obtained referring to the method in [14, 15] by making some improvements. The results are given directly as follows:(1)Censored sample :(1)The expectations of with posterior distribution:① When , where .② When ,① When ,(2)The expectations of with posterior distribution:① When ,③ When ,(3)The expectations of with posterior distribution:① When ,② When ,(4)The expectations of with posterior distribution:① When ,② When ,(2)Failure sample :(1)The expectations of with posterior distribution:② When ,(2)The expectations of with posterior distribution:(3)The expectations of with posterior distribution:(4)The expectations of with posterior distribution:wherein , , ,and represents a column vector that the th element is 1 and others are 0, with corresponding dimensions.

② When ,
3.5. Steps of the Method

It can be seen from the previous description that it is necessary to determine the value of the parameter first. After that, the corresponding number of shocks according to the parameter values obtained by the current iteration can be determined, so that the survival probability that the number of additional shocks exceeds can be suffered for most censored samples is less than .

If the number of shocks of a censored sample satisfies , it is not difficult to see that the survival probability of such samples satisfies

Therefore, it can be assumed that if the sample suffers one more shock, then it will inevitably fail, which means . However, for the other censored samples, the original value remains unchanged.

Taking the process of parameter estimation of in damage distribution as an example, the specific steps of the method based on the shock model and interval sample data described in Section 2 are as follows:Step 1: given the initial value to the distribution and as the end condition of iteration, determine the approximate threshold  Step 2: calculate the corresponding by and  Step 3: calculate the value of the log-likelihood function according to  Step 4: use the E-step according to the distribution parameter obtained in the th iteration and the formulas in Section 3.4 to obtain expectation of the process statistic set with posterior distributionNote that during the processes, for any censored sample , ①When , does not change ②When , there is  Step 5: bring into equation (17) to obtain a new iteration value  Step 6: calculate the corresponding by and  Step 7: calculate the value of the log-likelihood function of the sample according to  Step 8: compare the value of with : ① When , let and repeat steps 4–7 ② When , end the iteration and return the estimated value

The parameter of time interval distribution can also be obtained based on the above steps. For censored samples , note that since the last shock time satisfies , while for failure samples , the cumulative damage caused by shocks at the time of failure satisfies , it is necessary to consider that the number of shocks should be when estimating , which is similar to when estimating .

4. A Case

In this section, the performance of the parameter estimation method is tested by a case with only interval data samples. The case includes three types of nonexponential distributions (gamma distribution, log-normal distribution, and Weibull distribution) which are commonly used in shock model constructions to generate sample data. The proposed method is used to estimate the parameters of the corresponding PH distributions, and the results are compared with the above three distributions to analyze its performance.

4.1. Model Assumption

(1)Assume that the sample size is , the failure threshold is , and the censored time is . Sample data can only be obtained at the censored time.(2)Assume that the end condition of iteration is and the approximate threshold is .(3)There are two shock models:(1)Model 1:① The interval of shock time satisfies gamma distribution with the shape parameter of 3 and scale parameter of 2② The one shock damage satisfies log-normal distribution with a log-mean value of 0.5 and log-variance value of 1(3)Model 2:① The interval of shock time satisfies gamma distribution with the shape parameter of 3 and scale parameter of 2② The one shock damage satisfies Weibull distribution with the shape parameter of 3 and scale parameter of 2

4.2. Parameter Estimation and Result Analysis

Based on the method, the initial value, including type and order of PH distribution, should be given first. How to select type and order of a PH distribution can be referred to [16], and each type of PH distribution is described in detail in [17]. The estimation results of parameters of the time interval distributions are only shown in the first model since the distributions in both models are the same.

4.2.1. Parameter Estimation of Time Interval Distribution

The distribution of the time interval is fitted by Erlang-3 distribution with 3-order. The initial value of the PH distribution is as follows:

The simulation time is about 5 h, and the parameter estimation results based on the simulation data are as follows:

The estimated PDF of the PH distribution and the original gamma distribution are both shown in Figure 2, and the related statistics are shown in Table 6.

It can be seen from the results shown in Figure 2 and Table 6 that the estimated PH distribution fits well with the original gamma distribution based on the interval sample data. When the shape parameter of gamma distribution is an integer, the random variable can be regarded as the sum of the same number of iid random variables from exponential distribution with the parameter equal to the scale parameter of the gamma distribution, so the initial distribution form can be easily determined as Erlang-3 distribution in this case. At this time, the parameter estimation of the PH distribution is an unbiased estimation, and the estimated errors are entirely derived from the randomness of the samples.

4.2.2. Parameter Estimation of Damage Distribution (Log-Normal Damage)

The distribution of the shock damage is fitted by Coxian distribution with 3-order. The initial value of the PH distribution is given as follows:

The simulation time is about 14 h, and the parameter estimation results based on the simulation data are as follows:

The estimated PDF of the PH distribution and the original log-normal distribution are both shown in Figure 3, and the related statistics are shown in Table 7.

It can be seen from Figure 3 that the fitting result of the estimated PH distribution is not much different from the original log-normal distribution, but it also can be seen from Table 7 that the deviation of the statistics is larger than the deviation in Table 6. This is because the parameter estimation result of the PH distribution is affected by the randomness of the sample, the initial type of the PH distribution, and the initial value of the parameter.

4.2.3. Parameter Estimation of Damage Distribution (Weibull Damage)

The distribution of the shock damage is fitted by Coxian distribution with 4-order, and the initial value of the PH distribution is given as follows:

The simulation time is about 17 h, and the parameter estimation results based on the simulation data are as follows:

The estimated PDF of the PH distribution and the original Weibull distribution are both shown in Figure 4, and the relevant statistics are shown in Table 8.

It can be found that if the initial value of the first phase in the absorption rate vector is not set to 0, the PDF of the PH distribution obtained by parameter estimation at point 0 in abscissa is not 0. The reason for this is that the EM method always obtains the iterative parameter value which can maximize the likelihood function that means the PH distribution obtained under the nonzero absorption rate of the first phase has a larger likelihood value, but the result does not match the actual situation. Therefore, the initial value of the absorption rate of the first phase should be set to 0.

It can be seen from the case that under the interval data, the PH distributions based on the estimation results can fit well with the original distributions mentioned above. Therefore, through the approximate method proposed in this paper, the PH distribution can be used for shock model construction under only interval sample data, which not only improves the integral problem cause by the original distributions, but also can obtain the distribution form with a higher degree of fitting.

During the process of the above case, it is found that, in the early stage of the iteration, the value of the log-likelihood functionwhich is calculated with the relative parameter iterative value obtained by the interval failure data and obtained by the censored sample data, is very small; that is, the difference between the two partial log-likelihood functions through two adjacent iterative loops does not affect the end condition of the iteration. However, it can be seen from equation (17) that performing parameter estimation with these two types of samples needs to calculate a large number of posterior distribution expectation of process statistics. In order to improve the efficiency of the proposed method, some improvements are given here with the estimation of parameters as an example.

When performing the estimation of parameters , a threshold is set for the censored samples which satisfies .

When , the censored sample data are considered in the parameter estimation.

When , the parameters are estimated only by the interval failure data.

The estimation of parameters in distribution is similar when dealing with interval failure samples. This improvement is only applicable to the case where the sample set contains both censored samples and interval failure samples .

5. Conclusion

In this paper, an approximate parameter estimation method based on constructing PH distribution with dynamic orders is proposed to solve the decoupling problems in parameter estimation of the shock model when facing the samples with only interval data. Its performance is tested through a simulation case from which can be seen that the method proposed in this paper can obtain well-fitted parameter estimation results using only interval sample data.

There are also limitations in the application of this method. For example, it is only applicable to the shock problem under a single stress source which means the shock time interval and the shock damage need to be subjected to the condition of iid. If there exists multistress sources which means the shock interval and shock damage may satisfy different types of distributions, the methods presented in this paper will face some identifiability problems. Later work will focus on solving these.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was conducted as a part of a project supported by the National Natural Science Foundation of China (Grant no. 61903370).