1 Introduction

Doubly-truncated data arise in many fields, including economics, engineering, medicine, and astronomy. When studying the lifetime distribution of a technological unit, it may be too time-consuming or operationally infeasible to obtain complete failure data. In this case, it is more convenient to collect field-failure data (Lawless 2003; Ye and Tang 2016), where a unit is retrospectively collected if it fails within a pre-specified study period. This sampling mechanism yields two types of missing units: left-truncated units, which fail before the study initiation, and right-truncated units, which fail after the study end. This sampling design is common in astronomy (Efron and Petrosian 1999), economics (Dörre 2020; Frank and Dörre 2017), medicine (Scheike and Keiding 2006; Moreira and de Uña-Álvarez 2010), and other fields. Analyses of doubly-truncated data require appropriate techniques to account for the sampling design.

Figure 1 illustrates the sampling mechanism. Let B be the birth date of a unit and T be its lifetime. Hence, \(F=B+T\) is the date of failure. Suppose that data collection begins at the date \(\tau _s\) and ends at the date \(\tau _e\). We only collect units whose failures occur in the interval \([\tau _s, \tau _e]\). Therefore, the sampling criterion is \(\tau _s - B \le T \le \tau _e - B\). In Fig. 1, two units are collected and three units are not observed out of a hypothetical population of five units. This yields a biased sample, where units with too short or too long lifetimes are less likely to be sampled. Data collected in this manner are called doubly-truncated, because the lifetime of interest T is observed if it falls between the left-truncation limit \(\tau _s - B\) and the right-truncation limit \(\tau _e - B\).

Fig. 1 An example of doubly-truncated data. A full circle denotes the failure date of a collected unit, and an empty circle denotes the failure date of an uncollected unit

In the framework of random double-truncation, the sampling criterion is \(U \le Y \le V\), where U, Y, and V are random variables. The example of Fig. 1 gives \(U = \log (\tau _s - B)\), \(Y = \log (T)\), and \(V = \log (\tau _e - B)\), where the logarithms avoid the constraint \(T \ge 0\). We call U the left-truncation limit, and V the right-truncation limit. We observe the triplet \((U, Y, V)\) if \(U \le Y \le V\). Hence, doubly-truncated data consist of \((u_i, y_i, v_i)\), where \(u_i \le y_i \le v_i\) for \(i = 1, 2, \ldots , n\). In the analysis of doubly-truncated data, the goal is to estimate the distributional parameters of \(T = \exp (Y)\) such as the cumulative distribution function (cdf) \(\mathrm {P}(T \le t)\). In this article, we assume independence between the truncation limits and the lifetime of interest, i.e. between (UV) and Y. This is a usual assumption in the literature for treating doubly-truncated data. However, this assumption can be violated in some contexts (see Shen 2011 for a corresponding test of independence).

Nonparametric methods for doubly-truncated data are well-established. The nonparametric maximum likelihood estimator (NPMLE) of the cdf was proposed by Efron and Petrosian (1999). Shen (2010) and Moreira and de Uña-Álvarez (2010) investigated the asymptotic properties of the NPMLE. Emura et al. (2015) developed an alternative asymptotic theory and derived a confidence band for the cdf. Kernel estimation of the density function has also been considered; see Moreira and Van Keilegom (2013) and references therein. A semi-parametric approach under a parametric truncation distribution has been studied as well; see Moreira, de Uña-Álvarez and Van Keilegom (2014) and references therein. However, parametric methods for doubly-truncated data are relatively underdeveloped. Efron and Petrosian (1999), Hu and Emura (2015), and Emura et al. (2017) considered maximum likelihood estimation (MLE) under the assumption that \(Y = \log (T)\) follows the special exponential family. Dörre (2020) considered Bayesian parametric estimation under a general exponential family. These exponential families contain the normal, skew-normal, exponential, and gamma distributions, but they do not contain many useful distributions such as the Weibull and log-logistic distributions. In addition, none of the previous works discuss the construction of a confidence band, which is technically challenging in parametric models.

In this article, we consider the log-location-scale model for lifetime distributions, which contains the Weibull, lognormal, and log-logistic distributions as special cases. We also consider accelerated failure time (AFT) models, which are the regression extension of the log-location-scale model. These parametric models are clearly useful, but methodologies for fitting them under double-truncation have been missing. We develop algorithms for fitting these models and propose several types of interval estimation schemes. We develop the Wald-type confidence interval and a transformed confidence interval for the cdf. We also derive a closed-form expression for the confidence band for the cdf, which extends the results of Cheng and Iles (1983) and Jeng and Meeker (2001) to doubly-truncated data.

This article is organized as follows. Section 2 introduces the models. Sections 3 and 4 develop the estimation procedures. Sections 5 and 6 provide numerical studies. Section 7 concludes the article.

2 The location-scale and AFT models

Let T be a non-negative random variable, and \(Y = \log (T)\) be its log-transform. Then, the location-scale model for Y is defined as the probability density function (pdf) \(f_{\varvec{\theta }}(y) = \phi \{(y-\mu )/\sigma \}/\sigma \) and cdf \(F_{\varvec{\theta }}(y) = \varPhi \{(y-\mu )/\sigma \}\), where \(\varvec{\theta }= (\mu , \sigma )^{\mathrm {T}}\) are the unknown location and scale parameters, \(\varPhi (z)\) is the standardized cdf, and \(\phi (z) = d \varPhi (z)/dz\) is its pdf.

The location-scale model for Y gives the log-location-scale model for T with the cdf and pdf

$$\begin{aligned} F_{\varvec{\theta }}(t)&= \varPhi \{(\log (t)-\mu )/\sigma \}, \quad t> 0, \\ f_{\varvec{\theta }}(t)&= \phi \{(\log (t)-\mu )/\sigma \}/(t \sigma ), \quad t > 0, \end{aligned}$$

where \( -\infty< \mu < \infty \) and \(\sigma > 0\). The log-location-scale model for T induces:

  • Weibull distribution: \(\varPhi (w) = 1 - \exp \{-\exp (w)\}\),

  • Lognormal distribution: \(\varPhi (w) = \int _{-\infty }^w \frac{1}{\sqrt{2 \pi }} \exp \left\{ - 0.5 x^2 \right\} dx\),

  • Log-logistic distribution: \(\varPhi (w) = 1/\{1 + \exp (-w)\}\),

where \(-\infty<w<\infty \). With \(\varPhi (w) = 1 - \exp \{-\exp (w)\}\) and the transformations \(\mu = (-\log \lambda )/\alpha \) and \(\sigma = 1/\alpha \), the Weibull distribution is obtained with scale parameter \(\lambda > 0\) and shape parameter \(\alpha > 0\). The cdf and pdf are

$$\begin{aligned} F_{\lambda , \alpha }(t)&= \mathrm {P}(T \le t) = 1 - \exp (-\lambda t^{\alpha }), \quad t> 0, \\ f_{\lambda , \alpha }(t)&= \frac{d}{dt} F_{\lambda , \alpha }(t) = \lambda \alpha t^{\alpha - 1} \exp (-\lambda t^{\alpha }), \quad t > 0. \end{aligned}$$

The Weibull distribution is denoted as \(T \sim \mathrm {Weibull}(\lambda , \alpha )\).
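As a quick numerical check, this correspondence can be verified in R. The following is our own sketch (the values of \(\lambda \) and \(\alpha \) are arbitrary); it uses base R's pweibull, whose scale parameter equals \(\lambda ^{-1/\alpha }\):

```r
## A quick numerical check (our own sketch, not the paper's code) of the
## correspondence between Weibull(lambda, alpha) and the log-location-scale form.
lambda <- 0.01; alpha <- 1.5
mu <- -log(lambda)/alpha; sigma <- 1/alpha
tt <- c(1, 5, 20, 50)
F_weib <- 1 - exp(-lambda * tt^alpha)            # cdf in the (lambda, alpha) form
F_lls  <- 1 - exp(-exp((log(tt) - mu)/sigma))    # Phi{(log(t) - mu)/sigma}
all.equal(F_weib, F_lls)                         # TRUE
## base R parameterizes F(t) = 1 - exp{-(t/b)^alpha}, i.e., b = lambda^(-1/alpha)
all.equal(F_weib, pweibull(tt, shape = alpha, scale = lambda^(-1/alpha)))  # TRUE
```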

The accelerated failure time (AFT) model extends the location-scale model by including covariates that accelerate or decelerate the life course. The AFT model can be expressed as

$$\begin{aligned} Y = \log (T) = \beta _0 + \beta _1 x_1 + \beta _2 x_2 + \cdots + \beta _k x_k + \sigma W = \beta _0 + {\varvec{x}}^{\mathrm {T}} \varvec{\beta }+ \sigma W, \end{aligned}$$

where \(\beta _0\) is an intercept and \(\varvec{\beta }= (\beta _1, \beta _2, \ldots , \beta _k)^{\mathrm {T}}\) are coefficients for designed covariates \({\varvec{x}}= (x_1, x_2, \ldots , x_k)^{\mathrm {T}}\). The error term W is distributed as \(\mathrm {P}(W \le w) = \varPhi (w)\), as in the location-scale model, and assumed to be independent of the covariates. The cdf and pdf of T given \({\varvec{x}}\) are written as

$$\begin{aligned} F_{\varvec{\theta }}(t | {\varvec{x}}) = \varPhi \left\{ \frac{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }}{\sigma } \right\} , \quad f_{\varvec{\theta }}(t | {\varvec{x}}) = \frac{1}{\sigma t} \phi \left\{ \frac{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm {T}} \varvec{\beta }}{\sigma } \right\} \end{aligned}$$

for \(t > 0\). The most commonly seen model is the Arrhenius life model for industrial life testing (Nelson 1982; Härtler 1987), given by

$$\begin{aligned} Y = \log (T) = \beta _0 + \beta _1 x + \sigma W, \quad \mathrm {P}(W \le w) = \varPhi (w), \end{aligned}$$

where x is a designed level for a relevant stress factor depending on the context (such as temperature). An appropriate choice of \(\varPhi (\cdot )\) gives the Arrhenius-Weibull model and the Arrhenius-lognormal model (Nelson 1982).

3 Method of estimation under double-truncation

In this section, we extend the likelihood-based method of Efron and Petrosian (1999) to the location-scale and AFT models. We also develop a randomized Newton–Raphson (NR) algorithm for computing the MLE under both models. It is well known that the (non-randomized) NR algorithm is sensitive to the choice of initial values: values chosen too far from the MLE may lead to divergence and thus fail to yield the MLE. We therefore add random noise to the initial values, as previously proposed for the special exponential family models (Hu and Emura 2015). The purpose of the randomization is to stabilize the algorithm, i.e. to reduce the risk of divergence owing to poor initial values.

3.1 Likelihood functions

We introduce the likelihood function for doubly-truncated data proposed by Efron and Petrosian (1999). Recall that the data consist of triplets \((u_i, y_i, v_i)\), where \(u_i \le y_i \le v_i\), \(i = 1, 2, \ldots , n\). Under the assumption that Y is independent of the truncation limits (Efron and Petrosian 1999), the truncated density of Y is

$$\begin{aligned} f_{\varvec{\theta }}(y | Y \in [u_i, v_i]) = {\left\{ \begin{array}{ll} f_{\varvec{\theta }}(y)/\{F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i)\} &{} \text {if} \ y \in [u_i, v_i], \\ 0 &{} \text {if} \ y \notin [u_i, v_i], \end{array}\right. } \end{aligned}$$

where \(F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i) = \int _{u_i}^{v_i} f_{\varvec{\theta }} (y) dy\). Hence, the log-likelihood function is

$$\begin{aligned} \ell (\varvec{\theta }) = \log \left\{ \prod _{i=1}^n \frac{f_{\varvec{\theta }}(y_i)}{F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i)} \right\} = \sum _{i=1}^n \log f_{\varvec{\theta }}(y_i) - \sum _{i=1}^n \log \{ F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i) \}. \end{aligned}$$

Under the log-location-scale model, the log-likelihood is therefore

$$\begin{aligned} \ell (\varvec{\theta })&= -n \log \sigma + \sum _{i=1}^n \log \left\{ \phi \left( \frac{y_i - \mu }{\sigma }\right) \right\} \\&- \sum _{i=1}^n \log \left\{ \varPhi \left( \frac{v_i - \mu }{\sigma }\right) - \varPhi \left( \frac{u_i - \mu }{\sigma }\right) \right\} , \end{aligned}$$

where \(\varvec{\theta }= (\mu , \sigma )^{\mathrm T}\). The MLE is defined by \(\widehat{\varvec{\theta }}= (\widehat{\mu }, \widehat{\sigma })^{\mathrm T} = \mathrm {argmax}_{\varvec{\theta }} \ell (\varvec{\theta })\). The MLE for the cdf \(F_{\varvec{\theta }}(t) = \varPhi [\{ \log (t) - \mu \}/\sigma ]\) is \(F_{\widehat{\varvec{\theta }}}(t) = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\). Under the AFT model, the log-likelihood is

$$\begin{aligned} \ell (\varvec{\theta })&= -n \log \sigma + \sum _{i=1}^n \log \left\{ \phi \left( \frac{y_i - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta }}{\sigma } \right) \right\} \\&\ \ \ - \sum _{i=1}^n \log \left\{ \varPhi \left( \frac{v_i - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta }}{\sigma } \right) - \varPhi \left( \frac{u_i - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta }}{\sigma } \right) \right\} , \end{aligned}$$

where \(\varvec{\theta }= (\beta _0, \varvec{\beta }^{\mathrm T}, \sigma )^{\mathrm T}\), \(\varvec{\beta }= (\beta _1, \beta _2, \ldots , \beta _k)^{\mathrm T}\) and \({\varvec{x}}_i = (x_{i,1}, x_{i,2}, \ldots , x_{i,k})^{\mathrm T}\). The MLE is defined by \(\widehat{\varvec{\theta }}= (\widehat{\beta }_0, \widehat{\varvec{\beta }}^{\mathrm T}, \widehat{\sigma })^{\mathrm T} = \mathrm {argmax}_{\varvec{\theta }} \ell (\varvec{\theta })\). The MLE for \(F_{\varvec{\theta }}(t | {\varvec{x}}) = \varPhi [\{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }\}/\sigma ]\) is \(F_{\widehat{\varvec{\theta }}}(t | {\varvec{x}}) = \varPhi [\{\log (t) - \widehat{\beta }_0 - {\varvec{x}}^{\mathrm T} \widehat{\varvec{\beta }}\}/ \widehat{\sigma }]\), where \({\varvec{x}}= (x_1, x_2, \ldots , x_k)^{\mathrm T}\) are given covariates.
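For concreteness, the following is a minimal R sketch (our own illustration, not code from the paper) of the conditional log-likelihood under the location-scale model; the helpers Phi_gev and phi_gev give the standardized cdf and pdf for the Weibull (smallest extreme value) case:

```r
## Conditional log-likelihood for doubly-truncated data (location-scale model).
## y, u, v: vectors with u_i <= y_i <= v_i on the log-lifetime scale.
loglik_ls <- function(theta, y, u, v, Phi = pnorm, phi = dnorm) {
  mu <- theta[1]; sigma <- theta[2]
  if (sigma <= 0) return(-Inf)
  num <- sum(log(phi((y - mu)/sigma)) - log(sigma))           # sum_i log f(y_i)
  den <- sum(log(Phi((v - mu)/sigma) - Phi((u - mu)/sigma)))  # truncation terms
  num - den
}
Phi_gev <- function(w) 1 - exp(-exp(w))   # Weibull case: standardized cdf
phi_gev <- function(w) exp(w - exp(w))    # and its pdf
```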

Under both models, the NR algorithm is defined by the sequence

$$\begin{aligned} \varvec{\theta }^{(h+1)} = \varvec{\theta }^{(h)} - \begin{bmatrix} \frac{\partial ^2 \ell (\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm T}} \end{bmatrix}^{-1} \frac{\partial \ell (\varvec{\theta })}{\partial \varvec{\theta }} \Bigg |_{\varvec{\theta }= \varvec{\theta }^{(h)}}, \quad h = 0, 1, \ldots , \end{aligned}$$

where the derivatives of \(\ell (\varvec{\theta })\) are given in Appendix A. Due to the difficulty of choosing consistent initial values \(\varvec{\theta }^{(0)}\) as well as instability of convergence, we suggest the randomized NR algorithm, which will be defined in Sect. 3.2 (the log-location-scale model) and Sect. 3.3 (the AFT model).
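A hedged sketch of one NR update, substituting numerical derivatives (via the CRAN package numDeriv) for the analytic expressions of Appendix A:

```r
## One NR step for maximizing the conditional log-likelihood (sketch).
library(numDeriv)
nr_step <- function(theta, y, u, v, Phi = Phi_gev, phi = phi_gev) {
  ll <- function(th) loglik_ls(th, y, u, v, Phi, phi)
  g <- grad(ll, theta)       # score vector at theta
  H <- hessian(ll, theta)    # Hessian of the log-likelihood at theta
  theta - solve(H, g)        # theta^(h+1) = theta^(h) - H^{-1} g
}
```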

Note that the likelihood functions are derived conditionally on \((u_i, v_i)\). Consequently, no distributional assumption on the truncation limits U and V is required. It is not even necessary for U and V to have a joint density. The full likelihood function would additionally require specifying the distribution of U and V, which would introduce additional parameters. By using a conditional likelihood function, we can even consider non-identical distributions, namely \((u_i, v_i)\) and \((u_j, v_j)\) can have different distributions for \(i \not = j\).

In parametric settings, a key assumption is that the distributional form of \(F_{\varvec{\theta }}(y)\) is fully specified for \(-\infty< y < \infty \). With this assumption, the parameters become identifiable through the conditional likelihood. This differs from the nonparametric setting, where the key assumptions concern the support of U and V (Woodroofe 1985; Shen 2010; Moreira and de Uña-Álvarez 2010).

3.2 Randomized Newton–Raphson algorithm for the log-location-scale model

The proposed algorithm for the log-location-scale model is stated as follows:

Algorithm 1 (displayed as figure a in the original): the randomized NR algorithm for the log-location-scale model.

Step 2 is performed until the algorithm converges. As initial values in Step 1, one may set \((\mu ^{(0)}, \sigma ^{(0)}) = \{\mathrm {Median}({\varvec{y}}), \mathrm {IQR}({\varvec{y}})/2\}\), where \(\mathrm {Median}({\varvec{y}})\) and \(\mathrm {IQR}({\varvec{y}})\) are the median and inter-quartile range of \({\varvec{y}}= (y_1, \ldots , y_n)\). An alternative is \(\mu ^{(0)} = \overline{{\varvec{y}}}\) and \(\sigma ^{(0)} = \{\sum _{i=1}^n (y_i - \overline{{\varvec{y}}})^2/(n-1)\}^{1/2}\). These initial values are consistent in the absence of double-truncation and under a symmetric error distribution.

Under the Weibull model, for which the error is asymmetric, we propose the following initial values. Let \(t_i, i=1 ,\ldots ,n\) be i.i.d. values such that \(\mathrm {P}(t_i > t) = \exp (-\lambda t^{\alpha })\). Menon (1963) showed that

$$\begin{aligned} \alpha ^{*}&= \frac{\pi }{\sqrt{6}} \left[ \frac{1}{n-1} \left\{ \sum _{i=1}^n (\log t_i)^2 - n^{-1} \left( \sum _{i=1}^n \log t_i \right) ^2 \right\} \right] ^{-1/2} \quad \text {and} \\ \lambda ^{*}&= \left\{ \frac{1}{n} \sum _{i=1}^n t_i^{\alpha ^{*}} \right\} ^{-1} \end{aligned}$$

are \(\sqrt{n}\)-consistent estimators for \((\alpha , \lambda )\) in the absence of truncation. By the transformations of Sect. 2, we set \( (\mu ^{(0)}, \sigma ^{(0)}) = \{(-\log \lambda ^{*}) / \alpha ^{*}, 1/\alpha ^{*}\}. \)
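In R, these initial values amount to the following sketch (with y the vector of observed log-lifetimes):

```r
## Menon's (1963) moment-type initial values under the Weibull model (sketch).
tt <- exp(y); n <- length(tt)        # raw lifetimes t_i = exp(y_i)
lt <- log(tt)                        # = y
alpha0  <- (pi/sqrt(6)) * ((sum(lt^2) - sum(lt)^2/n)/(n - 1))^(-1/2)
lambda0 <- 1/mean(tt^alpha0)
init <- c(mu = -log(lambda0)/alpha0, sigma = 1/alpha0)
```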

A very small value of \(\varPhi [\{v_i - \mu ^{(h)}\}/\sigma ^{(h)}] - \varPhi [\{u_i - \mu ^{(h)}\}/\sigma ^{(h)}]\) may occur during the iteration of Algorithm 1. This causes \(H_{v_i}(\varvec{\theta })\), \(H_{u_i}(\varvec{\theta })\), \(K_{v_i}(\varvec{\theta })\) and \(K_{u_i}(\varvec{\theta })\) (see Appendix A for their definitions) to become infinite, and hence produces computational problems for the Hessian matrix. This problem is handled by resetting the algorithm with new randomized initial values whenever \(\varPhi [\{v_i - \mu ^{(h)}\}/\sigma ^{(h)}] - \varPhi [\{u_i - \mu ^{(h)}\}/\sigma ^{(h)}]\) is small. The values \((d_1, d_2)\) are interpreted as the maximum noise added to the initial values \((\mu ^{(0)}, \sigma ^{(0)})\). In the analysis of real data, one may first try small values (e.g., \(D_1 = D_2 = d_1 = d_2 = 0.01\)). If Algorithm 1 does not converge, one may try larger values (\(D_1 = D_2 = d_1 = d_2 = 1\)).
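The following sketch illustrates the randomized-restart idea. The exact schedule of \((D_1, D_2, d_1, d_2)\) follows Algorithm 1 (not reproduced here); we simply perturb the initial values by uniform noise of maximum magnitudes \((d_1, d_2)\) and reset whenever the NR step breaks down:

```r
## Randomized restarts around the initial values (hedged sketch of Algorithm 1).
fit_rand_nr <- function(theta0, y, u, v, d1 = 0.01, d2 = 0.01,
                        max_restart = 20, tol = 1e-5) {
  for (r in 0:max_restart) {
    th <- theta0 + if (r == 0) 0 else c(runif(1, -d1, d1), runif(1, -d2, d2))
    for (h in 1:100) {
      th_new <- try(nr_step(th, y, u, v), silent = TRUE)
      if (inherits(th_new, "try-error") || any(!is.finite(th_new)) ||
          th_new[2] <= 0) break                # reset with new randomized values
      if (max(abs(th_new - th)) < tol) return(th_new)
      th <- th_new
    }
  }
  stop("the randomized NR algorithm did not converge")
}
```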

3.3 Randomized Newton–Raphson algorithm for the AFT model

The proposed algorithm for the AFT model is stated as follows:

Algorithm 2 (displayed as figure b in the original): the randomized NR algorithm for the AFT model.

A single covariate (\(k=1\)) is the most relevant case in real applications of AFT models (Sect. 2). In this case, we suggest choosing the initial value \(\varvec{\theta }^{(0)}\) using the least-squares estimator (LSE). Recall that the AFT model with \(k=1\) is

$$\begin{aligned} Y = \log (T) = \beta _0 + \beta _1 x + \sigma W, \quad \mathrm {P}(W \le w) = \varPhi (w). \end{aligned}$$

The LSE requires the error term \(\sigma W\) to have zero mean and constant variance. In the Weibull AFT model, the expectation and variance of the error term are \(\mathrm {E}(\sigma W) = -0.5772 \sigma \) and \(\mathrm {Var}(\sigma W) = \sigma ^2 \pi ^2 / 6\), respectively. Following Cochran (1968), we add \(0.5772 \sigma \) to the error term,

$$\begin{aligned} Y + 0.5772 \sigma = \beta _0 + \beta _1 x + \epsilon ^{*}, \end{aligned}$$

where \(\epsilon ^{*} = \sigma W + 0.5772 \sigma \) has zero mean and \(\mathrm {Var}(\epsilon ^{*}) = \sigma ^2 \pi ^2 / 6\). The LSE minimizes

$$\begin{aligned} \sum _{i=1}^n (y_i - \beta _0 - \beta _1 x_i + 0.5772 \sigma )^2. \end{aligned}$$

Consequently, we obtain the initial values

$$\begin{aligned} \beta _0^{\mathrm {LSE}}&= \overline{{\varvec{y}}} - \beta _1^{\mathrm {LSE}} \overline{{\varvec{x}}} + 0.5772 \sigma ^{\mathrm {LSE}}, \quad \beta _1^{\mathrm {LSE}} = \frac{\sum _{i=1}^n x_i y_i - n \overline{{\varvec{x}}} \overline{{\varvec{y}}}}{\sum _{i=1}^n x_i^2 - n \overline{{\varvec{x}}}^2},\\ \sigma ^{\mathrm {LSE}}&= \frac{\sqrt{6}}{\pi } \left\{ \frac{1}{n-1} \sum _{i=1}^n (y_i^{*} - \overline{{\varvec{y}}}^{*})^2 \right\} ^{1/2}, \end{aligned}$$

where \(y_i^{*} = y_i - \beta _1^{\mathrm {LSE}} x_i\) and \(\overline{{\varvec{x}}} = \sum _{i=1}^n x_i / n\).
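A sketch of these LSE-based initial values in R (assuming vectors x and y of length n):

```r
## LSE-based initial values for the one-factor Weibull AFT model (sketch).
b1 <- (sum(x*y) - n*mean(x)*mean(y)) / (sum(x^2) - n*mean(x)^2)   # beta_1^LSE
ystar <- y - b1*x
s0 <- (sqrt(6)/pi) * sqrt(sum((ystar - mean(ystar))^2)/(n - 1))   # sigma^LSE
b0 <- mean(y) - b1*mean(x) + 0.5772*s0                            # beta_0^LSE
theta0 <- c(b0, b1, s0)
```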

3.4 Asymptotic distribution

The asymptotic distribution of the MLE is the basis for computing the standard error, confidence interval, and confidence band. Conditionally on \((u_i, v_i)\) and under independence between \(y_i\) and \((u_i, v_i)\), a single observation \(y_i\) has density \(f_{\varvec{\theta }}(y | Y \in [u_i, v_i])\). As pointed out by Emura et al. (2017), \(y_i, i=1, 2, \ldots , n\), are independent but not identically distributed (i.n.i.d.) since \([u_i, v_i] \not = [u_j, v_j]\) for \(i \not = j\). In the following, we therefore apply the asymptotic theory for i.n.i.d. variables of Emura et al. (2017) to show that the asymptotic variance matrix based on the conditional likelihood is valid for obtaining the standard errors.

Let \(\varvec{\theta }^{0}\) be the true parameter values. We define the information matrix \({\varvec{I}}_i(\varvec{\theta })\) of the i-th observation as

$$\begin{aligned} \int \frac{\partial }{\partial \varvec{\theta }} \log f_{\varvec{\theta }}(y | Y \in [u_i, v_i]) \cdot \frac{\partial }{\partial \varvec{\theta }^{\mathrm T}} \log f_{\varvec{\theta }}(y | Y \in [u_i, v_i]) \cdot f_{\varvec{\theta }}(y |Y \in [u_i, v_i]) \, dy. \end{aligned}$$

Under Assumptions (A)–(E) given in “Appendix B”, one can apply the Lindeberg–Feller central limit theorem and the strong law of large numbers for i.n.i.d. random variables to prove that

$$\begin{aligned} -\frac{1}{n} \frac{\partial ^2}{\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm T}} \ell (\varvec{\theta }) \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}} = \frac{{\varvec{i}}(\widehat{\varvec{\theta }})}{n} \rightarrow ^p {\varvec{I}}(\varvec{\theta }^0) \quad \text {as} \quad n \rightarrow \infty , \\ \sqrt{n} (\widehat{\varvec{\theta }}- \varvec{\theta }^0) \rightarrow ^d \mathrm {N}({\varvec{0}}, {\varvec{I}}(\varvec{\theta }^0)^{-1}) \quad \text {as} \quad n \rightarrow \infty . \end{aligned}$$

Note that in the above expression, \(\mathrm N(\cdot )\) denotes the multivariate normal distribution, whose dimension depends on the number of components of \(\varvec{\theta }\). In the log-location-scale model, \(\varvec{\theta }\) contains two components, while in the AFT model, \(\varvec{\theta }\) consists of \(k+2\) components. The proofs follow from the arguments given by Emura et al. (2017) who applied a Taylor expansion to approximate \(\widehat{\varvec{\theta }}- \varvec{\theta }^0\) as the sum of i.n.i.d. random variables.

Thus, the standard errors (SEs) are

$$\begin{aligned} \mathrm {SE}(\widehat{\varvec{\theta }}_j) = \sqrt{[{\varvec{i}}^{-1}(\widehat{\varvec{\theta }})]_{j,j}} = \sqrt{\left[ \left\{ -\frac{\partial ^2}{\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm T}} \ell (\varvec{\theta }) \right\} ^{-1}\right] _{j,j} \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}}}, \end{aligned}$$

where \([\cdot ]_{j,j}\) represents the j-th diagonal element.
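Continuing the earlier sketches, the SEs can be obtained from a numerical Hessian at the MLE:

```r
## Standard errors from the observed information at the MLE (sketch).
hat <- fit_rand_nr(init, y, u, v)
H <- numDeriv::hessian(function(th) loglik_ls(th, y, u, v, Phi_gev, phi_gev), hat)
V <- solve(-H)          # i^{-1}(theta_hat): inverse observed information
se <- sqrt(diag(V))     # SE(theta_hat_j)
```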

4 Interval estimation

In this section, several types of interval estimation are developed, including the confidence interval (CI), confidence region (CR), and confidence band (CB).

By the asymptotic normal approximation to \(\widehat{\varvec{\theta }}_j\) in Sect. 3.4, we construct the \((1-\alpha )100\%\) CI for \(\varvec{\theta }_j: [\widehat{\varvec{\theta }}_j - Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\varvec{\theta }}_j), \widehat{\varvec{\theta }}_j + Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\varvec{\theta }}_j)]\), where \(Z_p\) is the pth upper quantile of \(\mathrm N(0, 1)\). Since \(\sigma > 0\), we use the log-transformed \((1-\alpha )100\%\) CI for \(\sigma \): \([\widehat{\sigma }\exp \{-Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\sigma })/\widehat{\sigma }\}, \widehat{\sigma }\exp \{Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\sigma })/\widehat{\sigma }\}]\).
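For illustration, these intervals follow directly from the sketch above (with hat and se as computed there and \(\alpha = 0.05\)):

```r
z <- qnorm(0.975)                                      # Z_{alpha/2}
ci_mu    <- hat[1] + c(-1, 1) * z * se[1]              # Wald CI for mu
ci_sigma <- hat[2] * exp(c(-1, 1) * z * se[2]/hat[2])  # log-transformed CI for sigma
```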

4.1 Wald-type confidence interval for \(F_{\varvec{\theta }}(t)\)

By the delta method, the SE of \(F_{\widehat{\varvec{\theta }}}(t)\) is

$$\begin{aligned} \mathrm {SE}\{F_{\widehat{\varvec{\theta }}}(t)\} = \sqrt{\left\{ \frac{\partial }{\partial \varvec{\theta }} F_{\varvec{\theta }}(t) \right\} ^{\mathrm T} \cdot \left\{ - \frac{\partial ^2}{\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm T}} \ell (\varvec{\theta }) \right\} ^{-1} \cdot \left\{ \frac{\partial }{\partial \varvec{\theta }} F_{\varvec{\theta }}(t) \right\} \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}}}. \end{aligned}$$

Under the log-location-scale model,

$$\begin{aligned} \frac{\partial }{\partial \varvec{\theta }} F_{\varvec{\theta }}(t) = -\frac{1}{\sigma } \left[ \phi \left\{ \frac{\log (t) - \mu }{\sigma }\right\} , \left\{ \frac{\log (t) - \mu }{\sigma }\right\} \phi \left\{ \frac{\log (t) - \mu }{\sigma }\right\} \right] ^{\mathrm T} . \end{aligned}$$

Under the AFT model, the SE of \(F_{\widehat{\varvec{\theta }}}(t | {\varvec{x}})\) is computed from

$$\begin{aligned} \frac{\partial }{\partial \varvec{\theta }} F_{\varvec{\theta }}(t | {\varvec{x}}) = - \frac{1}{\sigma } \begin{bmatrix} 1 \\ {\varvec{x}}\\ \{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }\}/\sigma \end{bmatrix} \phi \left\{ \frac{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }}{\sigma } \right\} . \end{aligned}$$

The \((1-\alpha )100\%\) CI for \(F_{\varvec{\theta }}(t)\) is \([F_{\widehat{\varvec{\theta }}}(t) - Z_{\alpha /2} \cdot \mathrm {SE}\{F_{\widehat{\varvec{\theta }}}(t)\}, F_{\widehat{\varvec{\theta }}}(t) + Z_{\alpha /2} \cdot \mathrm {SE}\{F_{\widehat{\varvec{\theta }}}(t)\}]\). This is the Wald-type CI for \(F_{\varvec{\theta }}(t)\). However, if the SE is large, this CI may fall outside the range (0, 1). Instead of simply truncating the CI to the known range (0, 1), we consider an appropriate transformation method in the following, which was developed specifically for CIs for probabilities (see Hong et al. 2008).
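A sketch of the delta-method computation under the Weibull model, evaluated at a hypothetical time point t0 (continuing the earlier sketches):

```r
## Wald-type CI for F_theta(t) by the delta method (sketch).
t0 <- 10                                              # hypothetical time point
xi <- (log(t0) - hat[1])/hat[2]
gF <- -(1/hat[2]) * c(phi_gev(xi), xi * phi_gev(xi))  # gradient of F_theta(t)
seF <- sqrt(drop(t(gF) %*% V %*% gF))
ci_F_wald <- Phi_gev(xi) + c(-1, 1) * z * seF         # may fall outside (0, 1)
```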

4.2 Transformed confidence interval for \(F_{\varvec{\theta }}(t)\)

For the location-scale model, define a matrix \(\varLambda ^{*}\) by

$$\begin{aligned} {\varvec{i}}^{-1}(\widehat{\varvec{\theta }}) = \left( \begin{bmatrix} - \frac{\partial ^2}{\partial \mu ^2} \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \mu \partial \sigma } \ell (\varvec{\theta }) \\ - \frac{\partial ^2}{\partial \mu \partial \sigma } \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \sigma ^2} \ell (\varvec{\theta }) \end{bmatrix}^{-1} \right) \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}} = \begin{bmatrix} \lambda ^{*}_{11} &{} \lambda ^{*}_{12} \\ \lambda ^{*}_{12} &{} \lambda ^{*}_{22} \end{bmatrix} = \varLambda ^{*}. \end{aligned}$$

If \(1 - (Z_{\alpha /2} / \widehat{\sigma })^2 \lambda ^{*}_{22} > 0\), the transformed \(100(1-\alpha )\%\) CI for \(F_{\varvec{\theta }}(t)\) is

$$\begin{aligned}{}[\varPhi (a_{\min }), \varPhi (a_{\max })], \end{aligned}$$

where \(a_{\min } = \widehat{\xi }+ G_1(\varLambda ^{*}) - G_2(\varLambda ^{*}), \ a_{\max } = \widehat{\xi }+ G_1(\varLambda ^{*}) + G_2(\varLambda ^{*})\),

$$\begin{aligned} G_1(\varLambda ^{*})&= \frac{\varOmega ^2 (\lambda ^{*}_{12} + \widehat{\xi }\lambda ^{*}_{22})}{1 - \varOmega ^2 \lambda ^{*}_{22}},\\ G_2(\varLambda ^{*})&= \frac{\sqrt{\varOmega ^2 (\lambda ^{*}_{11} + 2 \widehat{\xi }\lambda ^{*}_{12} + \widehat{\xi }^2 \lambda ^{*}_{22}) - \varOmega ^4 (\lambda ^{*}_{11} \lambda ^{*}_{22} - \lambda ^{*2}_{12})}}{1 - \varOmega ^2 \lambda ^{*}_{22}}, \end{aligned}$$

\(\widehat{\xi }= (\log (t) - \widehat{\mu })/\widehat{\sigma }\) and \(\varOmega = Z_{\alpha /2} / \widehat{\sigma }\). The derivations of these formulas are given in “Appendix C”. If \(1 - (Z_{\alpha /2} / \widehat{\sigma })^2 \lambda ^{*}_{22} \le 0\), one cannot compute \(G_1(\varLambda ^{*})\). However, this unusual case only occurs for very small sample sizes.
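Continuing the sketch (the matrix V computed earlier plays the role of \(\varLambda ^{*} = {\varvec{i}}^{-1}(\widehat{\varvec{\theta }})\)):

```r
## Transformed CI for F_theta(t) (sketch of Sect. 4.2).
Om2 <- (z/hat[2])^2                     # Omega^2
den <- 1 - Om2 * V[2, 2]                # must be positive
G1 <- Om2 * (V[1, 2] + xi * V[2, 2]) / den
G2 <- sqrt(Om2 * (V[1, 1] + 2*xi*V[1, 2] + xi^2*V[2, 2]) -
           Om2^2 * (V[1, 1]*V[2, 2] - V[1, 2]^2)) / den
ci_F_tr <- Phi_gev(c(xi + G1 - G2, xi + G1 + G2))
```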

4.3 Wald-type confidence region for \((\mu , \sigma )\)

For the log-location-scale model, the observed information can be written as

$$\begin{aligned} {\varvec{i}}(\widehat{\varvec{\theta }}) = \begin{bmatrix} - \frac{\partial ^2}{\partial \mu ^2} \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \mu \partial \sigma } \ell (\varvec{\theta }) \\ - \frac{\partial ^2}{\partial \mu \partial \sigma } \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \sigma ^2} \ell (\varvec{\theta }) \end{bmatrix} \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}} = \frac{n}{\widehat{\sigma }^2} \begin{bmatrix} i_{11} &{} i_{12} \\ i_{12} &{} i_{22} \end{bmatrix}. \end{aligned}$$

Define a matrix \(\varLambda \) by

$$\begin{aligned} \varLambda = \begin{bmatrix} i_{11} &{} i_{12} \\ i_{12} &{} i_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \lambda _{11} &{} \lambda _{12} \\ \lambda _{12} &{} \lambda _{22} \end{bmatrix}. \end{aligned}$$

An approximate \(100(1-\alpha )\%\) confidence region (CR) for \(\varvec{\theta }\) is given by the ellipsoid

$$\begin{aligned} \{\varvec{\theta }: (\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \chi ^2_{\mathrm {df} = 2}(\alpha ) \}, \end{aligned}$$

where \(\chi ^2_{\mathrm {df} = 2}(p)\) is the pth upper quantile of the chi-squared distribution with two degrees of freedom.

4.4 Confidence band for \(F_{\varvec{\theta }}(t)\)

If the CR satisfies the restriction \(i_{11}(i_{22} - \gamma _0) - i^2_{12} > 0\), the CB for \(F_{\varvec{\theta }}(t)\) is given as follows.

Theorem 1

Under the regularity conditions given in “Appendix B”, an approximate \(100(1-\alpha )\%\) CB for \(F_{\mu , \sigma }(t)\) is

$$\begin{aligned}{}[\varPhi (q_{\min }), \varPhi (q_{\max })], \end{aligned}$$

where \(q_{\min } = w_{\widehat{p}} + h_2(\varLambda , \widehat{p}) - h_3(\varLambda , \widehat{p})\), \(q_{\max } = w_{\widehat{p}} + h_2(\varLambda , \widehat{p}) + h_3(\varLambda , \widehat{p})\),

$$\begin{aligned} h_2(\varLambda , \widehat{p})&= \frac{\gamma _0(\lambda _{12} + w_{\widehat{p}} \lambda _{22})}{1 - \gamma _0 \lambda _{22}},\\ h_3(\varLambda , \widehat{p})&= \frac{\sqrt{\gamma _0 (\lambda _{11} + 2 w_{\widehat{p}} \lambda _{12} + w_{\widehat{p}}^2 \lambda _{22}) - \gamma _0^2(\lambda _{11} \lambda _{22} - \lambda _{12}^2)}}{1 - \gamma _0 \lambda _{22}}, \end{aligned}$$

\(\widehat{p} = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\), and \(w_{\widehat{p}} = \varPhi ^{-1}(\widehat{p}) = \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\).

Theorem 1 follows from the results of Escobar et al. (2008) after some modifications. The method first derives the CB for the quantile function, and then transforms it into a band for \(F_{\mu , \sigma }(t)\). The detailed proofs are given in “Appendix D”. By construction, the CB is always wider than the transformed CI.
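For completeness, a sketch that evaluates the band over a grid of t; the band constant \(\gamma _0\) comes from the CR (see “Appendix D”) and is taken as given here, and Lam denotes the inverse of the scaled matrix \([i_{jk}]\) of Sect. 4.3:

```r
## Pointwise evaluation of the CB of Theorem 1 (sketch).
cb_band <- function(tgrid, theta_hat, Lam, gamma0, Phi = Phi_gev) {
  wp <- (log(tgrid) - theta_hat[1])/theta_hat[2]    # w_{p-hat}
  den <- 1 - gamma0 * Lam[2, 2]
  h2 <- gamma0 * (Lam[1, 2] + wp * Lam[2, 2]) / den
  h3 <- sqrt(gamma0 * (Lam[1, 1] + 2*wp*Lam[1, 2] + wp^2*Lam[2, 2]) -
             gamma0^2 * (Lam[1, 1]*Lam[2, 2] - Lam[1, 2]^2)) / den
  cbind(lower = Phi(wp + h2 - h3), upper = Phi(wp + h2 + h3))
}
```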

The CB can also be computed for the AFT model. Suppose we have the one-factor AFT model, \(y_i = \beta _0 + \beta _1 x_i + \sigma W_i, \ i=1, \ldots , n\). We start from the 3-dimensional observed information matrix defined as

$$\begin{aligned} {\varvec{i}}(\widehat{\beta }_0, \widehat{\beta }_1, \widehat{\sigma }) = \begin{bmatrix} - \frac{\partial ^2}{\partial \beta _0^2} \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \beta _0 \partial \beta _1} \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \beta _0 \partial \sigma } \ell (\varvec{\theta }) \\ - \frac{\partial ^2}{\partial \beta _0 \partial \beta _1} \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \beta _1^2} \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \beta _1 \partial \sigma } \ell (\varvec{\theta }) \\ - \frac{\partial ^2}{\partial \beta _0 \partial \sigma } \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \beta _1 \partial \sigma } \ell (\varvec{\theta }) &{} - \frac{\partial ^2}{\partial \sigma ^2} \ell (\varvec{\theta }) \end{bmatrix} \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}}. \end{aligned}$$

Define a function \(g(r_1, r_2, r_3) = (r_1 + r_2 x, r_3)^{\mathrm T}\) and \(\mu = \beta _0 + \beta _1 x\). By the delta method, the observed information matrix for \((\mu , \sigma )\) is

$$\begin{aligned} {\varvec{i}}(\widehat{\mu }, \widehat{\sigma })&= \left( \begin{bmatrix} 1 &{} x &{} 0 \\ 0 &{} 0 &{} 1 \end{bmatrix} \times {\varvec{i}}^{-1}(\beta _0, \beta _1, \sigma ) \times \begin{bmatrix} 1 &{} x &{} 0 \\ 0 &{} 0 &{} 1 \end{bmatrix}^{\mathrm T} \right) ^{-1} \Bigg |_{\varvec{\theta }= \widehat{\varvec{\theta }}} . \end{aligned}$$

The CB for a given factor level x can be obtained by replacing \({\varvec{i}}(\widehat{\varvec{\theta }})\) with \({\varvec{i}}(\widehat{\mu }, \widehat{\sigma })\) in Theorem 1.

5 Simulation

5.1 Simulation design

We randomly generate \((u_i, y_i, v_i)\) subject to \(u_i \le y_i \le v_i\): the random vector \((U, Y, V)\) is drawn from the joint density \(f_U(u) f_Y(y) f_V(v)\) and accepted under the inclusion criterion \(U \le Y \le V\).

To generate data from the Weibull model, we let \(Y = \mu + \sigma W\) where \(\mathrm {P}(W > w) = \exp \{-\exp (w)\}\). We also generate data for U and V from Weibull models. The triplets \((U, Y, V)\) are independently generated by the inverse transformations

$$\begin{aligned} U&= \mu - \varDelta + \sigma \log \{-\log (1 - P_1)\}, \\ V&= \mu + \varDelta + \sigma \log \{-\log (1 - P_2)\}, \\ Y&= \mu + \sigma \log \{-\log (1 - P_3)\}, \end{aligned}$$

where \(P_1, P_2\) and \(P_3\) are i.i.d. U(0, 1) and \(\varDelta > 0\) is chosen to control the inclusion probability \(\mathrm {P}(U \le Y \le V)\). We set \(\mathrm {P}(U \le Y \le V) \approx 0.5\) for \((\mu =5, \sigma =2)\) and \(\mathrm {P}(U \le Y \le V) \approx 0.4\) for \((\mu = 5, \sigma =3)\) in the simulations.
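A sketch of this rejection scheme in R (the value \(\varDelta = 3\) is illustrative, not the calibrated choice used in the tables):

```r
## Generating doubly-truncated Weibull data by rejection (sketch).
gen_dt <- function(n, mu = 5, sigma = 2, Delta = 3) {
  dat <- matrix(NA_real_, n, 3, dimnames = list(NULL, c("u", "y", "v")))
  i <- 0
  while (i < n) {
    P <- runif(3)
    u <- mu - Delta + sigma * log(-log(1 - P[1]))
    v <- mu + Delta + sigma * log(-log(1 - P[2]))
    y <- mu + sigma * log(-log(1 - P[3]))
    if (u <= y && y <= v) { i <- i + 1; dat[i, ] <- c(u, y, v) }  # inclusion
  }
  dat
}
```

For instance, dat <- gen_dt(250) yields a sample of \(n = 250\) included triplets.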

For the one-factor Weibull AFT model \(Y_i = \log (T_i) = \beta _0 + \beta _1 x_i + \sigma W_i\), where \(\varPhi (w) = 1 - \exp \{-\exp (w)\}\), we consider three different levels \((x_i^{(1)} = 50, \ x_i^{(2)} = 100, \ x_i^{(3)} = 150)\). The Weibull AFT model can be written as \(Y_i^{(m)} = \beta _0 + \beta _1 x_i^{(m)} + \sigma W_i\), \(i=1,\ldots ,n\) and \(m=1, 2, 3\). The triplets \((U^{(m)}, Y^{(m)}, V^{(m)})\) are generated independently by using the inverse transformations

$$\begin{aligned} U_i^{(m)}&= \beta _0 + \beta _1 x_i^{(m)} - \varDelta + \sigma \log \{-\log (1-P_1)\}, \\ V_i^{(m)}&= \beta _0 + \beta _1 x_i^{(m)} + \varDelta + \sigma \log \{-\log (1-P_2)\}, \\ Y_i^{(m)}&= \beta _0 + \beta _1 x_i^{(m)} + \sigma \log \{-\log (1-P_3)\}, \end{aligned}$$

where \(m=1, 2, 3\) and \(P_1, P_2\) and \(P_3\) are i.i.d. U(0, 1) and \(\varDelta > 0\) is chosen to control the probability \(\mathrm {P}(U^{(m)} \le Y^{(m)} \le V^{(m)})\). The choice of \(\varDelta \) is independent of the level m since

$$\begin{aligned} \mathrm {P}(U^{(1)} \le Y^{(1)} \le V^{(1)}) = \mathrm {P}(U^{(2)} \le Y^{(2)} \le V^{(2)}) = \mathrm {P}(U^{(3)} \le Y^{(3)} \le V^{(3)}). \end{aligned}$$

We set equally sized samples for the three levels. For instance, for the sample size \(n = 75\), we allocate \(n/3 = 25\) observations for each level. We choose \(\varDelta \) such that \(\mathrm {P}(U \le Y \le V) = 0.5, 0.55\) and 0.6.

Once the data are generated, we compute parameter estimates and evaluate their accuracy in terms of the mean squared error (MSE). For instance, the MSE for estimating \(\sigma \) is defined as \(\mathrm {MSE}(\widehat{\sigma }) = \mathrm {E}(\widehat{\sigma }- \sigma )^2\). We also compute the average number of NR iterations and the average number of randomizations to evaluate the convergence speed of Algorithms 1 and 2. We adopt similar designs for the lognormal models.

5.2 Simulation results

Table 1 shows the results for estimating \(\mu \) and \(\sigma \) with different initial values under the Weibull model. It reveals that the estimators are almost unbiased and that the MSE decreases as the sample size increases. For instance, under \(n=50\) and \((\mu = 5, \sigma = 2)\), we observe \(\mathrm {E}(\widehat{\mu }) = 4.996\) and \(\mathrm {E}(\widehat{\sigma }) = 2.010\). The average number of Newton–Raphson iterations is 5 to 8. Moreover, the initial values \(\{(-\log \lambda ^*)/\alpha ^*, 1/\alpha ^*\}\) perform better than \(\{\mathrm {Median}({\varvec{y}}), \mathrm {IQR}({\varvec{y}})/2\}\) in terms of the numbers of iterations and randomizations. For the latter initial values, about 1 to 3 randomizations are necessary for the NR algorithm to converge. Hereafter, we use the initial values \(\{(-\log \lambda ^*)/\alpha ^*, 1/\alpha ^*\}\).

Table 1 Simulation results on the MLE of \((\mu , \sigma )\) based on 1000 repetitions under the Weibull model
Table 2 Simulated coverage probabilities of the 95% CI for \(\mu , \sigma \) and \(F_{\varvec{\theta }}(t)\) without transformation based on 1000 repetitions under the Weibull model

Table 2 shows the results on interval estimation for \(\mu , \sigma \) and \(F_{\varvec{\theta }}(t)\), where t is chosen to be the 25% or 75% quantile. The sample standard deviations are close to the average SEs in all cases. The coverage probabilities for all true parameters are close to the nominal 95%, except for some under-coverage at the 25% quantile for \(F_{\varvec{\theta }}(t)\).

Table 3 reports the coverage probabilities of the transformed CI for \(F_{\varvec{\theta }}(t)\), where t is chosen to be the 25%, 50% or 75% quantile. At the nominal levels \(1-\alpha = 0.80, 0.85, 0.90, 0.95\), and 0.99, the coverage probabilities for \(F_{\varvec{\theta }}(t)\) are close to the nominal ones. Compared to the Wald-type CI, the transformed CI for \(F_{\varvec{\theta }}(t)\) gives better coverage probabilities.

Table 3 Simulated coverage probabilities of the transformed CI based on 1000 repetitions under the Weibull model

Table 4 shows the coverage probabilities of the CR for \((\mu , \sigma )\) and the CB for \(F_{\varvec{\theta }}(t), t \ge 0\). The coverage probabilities of the CR and CB are closer to the nominal levels as the sample size increases. Since the CB is derived from the CR, their coverage probabilities are nearly equivalent. In a few cases (out of 1000 repetitions), the condition \(1 - (Z_{\alpha /2}/\widehat{\sigma })^2 \lambda _{22}^* > 0\) is not satisfied. In these cases, the CB cannot be computed and is defined as (0, 1).

Table 4 Simulated coverage probabilities of the CR and CB based on 1000 repetitions under the Weibull model

Table 5 shows the results for point estimation of \(\beta _0, \beta _1\) and \(\sigma \) with different initial values under the Weibull AFT model. The estimates are almost unbiased for the true parameters, regardless of the choice of the initial values. Convergence from the LSE-based initial values is always faster than from initial values set to the true parameter values (see AI and AR in Table 5). For the latter initial values, a fairly large number of randomizations is necessary for small samples (\(n = 75\) and \(n = 150\)). This indicates that the LSE method is a reliable way to find initial values. Hereafter, we only use the initial values given by the LSE.

Table 5 Simulation results on the MLE of \((\beta _0, \beta _1, \sigma )\) based on 1000 repetitions under the Weibull AFT model

Table 6 summarizes the results on interval estimation for \(\beta _0, \beta _1\) and \(\sigma \). The sample standard deviations are close to the average SEs in all cases. The coverage probabilities for the true parameters are close to the nominal 95%.

Table 6 Simulation results on the MLE of \((\beta _0, \beta _1, \sigma )\) based on 1000 repetitions under the Weibull AFT model

Table 7 reports the coverage probabilities of the CB for \(F_{\varvec{\theta }}(t | x=50)\). Overall, the coverage probabilities of the CB are close to the nominal levels. A systematic under-coverage is found for the small sample size \(n = 75\), but the coverage improves as the sample size increases to \(n = 450\).

Table 7 Simulated coverage probabilities of the CB for \(F_{\varvec{\theta }}(t | x=50)\) based on 1000 repetitions under the Weibull AFT model

Supplementary Material shows the simulation results under the lognormal distribution for T. The results are similar to those of Tables 1, 2, 3, 4, 5, 6, 7.

6 Data analysis

We fit the proposed models to the Equipment-S data (Ye and Tang 2016).

6.1 The Equipment-S data

The installation of units, called Equipment-S, began in 1977. If a unit fails, it is repaired by the maintenance department. However, the follow-up only started at \(\tau _s = 1996\), after the maintenance department realized the importance of collecting the data. The data were collected until the study end \(\tau _e = 2012\). Hence, a unit is available if its failure date F satisfies \(\tau _s \le F \le \tau _e\). The sample inclusion criterion is rewritten as \(\tau _s - B \le T \le \tau _e - B\), where B is the installation year (birth year) and \(T = F - B\) is the lifetime of the unit. Note that \(\tau _s - B \le T\) automatically holds for a unit installed after the follow-up start \((\tau _s - B < 0)\), and hence, such units are not subject to left-truncation.

Figure 2 shows the data collection scheme. For instance, the 105-th unit has the installation year \(B_{105} = 1989.333\) and the failure year \(F_{105} = 2003.6\). Thus, the observation consists of the lifetime \(t_{105} = F_{105} - B_{105} = 14.267\), the left-truncation limit \(\max (\tau _s - B_{105}, 0) = 6.667\), and the right-truncation limit \(\tau _e - B_{105} = 22.667\). The 30-th unit has the installation year \(B_{30} = 2001.333\) and the failure year \(F_{30} = 2009.4\). Thus, the observation consists of the lifetime \(t_{30} = F_{30} - B_{30} = 8.067\) and the right-truncation limit \(\tau _e - B_{30} = 10.667\), without left-truncation.

Fig. 2 The Equipment-S data scanned from Ye and Tang (2016). Circles denote installation dates and crosses denote failure dates

After log-transformations, the sample inclusion criterion is rewritten as \(u_i \le y_i \le v_i\), where \(u_i \equiv \log (\tau _s - B_i)\) is the left-truncation limit (or \(u_i = -\infty \) in the absence of left-truncation), and \(v_i \equiv \log (\tau _e - B_i)\) is the right-truncation limit. Thus, the data consist of \((u_i, y_i, v_i)\) subject to \(u_i \le y_i \le v_i\) for \(i = 1, 2, \ldots , n\), where \(n=174\).
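A sketch of this preprocessing step (inst_year and fail_year are hypothetical variable names for \(B_i\) and \(F_i\)):

```r
## Constructing (u_i, y_i, v_i) from installation and failure years (sketch).
tau_s <- 1996; tau_e <- 2012
y <- log(fail_year - inst_year)            # y_i = log(t_i)
u <- rep(-Inf, length(y))                  # -Inf encodes "no left-truncation"
idx <- inst_year < tau_s
u[idx] <- log(tau_s - inst_year[idx])
v <- log(tau_e - inst_year)
```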

6.2 Numerical results

We fitted the Weibull, lognormal and log-logistic models to the data \(\{(u_i, y_i, v_i); i=1, 2, \ldots , n\}\). For each model, we applied the NR algorithm (Algorithm 1) to obtain the MLE with the convergence criterion \(\epsilon = 10^{-5}\). The NR algorithm converged without randomization.

Table 8 shows the fitted results of the Weibull, the lognormal, and the log-logistic models. All these MLEs indeed attained the global maximum of their log-likelihood functions (see the graphical displays given in Supplementary Material). The log-likelihood values indicate that the lognormal model provides the best fit. In every model, the cdf \(F(t) = \mathrm {P}(T \le t)\) was estimated by \(F_{\widehat{\varvec{\theta }}}(t) = \varPhi \{(\log (t) - \widehat{\mu })/\widehat{\sigma }\}\).

Table 8 Fitted values under the three models with the failure data of Equipment-S

In addition to the parametric analyses, we conducted a nonparametric analysis. Figure 3 provides the NPMLE \(\widehat{F}^{\mathrm {NPMLE}}(t) = \sum _{t_i \le t} \widehat{f}_i\), where \(\widehat{f}_i, i=1, \ldots , n\), were computed by a self-consistency algorithm (Efron and Petrosian 1999). Figure 3 also displays the empirical cdf, \(\widehat{F}(t) = \sum _{i=1}^{174} \mathrm {I}(t_i \le t)/174\), where \(\mathrm {I}(\cdot )\) is the indicator function. The empirical cdf shows substantial upward bias relative to the NPMLE. This bias is due to ignoring double-truncation, in particular right-truncation.

Fig. 3 The estimates for the cdf F(t) for the lifetime of Equipment-S

The NPMLE \(\widehat{F}^{\mathrm {NPMLE}}(t)\) is regarded as an estimator of the truncated cdf \(F(t)/(F(\tau _2) - F(\tau _1))\), where \(\tau _1\) is the lower support point of the left-truncation limit and \(\tau _2\) is the upper support point of the right-truncation limit (Woodroofe 1985). We set \(\tau _1 = 0\) and \(\tau _2 = 35\) (years), where 35 is chosen according to \(\max _i \{\exp (v_i)\} = 34.8\), and regard \(\widehat{F}^{\mathrm {NPMLE}}(t)\) as an estimator of \(F(t)/(F(\tau _2) - F(\tau _1))\). In general, \(\widehat{F}^{\mathrm {NPMLE}}(t)\) does not consistently estimate F(t) unless the identifiability assumptions are met (Woodroofe 1985).

Figure 3 compares the parametric and nonparametric approaches for estimating the truncated cdf \(F(t)/(F(\tau _2) - F(\tau _1))\). The figure shows that all three parametric approaches provide similar results. To select the best model among the three parametric models, we computed the Kolmogorov–Smirnov (K–S) statistic

$$\begin{aligned} K = \sup _{\tau _1 \le t \le \tau _2} | \widehat{F}^{\mathrm {NPMLE}}(t) - F_{\widehat{\varvec{\theta }}}(t)/(F_{\widehat{\varvec{\theta }}}(\tau _2) - F_{\widehat{\varvec{\theta }}}(\tau _1))|, \end{aligned}$$

and the Cramér–von Mises statistic

$$\begin{aligned} C = \sum _{i=1}^n \{\widehat{F}^{\mathrm {NPMLE}}(t_i) - F_{\widehat{\varvec{\theta }}}(t_i)/(F_{\widehat{\varvec{\theta }}}(\tau _2) - F_{\widehat{\varvec{\theta }}}(\tau _1))\}^2. \end{aligned}$$

The lognormal model achieved the smallest C and K values (Table 8). Thus, we chose the lognormal model, which also provided the largest likelihood value. The subsequent analyses are solely based on the lognormal model.

Fig. 4 The estimate of the cdf along with the 95% CB and the 95% CI under the lognormal model based on the Equipment-S data

Figure 4 depicts the estimated cdf \(F_{\widehat{\varvec{\theta }}}(t) = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\) under the lognormal model. Notice that the cdf can be computed for values t greater than the largest observed lifetime \(\max (t_i) = 34.5\). For instance, we have \(F_{\widehat{\varvec{\theta }}}(35) = 0.45\). Hence, one can infer that more than half of the installed units survive beyond 35 years (although all sampled units failed within 35 years). This type of information cannot be provided by the NPMLE, which gives \(\widehat{F}^{\mathrm {NPMLE}}(34.5) = 1\).

Figure 4 also displays the 95% CI and the 95% CB. We observe that the CB is wider than the CI. The large widths of the CB and CI imply that the estimate \(F_{\widehat{\varvec{\theta }}}(t)\) has a very large variation; this phenomenon is discussed further below.

6.3 Effect of double-truncation

We now examine how double-truncation influenced the estimates of the lifetime distribution in the present data analysis. To this end, it is central to examine the sample inclusion probability

$$\begin{aligned} c&= \mathrm {P}\{\max (\tau _s - B,0) \le T \le \tau _e - B\}\\&= \int _{\tau _0}^{\tau _e} \mathrm {P}\{\max (\tau _s - b, 0) \le T \le \tau _e - b\} dF_B(b) \\&= \int _{\tau _0}^{\tau _s} \left\{ \varPhi \left( \frac{\log (\tau _e - b) - \mu }{\sigma } \right) - \varPhi \left( \frac{\log (\tau _s - b) - \mu }{\sigma } \right) \right\} dF_B(b) \\&\ \ \ + \int _{\tau _s}^{\tau _e} \varPhi \left( \frac{\log (\tau _e - b) - \mu }{\sigma } \right) d F_B(b), \end{aligned}$$

where \(\tau _0 = 1977\) is the start of installing the units, \(\tau _s = 1996\) is the start of collecting the data, \(\tau _e = 2012\) is the end of collecting the data, and \(F_B(b) = \mathrm {P}(B \le b)\) is the 'birth' distribution of the units, defined on \(b \in [\tau _0, \tau _e]\). The value c is estimable once \(F_B(b)\) is specified.

To estimate the birth distribution \(F_B(b)\), we treat B as the lifetime of interest subject to the double-truncation criterion \(\tau _s - T \le B \le \tau _e - T\). Figure 5 depicts the NPMLE \(\widehat{F}_B^{\mathrm {NPMLE}}(b) = \sum _{B_i \le b} \widehat{f}_{B_i}\).

Fig. 5 The estimates for the installation years for Equipment-S

Under the assumption that the NPMLE can identify the birth distribution,

$$\begin{aligned} \widehat{c}&= \sum _{\tau _0 \le B_i \le \tau _s} \left\{ \varPhi \left( \frac{\log (\tau _e - B_i) - \widehat{\mu }}{\widehat{\sigma }} \right) - \varPhi \left( \frac{\log (\tau _s - B_i) - \widehat{\mu }}{\widehat{\sigma }} \right) \right\} \widehat{f}_{B_i}\\&\ \ \ + \sum _{\tau _s < B_i} \varPhi \left( \frac{\log (\tau _e - B_i) - \widehat{\mu }}{\widehat{\sigma }} \right) \widehat{f}_{B_i} = 0.20. \end{aligned}$$

Thus, among the units installed between \(\tau _0 = 1977\) and \(\tau _e = 2012\), only 20% were included in the sample. Note that the last observed installation occurred at 2005.75 and therefore the NPMLE yields \(\widehat{F}_B(2005.75) = 1\), while the sampling continued until \(\tau _e = 2012\). Note also that the birth distribution is reasonably identified since the observed values of \(B_i\) are scattered over the support of \(F_B\) (see Fig. 5). The large widths of the CB and CI are possibly due to the small inclusion probability; see the information loss formula due to truncation (Emura and Konno 2012).
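A sketch of this computation (B and fB hold the installation years and the NPMLE weights \(\widehat{f}_{B_i}\); mu_hat and sigma_hat are the lognormal MLEs, so \(\varPhi \) is pnorm):

```r
## Estimated sample inclusion probability c-hat (sketch).
pv <- pnorm((log(tau_e - B) - mu_hat)/sigma_hat)
pu <- numeric(length(B))
idx <- B <= tau_s
pu[idx] <- pnorm((log(tau_s - B[idx]) - mu_hat)/sigma_hat)
c_hat <- sum((pv - pu) * fB)               # = 0.20 for the Equipment-S data
```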

To confirm our conjecture, we analyzed the data ignoring double-truncation, i.e., replacing each \(u_i\) by a large negative value and each \(v_i\) by a large positive value. The estimated cdf shows that the CB and CI become substantially narrower (shown in Supplementary Material). However, such a naïve analysis produces not only biased estimates but also an overly optimistic account of the statistical variation.

Another way of reducing the widths of the CB and CI is to increase the sample size. To see the required sample size, we conducted a simulation study. We assume that the pair (BT) follows the estimated distribution \(\widehat{F}_B^{\mathrm {NPMLE}}\) for B and the lognormal distribution \(\mathrm {P}(T\le t) = F_{\widehat{\varvec{\theta }}}(t) = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\) estimated from the Equipment-S data for T. Following the same truncation scheme, we obtain simulated data \((u_i, y_i, v_i)\) subject to \(u_i \le y_i \le v_i\) for \(i=1, 2, \ldots , n\), where we set \(n=200, 1000\), and 5000.

Table 9 shows the simulation results based on a single dataset of \(n=200, 1000\), or 5000. The estimates are reasonably close to the true values, and their SEs decrease as the sample size increases. Due to the small sample inclusion rate, the SEs are somewhat large for \(n=200\), confirming that the size \(n=174\) is not large enough to obtain precise parameter estimates.

Table 9 Simulation results under the setting of the Equipment-S data

Figure 6 depicts the 95% CI and 95% CB for \(n=200\) and 5000. As expected, the widths of the CI (and CB) are very large for \(n=200\). The widths become reasonably small for \(n=5000\).

Fig. 6 The estimate of \(F_{\widehat{\varvec{\theta }}}(t)\), the 95% CB, and the 95% CI under the lognormal model based on the simulated data with \(n=200\) (left) and \(n=5000\) (right)

7 Concluding remarks

This article proposes the log-location-scale model and the AFT model for analyzing doubly-truncated data. We developed a randomized Newton–Raphson algorithm to obtain the MLE, where the randomization scheme guards against poor initial values. We also considered interval estimation schemes, including the CI, CR, and CB. We have implemented some of the proposed methods in the R package 'double.truncation' (Emura et al. 2018). For instance, one can use the PMLE.lognormal function to compute the MLE under the lognormal model.

The log-location-scale model includes commonly used models in reliability analysis, such as the Weibull, lognormal, and log-logistic models. When the three models were fitted to the Equipment-S data, the lognormal model was chosen as the best model according to the likelihood as well as the distance measures. However, the performance of this model selection method remains to be studied. It would also be relevant to develop a parametric bootstrap method to compute the p-value of goodness-of-fit tests.

Note that the technique of obtaining closed-form expressions for the CB for the cdf was developed under complete data (Cheng and Iles 1983) and censored data (Jeng and Meeker 2001). The CB derived in this article is regarded as an extension of these previous works to doubly-truncated data. If the NPMLE is used for estimating the cdf, the Kolmogorov–Smirnov or the Cramér–von Mises type CB can be derived (Emura et al. 2015). However, these nonparametric methods are not suitable under parametric models.

The simulations show that the proposed initial values, based on the moment-type and LSE-type methods, lead to quick and reliable convergence to the MLE, even though they are not consistent. On the other hand, the simulations also show that the algorithm tends to diverge from some initial values for small sample sizes; hence, the randomized search for initial values plays a crucial role in guaranteeing eventual convergence. An alternative is to apply the moment-based method of Frank and Dörre (2017) and Chapter 5 of Dörre and Emura (2019) in an attempt to find consistent initial estimates. However, that method involves nonparametric estimation of relevant quantities, which is not always consistent unless certain support conditions are satisfied.

An important finding in our real data analysis is the large widths of the CI and CB. This issue arises because doubly-truncated data lose a substantial amount of information through the sampling design, even though the sample size \(n=174\) appears large. If double-truncation is simply ignored, the widths of the CI and CB become much smaller. However, such a naïve analysis is biased and improperly accounts for the sampling variation.

It would be of great interest to consider a Bayesian approach to analyzing doubly-truncated data as in Dörre (2020). That model imposes parametric assumptions on the birth process of the units as well as on the lifetime. In addition, the MCMC-type algorithm may stabilize convergence by avoiding maximization of the complex likelihood. Such stronger parametric assumptions are one possible way to narrow the large widths of the CI and CB observed in our real data analysis. However, we expect that an expression for the CB would be fairly difficult to derive analytically in that framework. Hence, the proposed CB, which can be calculated analytically, would remain useful. Throughout the article, we have assumed independence between the truncation limits and the lifetime of interest. One possible topic for future work is to develop statistical methods that relax this independence assumption using copulas. Emura and Pan (2020) developed likelihood-based methods for dependent left-truncation under a bivariate copula model. To extend this approach to accommodate double-truncation, one may consider trivariate copula models.