Abstract
Doubly-truncated data arise in many fields, including economics, engineering, medicine, and astronomy. This article develops likelihood-based inference methods for lifetime distributions under the log-location-scale model and the accelerated failure time model based on doubly-truncated data. These parametric models are practically useful, but methodologies for fitting them to doubly-truncated data have been missing. We develop algorithms for obtaining the maximum likelihood estimator under both models, and propose several types of interval estimation methods. Furthermore, we show that the confidence band for the cumulative distribution function has closed-form expressions. We conduct simulations to examine the accuracy of the proposed methods. We illustrate the proposed methods with real data from a field reliability study, called the Equipment-S data.
1 Introduction
Doubly-truncated data arise in many fields, including economics, engineering, medicine, and astronomy. When studying the lifetime distribution of a technological unit, it may be too time-consuming or operationally infeasible to obtain a complete sample of failure data. In this case, it is more convenient to collect field-failure data (Lawless 2003; Ye and Tang 2016), where a unit is retrospectively collected if it fails within a pre-specified study period. This sampling mechanism yields two types of missing units: units that fail before the study begins (left-truncation) and units that fail after the study ends (right-truncation). This sampling design is common in astronomy (Efron and Petrosian 1999), economics (Dörre 2020; Frank and Dörre 2017), medicine (Scheike and Keiding 2006; Moreira and de Uña-Álvarez 2010), and other fields. Analyses of doubly-truncated data require appropriate techniques to account for the sampling design.
Figure 1 illustrates the sampling mechanism. Let B be the birth date of a unit and T be its lifetime. Hence, \(F=B+T\) is the date of failure. Suppose that data collection begins at the date \(\tau _s\) and ends at the date \(\tau _e\). We only collect units whose failures occur in the interval \([\tau _s, \tau _e]\). Therefore, the sampling criterion is \(\tau _s - B \le T \le \tau _e - B\). In Fig. 1, two units out of a hypothetical population of five are collected, while the other three are not observed. This yields a biased sample, in which units with very short or very long lifetimes are less likely to be sampled. Data collected in this manner are called doubly-truncated, because the lifetime of interest T is observed only if it falls between the left-truncation limit \(\tau _s - B\) and the right-truncation limit \(\tau _e - B\).
In the framework of random double-truncation, the sampling criterion is \(U \le Y \le V\), where U, Y, and V are random variables. The example of Fig. 1 gives \(U = \log (\tau _s - B)\), \(Y = \log (T)\), and \(V = \log (\tau _e - B)\), where the logarithms avoid the constraint \(T \ge 0\). We call U the left-truncation limit, and V the right-truncation limit. We observe the triplet (U, Y, V) if \(U \le Y \le V\). Hence, doubly-truncated data consist of \((u_i, y_i, v_i)\), where \(u_i \le y_i \le v_i\) for \(i = 1, 2, \ldots , n\). In the analysis of doubly-truncated data, the goal is to estimate the distributional parameters of \(T = \exp (Y)\) such as the cumulative distribution function (cdf) \(\mathrm {P}(T \le t)\). In this article, we assume independence between truncation limits and the lifetime of interest, i.e. between (U, V) and Y. This is a usual assumption in the literature for treating doubly-truncated data. However, this assumption can be violated in some contexts (see Shen 2011 for a corresponding test of independence).
Nonparametric methods for doubly-truncated data are well-established. The nonparametric maximum likelihood estimator (NPMLE) of the cdf was proposed by Efron and Petrosian (1999). Shen (2010) and Moreira and de Uña-Álvarez (2010) investigated the asymptotic properties of the NPMLE. Emura et al. (2015) developed an alternative asymptotic theory and constructed the confidence band for the cdf. Kernel estimation of the density function has also been considered; see Moreira and Van Keilegom (2013) and references therein. A semi-parametric approach under a parametric truncation distribution has also been studied; see Moreira, de Uña-Álvarez and Van Keilegom (2014) and references therein. However, parametric methods for doubly-truncated data are relatively underdeveloped. Efron and Petrosian (1999), Hu and Emura (2015), and Emura et al. (2017) considered maximum likelihood estimation (MLE) under the assumption that \(Y = \log (T)\) follows the special exponential family. Dörre (2020) considered Bayesian parametric estimation under a general exponential family. These exponential families contain the normal, skew-normal, exponential, and gamma distributions, but they still do not contain many useful distributions such as the Weibull and log-logistic distributions. In addition, none of the previous works discusses the construction of confidence bands, which is technically challenging in parametric models.
In this article, we consider the log-location-scale model for lifetime distributions, which contains the Weibull, lognormal, and log-logistic distributions as special cases. We also consider accelerated failure time (AFT) models that are the regression extension of the log-location-scale model. These parametric models are clearly useful, but the methodologies to fit these models under double-truncation are missing. We develop algorithms for fitting these models, and propose several types of interval estimation schemes. We develop the Wald-type confidence interval and transformed confidence interval for the cdf. We also derive a closed-form expression for the confidence band for the cdf, which extends the results of Cheng and Iles (1983) and Jeng and Meeker (2001) to doubly-truncated data.
This article is organized as follows. Section 2 introduces the models. Sections 3 and 4 develop the estimation procedures. Sections 5 and 6 provide numerical studies. Section 7 concludes the article.
2 The location-scale and AFT models
Let T be a non-negative random variable, and \(Y = \log (T)\) be its log-transform. Then, the location-scale model for Y is defined as the probability density function (pdf) \(f_{\varvec{\theta }}(y) = \phi \{(y-\mu )/\sigma \}/\sigma \) and cdf \(F_{\varvec{\theta }}(y) = \varPhi \{(y-\mu )/\sigma \}\), where \(\varvec{\theta }= (\mu , \sigma )^{\mathrm {T}}\) are the unknown location and scale parameters, \(\varPhi (z)\) is the standardized cdf, and \(\phi (z) = d \varPhi (z)/dz\) is its pdf.
The location-scale model for Y gives the log-location-scale model for T with the cdf \(F_{\varvec{\theta }}(t) = \varPhi [\{\log (t) - \mu \}/\sigma ]\) and pdf \(f_{\varvec{\theta }}(t) = \phi [\{\log (t) - \mu \}/\sigma ]/(\sigma t)\) for \(t > 0\), where \( -\infty< \mu < \infty \) and \(\sigma > 0\). The log-location-scale model for T induces:
- Weibull distribution: \(\varPhi (w) = 1 - \exp \{-\exp (w)\}\),
- Lognormal distribution: \(\varPhi (w) = \int _{-\infty }^w \frac{1}{\sqrt{2 \pi }} \exp \left\{ - 0.5 x^2 \right\} dx\),
- Log-logistic distribution: \(\varPhi (w) = 1/\{1 + \exp (-w)\}\),
where \(-\infty<w<\infty \). By the transformations \(\mu = (-\log \lambda )/\alpha \) and \(\sigma = 1/\alpha \), the Weibull distribution yields the scale parameter \(\lambda > 0\) and the shape parameter \(\alpha > 0\). The cdf and pdf are \(F(t) = 1 - \exp (-\lambda t^{\alpha })\) and \(f(t) = \lambda \alpha t^{\alpha - 1} \exp (-\lambda t^{\alpha })\) for \(t > 0\). The Weibull distribution is denoted as \(T \sim \mathrm {Weibull}(\lambda , \alpha )\).
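The equivalence between the two parametrizations can be checked numerically. The following sketch (function names are ours) verifies that \(1 - \exp (-\lambda t^{\alpha })\) coincides with the log-location-scale form with \(\mu = (-\log \lambda )/\alpha \), \(\sigma = 1/\alpha \), and the SEV error cdf:

```python
import math

def weibull_cdf_direct(t, lam, alpha):
    # F(t) = 1 - exp(-lam * t**alpha), the usual Weibull parametrization
    return 1.0 - math.exp(-lam * t ** alpha)

def weibull_cdf_loglocscale(t, lam, alpha):
    # The same cdf in log-location-scale form Phi({log t - mu}/sigma)
    # with mu = -log(lam)/alpha, sigma = 1/alpha, and Phi(w) = 1 - exp(-exp(w))
    mu = -math.log(lam) / alpha
    sigma = 1.0 / alpha
    w = (math.log(t) - mu) / sigma
    return 1.0 - math.exp(-math.exp(w))
```

Since \((\log t - \mu )/\sigma = \alpha \log t + \log \lambda \), the two functions agree up to floating-point rounding.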
The accelerated failure time (AFT) model extends the location-scale model by including covariates that accelerate or decelerate the life course. The AFT model can be expressed as \(Y = \log (T) = \beta _0 + {\varvec{x}}^{\mathrm T} \varvec{\beta }+ \sigma W,\)
where \(\beta _0\) is an intercept and \(\varvec{\beta }= (\beta _1, \beta _2, \ldots , \beta _k)^{\mathrm {T}}\) are coefficients for designed covariates \({\varvec{x}}= (x_1, x_2, \ldots , x_k)^{\mathrm {T}}\). The error term W is distributed as \(\mathrm {P}(W \le w) = \varPhi (w)\), as in the location-scale model, and is assumed to be independent of the covariates. The cdf and pdf of T given \({\varvec{x}}\) are written as \(F_{\varvec{\theta }}(t | {\varvec{x}}) = \varPhi [\{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }\}/\sigma ]\) and \(f_{\varvec{\theta }}(t | {\varvec{x}}) = \phi [\{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }\}/\sigma ]/(\sigma t)\)
for \(t > 0\). The most commonly seen model is the Arrhenius life model for industrial life testing (Nelson 1982; Härtler 1987), given by
where x is a designed level for a relevant stress factor depending on the context (such as temperature). An appropriate choice of \(\varPhi (\cdot )\) gives the Arrhenius-Weibull model and the Arrhenius-lognormal model (Nelson 1982).
3 Method of estimation under double-truncation
In this section, we extend the likelihood-based method of Efron and Petrosian (1999) to the location-scale and AFT models. We also develop a randomized Newton–Raphson (NR) algorithm for computing the MLE under both models. It is well known that the (non-randomized) NR algorithm is sensitive to initial values: initial values chosen too far from the MLE may lead to divergence of the algorithm, so that the MLE is not obtained. Specifically, we add random noise to the initial values, as previously proposed for the special exponential family models (Hu and Emura 2015). The purpose of the randomization is to stabilize the algorithm, i.e. to reduce the risk of divergence due to poor initial values.
3.1 Likelihood functions
We introduce the likelihood functions for doubly-truncated data proposed by Efron and Petrosian (1999). Recall that the data consist of triplets \((u_i, y_i, v_i)\), where \(u_i \le y_i \le v_i\), \(i = 1, 2, \ldots , n\). Under the assumption that Y is independent of \((u_i, v_i)\) (Efron and Petrosian 1999), the truncated density of Y is
\(f_{\varvec{\theta }}(y_i \,|\, u_i \le Y \le v_i) = f_{\varvec{\theta }}(y_i)/\{F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i)\},\)
where \(F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i) = \int _{u_i}^{v_i} f_{\varvec{\theta }} (y) dy\). Hence, the log-likelihood function is
\(\ell (\varvec{\theta }) = \sum _{i=1}^{n} [\log f_{\varvec{\theta }}(y_i) - \log \{F_{\varvec{\theta }}(v_i) - F_{\varvec{\theta }}(u_i)\}].\)
Under the log-location-scale model, the log-likelihood is therefore
\(\ell (\varvec{\theta }) = \sum _{i=1}^{n} [\log \phi \{(y_i - \mu )/\sigma \} - \log \sigma - \log ( \varPhi \{(v_i - \mu )/\sigma \} - \varPhi \{(u_i - \mu )/\sigma \} )],\)
where \(\varvec{\theta }= (\mu , \sigma )\). The MLE is defined by \(\widehat{\varvec{\theta }}= (\widehat{\mu }, \widehat{\sigma })^{\mathrm T} = \mathrm {argmax}_{\varvec{\theta }} \ell (\varvec{\theta })\). The MLE for the cdf \(F_{\varvec{\theta }}(t) = \varPhi [\{ \log (t) - \mu \}/\sigma ]\) is \(F_{\widehat{\varvec{\theta }}}(t) = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\). Under the AFT model, the log-likelihood is
\(\ell (\varvec{\theta }) = \sum _{i=1}^{n} [\log \phi \{(y_i - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta })/\sigma \} - \log \sigma - \log ( \varPhi \{(v_i - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta })/\sigma \} - \varPhi \{(u_i - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta })/\sigma \} )],\)
where \(\varvec{\theta }= (\beta _0, \varvec{\beta }^{\mathrm T}, \sigma )^{\mathrm T}\), \(\varvec{\beta }= (\beta _1, \beta _2, \ldots , \beta _k)^{\mathrm T}\) and \({\varvec{x}}_i = (x_{i,1}, x_{i,2}, \ldots , x_{i,k})^{\mathrm T}\). The MLE is defined by \(\widehat{\varvec{\theta }}= (\widehat{\beta }_0, \widehat{\varvec{\beta }}^{\mathrm T}, \widehat{\sigma })^{\mathrm T} = \mathrm {argmax}_{\varvec{\theta }} \ell (\varvec{\theta })\). The MLE for \(F_{\varvec{\theta }}(t | {\varvec{x}}) = \varPhi [\{\log (t) - \beta _0 - {\varvec{x}}^{\mathrm T} \varvec{\beta }\}/\sigma ]\) is \(F_{\widehat{\varvec{\theta }}}(t | {\varvec{x}}) = \varPhi [\{\log (t) - \widehat{\beta }_0 - {\varvec{x}}^{\mathrm T} \widehat{\varvec{\beta }}\}/ \widehat{\sigma }]\), where \({\varvec{x}}= (x_1, x_2, \ldots , x_k)^{\mathrm T}\) are given covariates.
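For concreteness, the conditional log-likelihood can be evaluated directly. Below is a minimal sketch for the Weibull case (SEV errors); the function names are ours, and the analytic derivatives of Appendix A are not used here:

```python
import math

def sev_logpdf(w):
    # log of the standardized SEV density phi(w) = exp(w - exp(w))
    return w - math.exp(w)

def sev_cdf(w):
    # standardized SEV cdf Phi(w) = 1 - exp(-exp(w))  (Weibull case)
    return 1.0 - math.exp(-math.exp(w))

def loglik_weibull(mu, sigma, data):
    # Conditional log-likelihood for doubly-truncated data:
    #   sum_i [ log f(y_i) - log{F(v_i) - F(u_i)} ]
    # under the log-location-scale model with SEV errors.
    total = 0.0
    for u, y, v in data:
        total += sev_logpdf((y - mu) / sigma) - math.log(sigma)
        total -= math.log(sev_cdf((v - mu) / sigma) - sev_cdf((u - mu) / sigma))
    return total
```

Because each truncation denominator is less than one, the conditional log-likelihood exceeds the corresponding untruncated log-likelihood at the same parameter values.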
Under both models, the NR algorithm is defined by the sequence \(\varvec{\theta }^{(h+1)} = \varvec{\theta }^{(h)} - \{ \partial ^2 \ell (\varvec{\theta }^{(h)})/\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm T} \}^{-1} \partial \ell (\varvec{\theta }^{(h)})/\partial \varvec{\theta }\) for \(h = 0, 1, 2, \ldots \), where the derivatives of \(\ell (\varvec{\theta })\) are given in Appendix A. Due to the difficulty of choosing consistent initial values \(\varvec{\theta }^{(0)}\), as well as possible instability of convergence, we suggest the randomized NR algorithm, which will be defined in Sect. 3.2 (the log-location-scale model) and Sect. 3.3 (the AFT model).
Note that the likelihood functions are derived conditionally on \((u_i, v_i)\). Consequently, no distributional assumption on the truncation limits U and V is required. It is not even necessary for U and V to have a joint density. The full likelihood function would additionally require specifying the distribution of U and V, which would introduce additional parameters. By using a conditional likelihood function, we can even allow non-identical truncation distributions, namely \((u_i, v_i)\) and \((u_j, v_j)\) may have different distributions for \(i \not = j\).
In parametric settings, a key assumption is that the distributional form of \(F_{\varvec{\theta }}(y)\) is completely assumed for \(-\infty< y < \infty \). With this assumption, the parameters become identifiable through the conditional likelihood. This is different from the nonparametric setting, where the key assumptions are related to the support of U and V (Woodroofe 1985; Shen 2010; Moreira and de Uña-Álvarez 2010).
3.2 Randomized Newton–Raphson algorithm for the log-location-scale model
The proposed algorithm for the log-location-scale model is stated as follows:
Step 2 is performed until the algorithm converges. As initial values in Step 1, one may set \((\mu ^{(0)}, \sigma ^{(0)}) = \{\mathrm {Median}({\varvec{y}}), \mathrm {IQR}({\varvec{y}})/2\}\), where \(\mathrm {Median}({\varvec{y}})\) and \(\mathrm {IQR}({\varvec{y}})\) are the median and inter-quartile range of \({\varvec{y}}= (y_1, \ldots , y_n)\). An alternative is \(\mu ^{(0)} = \overline{{\varvec{y}}}\) and \(\sigma ^{(0)} = \{\sum _{i=1}^n (y_i - \overline{{\varvec{y}}})^2/(n-1)\}^{1/2}\). These initial values are consistent in the absence of double-truncation and under a symmetric error distribution.
Under the Weibull model, for which the error is asymmetric, we propose the following initial values. Let \(t_i, i=1 ,\ldots ,n\) be i.i.d. values such that \(\mathrm {P}(t_i > t) = \exp (-\lambda t^{\alpha })\). Menon (1963) showed that
are \(\sqrt{n}\)-consistent estimators for \((\alpha , \lambda )\) in the absence of truncation. By transformation, \( (\mu ^{(0)}, \sigma ^{(0)}) = \{(-\log \lambda ^{*}) / \alpha ^{*}, 1/\alpha ^{*}\}. \)
A very small value of \(\varPhi [\{v_i - \mu ^{(h)}\}/\sigma ^{(h)}] - \varPhi [\{u_i - \mu ^{(h)}\}/\sigma ^{(h)}]\) may occur during the iteration of Algorithm 1. This causes \(H_{v_i}(\varvec{\theta }), H_{u_i}(\varvec{\theta }), K_{v_i}(\varvec{\theta })\) and \(K_{u_i}(\varvec{\theta })\) (see Appendix A for their definitions) to become infinite, and hence produces computational problems for the Hessian matrix. This problem is handled by resetting the algorithm with new randomized initial values whenever \(\varPhi [\{v_i - \mu ^{(h)}\}/\sigma ^{(h)}] - \varPhi [\{u_i - \mu ^{(h)}\}/\sigma ^{(h)}]\) is small. The values \((d_1, d_2)\) are interpreted as the maximum amounts of noise to be added to the initial values \((\mu ^{(0)}, \sigma ^{(0)})\). In the analysis of real data, one may first try small values (e.g., \(D_1 = D_2 = d_1 = d_2 = 0.01\)). If Algorithm 1 does not converge, one may try larger values (\(D_1 = D_2 = d_1 = d_2 = 1\)).
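The randomized-restart idea of Algorithm 1 can be sketched for a generic two-parameter log-likelihood as follows. This is an illustration only: it uses finite-difference derivatives instead of the analytic ones in Appendix A, and a simple restart-on-failure rule in place of the reset conditions described above; the function name is ours.

```python
import random

def nr2_randomized(loglik, theta0, d=(0.01, 0.01), tol=1e-6,
                   max_iter=50, max_restarts=20, seed=1):
    # Randomized Newton-Raphson for a two-parameter log-likelihood.
    # On numerical failure, restart from theta0 plus uniform noise
    # on [-d_j, d_j] (the role of (d_1, d_2) in Sect. 3.2).
    rng = random.Random(seed)
    h = 1e-5
    def d1(f, th, i):
        tp, tm = list(th), list(th)
        tp[i] += h
        tm[i] -= h
        return (f(tp) - f(tm)) / (2.0 * h)
    for restart in range(max_restarts):
        noise = (restart > 0)  # first attempt uses theta0 itself
        th = [theta0[0] + noise * rng.uniform(-d[0], d[0]),
              theta0[1] + noise * rng.uniform(-d[1], d[1])]
        try:
            for _ in range(max_iter):
                g = [d1(loglik, th, 0), d1(loglik, th, 1)]
                # Hessian by differencing the gradient components
                H = [[d1(lambda t, i=i: d1(loglik, t, i), th, j)
                      for j in range(2)] for i in range(2)]
                det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
                step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                        (H[0][0] * g[1] - H[1][0] * g[0]) / det]
                th = [th[0] - step[0], th[1] - step[1]]
                if max(abs(s) for s in step) < tol:
                    return th
        except (ValueError, OverflowError, ZeroDivisionError):
            pass  # reset with new randomized initial values
    raise RuntimeError("no convergence; try larger noise bounds d")
```

On a strictly concave log-likelihood the first attempt already converges; the restarts only matter when an iteration leaves the region where the likelihood is computable.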
3.3 Randomized Newton–Raphson algorithm for the AFT model
The proposed algorithm for the AFT model is stated as follows:
A single covariate (\(k=1\)) is the most relevant case in real applications of AFT models (Sect. 2). In this case, we suggest choosing the initial values \(\varvec{\theta }^{(0)}\) using the least squares estimator (LSE). Recall that the AFT model with \(k=1\) is
The LSE requires the error term \(\sigma W\) to have zero mean and constant variance. In the Weibull AFT model, the mean and variance of the error term are \(\mathrm {E}(\sigma W) = -0.5772 \sigma \) and \(\mathrm {Var}(\sigma W) = \sigma ^2 \pi ^2 / 6\), respectively. Following Cochran (1968), we add \(0.5772 \sigma \) to the error term,
where \(\epsilon ^{*} = \sigma W + 0.5772 \sigma \) has zero mean and \(\mathrm {Var}(\epsilon ^{*}) = \sigma ^2 \pi ^2 / 6\). The LSE minimizes
Consequently, we obtain the initial values
where \(y_i^{*} = y_i - \beta _1^{\mathrm {LSE}} x_i\) and \(\overline{{\varvec{x}}} = \sum _{i=1}^n x_i / n\).
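One concrete way to implement the LSE-based initial values, consistent with the moments \(\mathrm {E}(\sigma W) = -0.5772 \sigma \) and \(\mathrm {Var}(\sigma W) = \sigma ^2 \pi ^2/6\) stated above, is sketched below. The exact display formulas are those of the paper; the helper name and the moment-matching steps for \(\sigma ^{(0)}\) and \(\beta _0^{(0)}\) are our reading of them.

```python
import math

def aft_initial_values(xs, ys):
    # LSE-based initial values for the one-factor Weibull AFT model:
    # slope by ordinary least squares, then sigma from the residual
    # variance (Var = sigma^2 * pi^2 / 6) and the intercept corrected
    # for the mean shift E(sigma*W) = -0.5772*sigma.
    n = len(ys)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    resid = [y - b1 * x for x, y in zip(xs, ys)]   # y_i* = y_i - b1 * x_i
    rbar = sum(resid) / n
    var_r = sum((r - rbar) ** 2 for r in resid) / (n - 1)
    sigma0 = math.sqrt(6.0 * var_r) / math.pi      # matches Var = sigma^2 pi^2/6
    b0 = rbar + 0.5772 * sigma0                    # undo the mean shift of sigma*W
    return b0, b1, sigma0
```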
3.4 Asymptotic distribution
The asymptotic distribution of the MLE is the basis for computing the standard error, confidence interval, and confidence band. Conditionally on \((u_i, v_i)\) and under independence of \(y_i\) from \((u_i, v_i)\), a single observation \(y_i\) has the density \(f_{\varvec{\theta }}(y | Y \in [u_i, v_i])\). As pointed out by Emura et al. (2017), \(y_i, i=1, 2, \ldots , n\), are independent but not identically distributed (i.n.i.d.) since \([u_i, v_i] \not = [u_j, v_j]\) for \(i \not = j\). In the following, we therefore apply the asymptotic theory for i.n.i.d. variables of Emura et al. (2017) to show that the asymptotic variance matrix based on the conditional likelihood is valid for obtaining the standard errors.
Let \(\varvec{\theta }^{0}\) be the true parameter values. We define the information matrix \({\varvec{I}}_i(\varvec{\theta })\) of the i-th observation as
Under Assumptions (A)–(E) given in “Appendix B”, one can apply the Lindeberg-Feller central limit theorem and the strong law of large numbers for the i.n.i.d. random variables to prove that
Note that in the above expression, \(\mathrm N(\cdot )\) denotes the multivariate normal distribution, whose dimension depends on the number of components of \(\varvec{\theta }\). In the log-location-scale model, \(\varvec{\theta }\) contains two components, while in the AFT model, \(\varvec{\theta }\) consists of \(k+2\) components. The proofs follow from the arguments of Emura et al. (2017), who applied a Taylor expansion to approximate \(\widehat{\varvec{\theta }}- \varvec{\theta }^0\) as a sum of i.n.i.d. random variables.
Thus, the standard errors (SEs) are \(\mathrm {SE}(\widehat{\varvec{\theta }}_j) = [ \{ \sum _{i=1}^{n} {\varvec{I}}_i(\widehat{\varvec{\theta }}) \} ^{-1} ]_{j,j}^{1/2}\), where \([\cdot ]_{j,j}\) represents the j-th diagonal element.
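In practice, SEs of this form can be approximated by inverting a numerically computed observed information matrix. The sketch below is our stand-in for the analytic derivatives of Appendix A, for the two-parameter case:

```python
def standard_errors(loglik, theta_hat, h=1e-5):
    # SEs from the inverse of the observed information
    # -d^2 loglik / d theta^2 at the MLE (2-parameter case),
    # with second derivatives by central differences.
    def f(a, b):
        return loglik([a, b])
    m, s = theta_hat
    d11 = (f(m + h, s) - 2.0 * f(m, s) + f(m - h, s)) / h ** 2
    d22 = (f(m, s + h) - 2.0 * f(m, s) + f(m, s - h)) / h ** 2
    d12 = (f(m + h, s + h) - f(m + h, s - h)
           - f(m - h, s + h) + f(m - h, s - h)) / (4.0 * h ** 2)
    i11, i22, i12 = -d11, -d22, -d12       # observed information entries
    det = i11 * i22 - i12 ** 2
    # diagonal of the inverse information gives the variances
    return (i22 / det) ** 0.5, (i11 / det) ** 0.5
```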
4 Interval estimation
In this section, several types of interval estimation are developed, including the confidence interval (CI), confidence region (CR), and confidence band (CB).
By the asymptotic normal approximation to \(\widehat{\varvec{\theta }}_j\) in Section 3.4, we construct the \((1-\alpha )100\%\) CI for \(\varvec{\theta }_j: [\widehat{\varvec{\theta }}_j - Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\varvec{\theta }}_j), \widehat{\varvec{\theta }}_j + Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\varvec{\theta }}_j)]\), where \(Z_p\) is the pth upper quantile for \(\mathrm N(0, 1)\). For \(\sigma > 0\), we use the log-transformed \((1-\alpha )100\%\) CI for \(\sigma \): \([\widehat{\sigma }\exp \{-Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\sigma })/\widehat{\sigma }\}, \widehat{\sigma }\exp \{Z_{\alpha /2} \cdot \mathrm {SE}(\widehat{\sigma })/\widehat{\sigma }\}]\).
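Both interval forms are straightforward to code; a minimal sketch (function names are ours; the default z is \(Z_{0.025} \approx 1.96\) for a 95% level):

```python
import math

def wald_ci(est, se, z=1.959963985):
    # Wald-type CI on the original scale: est -/+ z * SE
    return est - z * se, est + z * se

def log_transformed_ci(sigma_hat, se, z=1.959963985):
    # log-transformed CI for a positive parameter (Sect. 4):
    # endpoints sigma_hat * exp(-/+ z * SE / sigma_hat), always positive
    w = math.exp(z * se / sigma_hat)
    return sigma_hat / w, sigma_hat * w
```

The log-transformed interval is geometrically symmetric about \(\widehat{\sigma }\), so it can never cross zero.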
4.1 Wald-type confidence interval for \(F_{\varvec{\theta }}(t)\)
By the delta method, the SE of \(F_{\widehat{\varvec{\theta }}}(t)\) is \(\mathrm {SE}\{F_{\widehat{\varvec{\theta }}}(t)\} = [ \{\partial F_{\varvec{\theta }}(t)/\partial \varvec{\theta }\}^{\mathrm T} \{ \sum _{i=1}^{n} {\varvec{I}}_i(\varvec{\theta }) \}^{-1} \{\partial F_{\varvec{\theta }}(t)/\partial \varvec{\theta }\} ]^{1/2}\) evaluated at \(\varvec{\theta }= \widehat{\varvec{\theta }}\).
Under the log-location-scale model,
Under the AFT model, the SE of \(F_{\widehat{\varvec{\theta }}}(t | {\varvec{x}})\) is computed from
The \((1-\alpha )100\%\) CI for \(F_{\varvec{\theta }}(t)\) is \([F_{\widehat{\varvec{\theta }}}(t) - Z_{\alpha /2} \cdot \mathrm {SE}\{F_{\widehat{\varvec{\theta }}}(t)\}, F_{\widehat{\varvec{\theta }}}(t) + Z_{\alpha /2} \cdot \mathrm {SE}\{F_{\widehat{\varvec{\theta }}}(t)\}]\). This is the Wald-type CI for \(F_{\varvec{\theta }}(t)\). However, if the SE is large, this CI might not be contained in the range (0, 1). Instead of simply truncating the CI to (0, 1), we consider a transformation method in the following, which was developed specifically for CIs for probabilities (see Hong et al. 2008).
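A sketch of the Wald-type CI for the cdf under lognormal errors (\(\varPhi \) standard normal): the covariance matrix V of \((\widehat{\mu }, \widehat{\sigma })\) is taken as given, the gradient is \((-\phi (\xi )/\sigma , -\phi (\xi )\xi /\sigma )\), and the returned endpoints are deliberately not truncated, which illustrates why they may leave (0, 1). Function names are ours.

```python
import math

def norm_cdf(x):
    # standard normal cdf via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_wald_ci(t, mu, sigma, V, z=1.959963985):
    # Wald-type CI for F(t) = Phi({log t - mu}/sigma) by the delta method.
    # V: 2x2 covariance matrix of (mu_hat, sigma_hat).
    xi = (math.log(t) - mu) / sigma
    Fhat = norm_cdf(xi)
    phi = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    g = (-phi / sigma, -phi * xi / sigma)   # gradient of F wrt (mu, sigma)
    var = (g[0] * (V[0][0] * g[0] + V[0][1] * g[1])
           + g[1] * (V[1][0] * g[0] + V[1][1] * g[1]))
    se = math.sqrt(var)
    # endpoints are intentionally not clipped to (0, 1)
    return Fhat - z * se, Fhat + z * se, se
```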
4.2 Transformed confidence interval for \(F_{\varvec{\theta }}(t)\)
For the location-scale model, define a matrix \(\varLambda ^{*}\) by
If \(1 - (Z_{\alpha /2} / \widehat{\sigma })^2 \lambda ^{*}_{22} > 0\), the transformed \(100(1-\alpha )\%\) CI for \(F_{\varvec{\theta }}(t)\) is
where \(a_{\min } = \widehat{\xi }+ G_1(\varLambda ^{*}) - G_2(\varLambda ^{*}), \ a_{\max } = \widehat{\xi }+ G_1(\varLambda ^{*}) + G_2(\varLambda ^{*})\),
\(\widehat{\xi }= (\log (t) - \widehat{\mu })/\widehat{\sigma }\) and \(\varOmega = Z_{\alpha /2} / \widehat{\sigma }\). The derivations of these formulas are given in “Appendix C”. If \(1 - (Z_{\alpha /2} / \widehat{\sigma })^2 \lambda ^{*}_{22} \le 0\), one cannot compute \(G_1(\varLambda ^{*})\). However, this unusual case only occurs for very small sample sizes.
4.3 Wald-type confidence region for \((\mu , \sigma )\)
For the log-location-scale model, the observed information can be written as
Define a matrix \(\varLambda \) by
An approximate \(100(1-\alpha )\%\) confidence region (CR) for \(\varvec{\theta }\) is given by the ellipsoid \(\{ \varvec{\theta }: (\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \chi ^2_{\mathrm {df} = 2}(\alpha ) \},\)
where \(\chi ^2_{\mathrm {df} = 2}(p)\) is the pth upper quantile for the chi-squared distribution.
4.4 Confidence band for \(F_{\varvec{\theta }}(t)\)
If the CR satisfies the restriction \(i_{11}(i_{22} - \gamma _0) - i^2_{12} > 0\), the CB for \(F_{\varvec{\theta }}(t)\) is given as follows.
Theorem 1
Under the regularity conditions given in “Appendix B”, an approximate \(100(1-\alpha )\%\) CB for \(F_{\mu , \sigma }(t)\) is
where \(q_{\min } = w_{\widehat{p}} + h_2(\varLambda , \widehat{p}) - h_3(\varLambda , \widehat{p})\), \(q_{\max } = w_{\widehat{p}} + h_2(\varLambda , \widehat{p}) + h_3(\varLambda , \widehat{p})\),
\(\widehat{p} = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\), and \(w_{\widehat{p}} = \varPhi ^{-1}(\widehat{p}) = \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\).
Theorem 1 follows from the results of Escobar et al. (2008) after some modifications. This method first derives the CB for the quantile function, and then transforms it to \(F_{\mu , \sigma }(t)\). The detailed proofs are given in “Appendix D”. By construction, the CB is always wider than the transformed CI.
The CB can also be computed for the AFT model. Suppose we have the one-factor AFT model, \(y_i = \beta _0 + \beta _1 x_i + \sigma W_i, \ i=1, \ldots , n\). We originally have a 3-dimensional observed information matrix defined as
Define a function \(g(r_1, r_2, r_3) = (r_1 + r_2 x, r_3)^{\mathrm T}\) and \(\mu = \beta _0 + \beta _1 x\). By the delta method, the observed information matrix for \((\mu , \sigma )\) is
The CB for a given factor x can be obtained by replacing \({\varvec{i}}(\widehat{\varvec{\theta }})\) with \({\varvec{i}}(\widehat{\mu }, \widehat{\sigma })\) in Theorem 1.
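The delta-method reduction from the three parameters to \((\mu , \sigma )\) with \(\mu = \beta _0 + \beta _1 x\) can be sketched as follows, where cov3 is assumed to be the inverse of the 3-dimensional observed information; the function name is ours.

```python
def info_mu_sigma(cov3, x):
    # Delta method: with G = [[1, x, 0], [0, 0, 1]], the covariance of
    # (mu_hat, sigma_hat) is G cov3 G^T; the observed information for
    # (mu, sigma) is its 2x2 inverse.
    v11 = cov3[0][0] + 2.0 * x * cov3[0][1] + x * x * cov3[1][1]
    v12 = cov3[0][2] + x * cov3[1][2]
    v22 = cov3[2][2]
    det = v11 * v22 - v12 * v12
    return [[v22 / det, -v12 / det], [-v12 / det, v11 / det]]
```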
5 Simulation
5.1 Simulation design
We randomly generate \((u_i, y_i, v_i)\) subject to \(u_i \le y_i \le v_i\). That is, we generate the random vector (U, Y, V) with the joint density \(f_U(u) f_Y(y) f_V(v)\) and retain it only if the inclusion criterion \(U \le Y \le V\) is satisfied.
To generate data from the Weibull model, we let \(Y = \mu + \sigma W\) where \(\mathrm {P}(W > w) = \exp \{-\exp (w)\}\). We also generate data for U and V from Weibull models. The triplets (U, Y, V) are independently generated by the inverse transformations
where \(P_1, P_2\) and \(P_3\) are i.i.d. U(0, 1) and \(\varDelta > 0\) is chosen to control the inclusion probability \(\mathrm {P}(U \le Y \le V)\). We set \(\mathrm {P}(U \le Y \le V) \approx 0.5\) for \((\mu =5, \sigma =2)\) and \(\mathrm {P}(U \le Y \le V) \approx 0.4\) for \((\mu = 5, \sigma =3)\) in the simulations.
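A generic way to generate such samples is rejection sampling: draw (U, Y, V) independently by inverse transforms and keep the triplet only if \(U \le Y \le V\). The truncation parameters below are illustrative choices of ours, not the paper's \(\varDelta \)-shifted design:

```python
import math, random

def sev_inverse(p, m, s):
    # inverse transform for the SEV location-scale cdf 1 - exp(-exp((w - m)/s))
    return m + s * math.log(-math.log(1.0 - p))

def sample_doubly_truncated(n, mu, sigma, mu_u, sigma_u, mu_v, sigma_v, seed=1):
    # Draw (U, Y, V) independently via inverse transforms and keep the
    # triplet only if U <= Y <= V (the inclusion criterion).
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        u = sev_inverse(rng.random(), mu_u, sigma_u)
        y = sev_inverse(rng.random(), mu, sigma)
        v = sev_inverse(rng.random(), mu_v, sigma_v)
        if u <= y <= v:
            data.append((u, y, v))
    return data
```

The fraction of accepted triplets estimates the inclusion probability \(\mathrm {P}(U \le Y \le V)\).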
For the one-factor Weibull AFT model \(Y_i = \log (T_i) = \beta _0 + \beta _1 x_i + \sigma W_i\), where \(\varPhi (w) = 1 - \exp \{-\exp (w)\}\), we consider three different levels \((x_i^{(1)} = 50, \ x_i^{(2)} = 100, \ x_i^{(3)} = 150)\). The Weibull AFT model can be written as \(Y_i^{(m)} = \beta _0 + \beta _1 x_i^{(m)} + \sigma W_i\), \(i=1,\ldots ,n\) and \(m=1, 2, 3\). The triplets \((U^{(m)}, Y^{(m)}, V^{(m)})\) are generated independently by using the inverse transformations
where \(m=1, 2, 3\) and \(P_1, P_2\) and \(P_3\) are i.i.d. U(0, 1) and \(\varDelta > 0\) is chosen to control the probability \(\mathrm {P}(U^{(m)} \le Y^{(m)} \le V^{(m)})\). The choice of \(\varDelta \) is independent of the level m since
We set equally sized samples for the three levels. For instance, for the sample size \(n = 75\), we allocate \(n/3 = 25\) observations for each level. We choose \(\varDelta \) such that \(\mathrm {P}(U \le Y \le V) = 0.5, 0.55\) and 0.6.
Once the data are generated, we compute parameter estimates and evaluate their accuracy in terms of the mean squared error (MSE). For instance, the MSE for estimating \(\sigma \) is defined as \(\mathrm {MSE}(\widehat{\sigma }) = \mathrm {E}(\widehat{\sigma }- \sigma )^2\). We also compute the average number of the NR iterations and the average number of randomizations to evaluate the convergence speed of Algorithms 1 and 2. We adopt similar designs for the lognormal models.
5.2 Simulation results
Table 1 shows the results for estimating \(\mu \) and \(\sigma \) with different initial values under the Weibull model. It reveals that the estimators are almost unbiased and the MSE decreases when increasing the sample size. For instance, under \(n=50\) and \((\mu = 5, \sigma = 2)\), we observe \(\mathrm {E}(\widehat{\mu }) = 4.996\) and \(\mathrm {E}(\widehat{\sigma }) = 2.010\). The average number of Newton–Raphson iterations is 5 to 8. Moreover, the initial value \(\{(-\log \lambda ^*)/\alpha ^*, 1/\alpha ^*\}\) performs better than \(\{\mathrm {Median}({\varvec{y}}), \mathrm {IQR}({\varvec{y}})/2\}\) in terms of the number of iterations and randomizations. For the latter initial values, about 1 to 3 randomizations are necessary to make the NR algorithm converge. Hereafter, we use the initial value \(\{(-\log \lambda ^*)/\alpha ^*, 1/\alpha ^*\}\).
Table 2 shows the results on interval estimation for \(\mu , \sigma \) and \(F_{\varvec{\theta }}(t)\), where t is chosen to be the \(25\%\) or \(75\%\) quantile. The sample standard deviations are close to the average SEs in all cases. The coverage probabilities for all true parameters are close to the nominal 95%, except for some under-coverage for the 25% point for \(F_{\varvec{\theta }}(t)\).
Table 3 reports the coverage probabilities of the transformed CI for \(F_{\varvec{\theta }}(t)\), where t is chosen to be the 25%, 50% or 75% quantile. At the nominal levels \(1-\alpha = 0.80, 0.85, 0.90, 0.95\), and 0.99, the coverage probabilities for \(F_{\varvec{\theta }}(t)\) are close to the nominal ones. Compared to the Wald-type CI, the transformed CI for \(F_{\varvec{\theta }}(t)\) gives better coverage probabilities.
Table 4 shows the coverage probabilities of the CR for \((\mu , \sigma )\) and of the CB for \(F_{\varvec{\theta }}(t), t \ge 0\). The coverage probabilities of the CR and CB come closer to the nominal levels as the sample size increases. Since the CB is derived from the CR, their coverage probabilities are nearly equivalent. There are a few cases (in 1000 repetitions) that do not satisfy the condition \(1 - (Z_{\alpha /2}/\widehat{\sigma })^2 \lambda _{22}^{*} > 0\). In these cases, the CB cannot be computed and is defined as (0, 1).
Table 5 shows the results for point estimation of \(\beta _0, \beta _1\) and \(\sigma \) with different initial values under the Weibull AFT model. The estimated parameters are almost unbiased for the true parameters, regardless of the choice of the initial values. The convergence for the initial values by the LSE is always faster than for the initial values by the true parameter values (see AI and AR in Table 5). For the latter initial values, a fairly large number of randomizations is necessary for small samples (\(n = 75\) and \(n = 150\)). This indicates that the LSE method is a reliable method to find the initial values. Hereafter, we only use the initial values by the LSE.
Table 6 summarizes the results on interval estimation for \(\beta _0, \beta _1\) and \(\sigma \). The sample standard deviations are close to the average SEs in all cases. The coverage probabilities for the true parameters are close to the nominal 95%.
Table 7 reports the coverage probabilities of the CB for \(F_{\varvec{\theta }}(t | x=50)\). Overall, the coverage probabilities of the CB are close to the nominal levels. A systematic under-coverage is found for small samples (\(n = 75\)), but the coverage performance improves as the sample size increases up to \(n = 450\).
Supplementary Material shows the simulation results under the lognormal distribution for T. The results are similar to those of Tables 1, 2, 3, 4, 5, 6 and 7.
6 Data analysis
We fit the proposed models to the Equipment-S data (Ye and Tang 2016).
6.1 The Equipment-S data
The installation of a unit, called Equipment-S, began in 1977. If a unit fails, it is repaired by the maintenance department. However, the follow-up started only at \(\tau _s = 1996\), after the maintenance department realized the importance of collecting the data. The data were collected until the study end \(\tau _e = 2012\). Hence, a unit is observed if its failure date F satisfies \(\tau _s \le F \le \tau _e\). The sample inclusion criterion is rewritten as \(\tau _s - B \le T \le \tau _e - B\), where B is the installation year (birth year) and \(T = F - B\) is the lifetime of the unit. Note that \(\tau _s - B \le T\) holds automatically for a unit installed after the start of follow-up (\(\tau _s - B < 0\)), and hence such units are not subject to left-truncation.
Figure 2 shows the data collection scheme. For instance, the 105-th unit has the installation year \(B_{105} = 1989.333\) and the failure year \(F_{105} = 2003.6\). Thus, the observation consists of the lifetime \(t_{105} = F_{105} - B_{105} = 14.267\), left-truncation limit \(\max (\tau _s - B_{105}, 0) = 6.7\), and right-truncation limit \(\tau _e - B_{105} = 22.7\). The 30-th unit has the installation year \(B_{30} = 2001.333\) and the failure year \(F_{30} = 2009.4\). Thus, the observation consists of the lifetime \(t_{30} = F_{30} - B_{30} = 8.067\), and right-truncation limit \(\tau _e - B_{30} = 10.667\) without left-truncation.
After log-transformations, the sample inclusion criterion is re-written as \(u_i \le y_i \le v_i\), where \(u_i \equiv \log \{\tau _s - B_i\}\) is the left-truncation limit (or \(u_i = -\infty \) in the absence of left-truncation), and \(v_i \equiv \log (\tau _e - B_i)\) is the right-truncation limit. Thus, the data consist of \((u_i, y_i, v_i)\) subject to \(u_i \le y_i \le v_i\) for \(i = 1, 2, \ldots , n\), where \(n=174\).
6.2 Numerical results
We fitted the Weibull, lognormal and log-logistic models to the data \(\{(u_i, y_i, v_i); i=1, 2, \ldots , n\}\). For each model, we applied the NR algorithm (Algorithm 1) to obtain the MLE with the convergence criterion \(\epsilon = 10^{-5}\). The NR algorithm converged without randomization.
Table 8 shows the fitted results of the Weibull, the lognormal, and the log-logistic models. All these MLEs indeed attained the global maximum of their log-likelihood functions (see the graphical displays given in Supplementary Material). The log-likelihood values indicate that the lognormal model provides the best fit. In every model, the cdf \(F(t) = \mathrm {P}(T \le t)\) was estimated by \(F_{\widehat{\varvec{\theta }}}(t) = \varPhi \{(\log (t) - \widehat{\mu })/\widehat{\sigma }\}\).
In addition to the parametric analyses, we conducted a nonparametric analysis. Figure 3 provides the NPMLE \(\widehat{F}^{\mathrm {NPMLE}}(t) = \sum _{t_i \le t} \widehat{f}_i\), where \(\widehat{f}_i, i=1, \ldots , n\) were computed by a self-consistency algorithm (Efron and Petrosian 1999). Figure 3 also displays the empirical cdf, \(\widehat{F}(t) = \sum _{i=1}^{174} \mathrm {I}(t_i \le t)/174\), where \(\mathrm {I}(\cdot )\) is the indicator function. The empirical cdf shows substantial upward bias relative to the NPMLE. This bias is due to ignoring the double-truncation, in particular the right-truncation.
The NPMLE \(\widehat{F}^{\mathrm {NPMLE}}(t)\) is regarded as an estimator of the truncated cdf \(F(t)/\{F(\tau _2) - F(\tau _1)\}\), where \(\tau _1\) is the lower support point of the left-truncation limit and \(\tau _2\) is the upper support point of the right-truncation limit (Woodroofe 1985). We set \(\tau _1 = 0\) and \(\tau _2 = \exp (35)\), where 35 (years) is chosen in accordance with \(\max _i \{\exp (v_i)\} = 34.8\), and regard \(\widehat{F}^{\mathrm {NPMLE}}(t)\) as an estimator of \(F(t)/\{F(\tau _2) - F(\tau _1)\}\). In general, \(\widehat{F}^{\mathrm {NPMLE}}(t)\) does not consistently estimate F(t) unless the identifiability assumptions are met (Woodroofe 1985).
Figure 3 compares the parametric and nonparametric approaches for estimating the truncated cdf \(F(t)/(F(\tau _2) - F(\tau _1))\). The figure shows that all three parametric approaches provide similar results. To select the best model among the three parametric models, we computed the Kolmogorov–Smirnov (K–S) statistic
\(K = \max _{i} \left| \widehat{F}^{\mathrm {NPMLE}}(t_i) - F_{\widehat{\varvec{\theta }}}(t_i)/\{F_{\widehat{\varvec{\theta }}}(\tau _2) - F_{\widehat{\varvec{\theta }}}(\tau _1)\} \right| \)
and the Cramér–von Mises statistic
\(C = \sum _{i=1}^{n} \left[ \widehat{F}^{\mathrm {NPMLE}}(t_i) - F_{\widehat{\varvec{\theta }}}(t_i)/\{F_{\widehat{\varvec{\theta }}}(\tau _2) - F_{\widehat{\varvec{\theta }}}(\tau _1)\} \right] ^2.\)
The lognormal model achieved the smallest C and K values (Table 8). Thus, we chose the lognormal model, which also provided the largest likelihood value. The subsequent analyses are based solely on the lognormal model.
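Assuming K and C measure the discrepancy between the NPMLE and the truncated parametric cdf at the observed lifetimes, the two distances can be sketched as:

```python
# Sketch (hypothetical helper): Kolmogorov-Smirnov and Cramer-von Mises
# distances between two cdf estimates evaluated at the observed lifetimes.
def ks_cvm(npmle_cdf, param_cdf_trunc, times):
    """npmle_cdf, param_cdf_trunc: callables t -> cdf value in [0, 1]."""
    diffs = [abs(npmle_cdf(t) - param_cdf_trunc(t)) for t in times]
    K = max(diffs)                      # Kolmogorov-Smirnov distance
    C = sum(d * d for d in diffs)       # Cramer-von Mises distance
    return K, C
```

The model with the smallest K and C is preferred, which here agreed with the likelihood-based ranking.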
Figure 4 depicts the estimated cdf \(F_{\widehat{\varvec{\theta }}}(t) = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\) under the lognormal model. Notice that the cdf can be computed for values t greater than the largest observed lifetime \(\max (t_i) = 34.5\). For instance, we have \(F_{\widehat{\varvec{\theta }}}(35) = 0.45\). Hence, one can infer that more than half of the installed units survive beyond 35 years (even though all units in the sample failed within 35 years). This type of information cannot be provided by the NPMLE, which gives \(\widehat{F}^{\mathrm {NPMLE}}(34.5) = 1\).
Figure 4 also displays the 95% CI and 95% CB along with the estimated cdf. We observe that the CB is wider than the CI. The large widths of the CB and CI imply that the estimate \(F_{\widehat{\varvec{\theta }}}(t)\) has a very large variation; this phenomenon is discussed further below.
6.3 Effect of double-truncation
We now examine how double-truncation influenced the estimates of the lifetime distribution in the present data analysis. To this end, it is central to examine the sample inclusion probability
\(c = \mathrm {P}(\tau _s \le B + T \le \tau _e) = \int _{\tau _0}^{\tau _e} \{ F(\tau _e - b) - F(\tau _s - b) \} \, \mathrm {d}F_B(b),\)
where \(\tau _0 = 1977\) is the start of installing the units, \(\tau _s = 1996\) is the start of collecting the data, \(\tau _e = 2012\) is the end of collecting the data, and \(F_B(b) = \mathrm {P}(B \le b)\) is the 'birth' distribution of the units defined on \(b \in [\tau _0, \tau _e]\). The value c is estimable if \(F_B(b)\) is specified.
To estimate the birth distribution \(F_B(b)\), we treat B as the lifetime of interest subject to the double truncation criterion \(\tau _s - T \le B \le \tau _e - T\). Figure 5 depicts the NPMLE \(\widehat{F}_B^{\mathrm {NPMLE}}(b) = \sum _{B_i \le b} \widehat{f}_{B_i}\).
Under the assumption that the NPMLE can identify the birth distribution, we obtain the estimate
\(\widehat{c} = \sum _{i} \{ F_{\widehat{\varvec{\theta }}}(\tau _e - B_i) - F_{\widehat{\varvec{\theta }}}(\tau _s - B_i) \} \widehat{f}_{B_i} \approx 0.20.\)
Thus, among the units installed between \(\tau _0 = 1977\) and \(\tau _e = 2012\), only 20% were included in the sample. Note that the last observed installation occurred at 2005.75, and therefore the NPMLE yields \(\widehat{F}_B(2005.75) = 1\), even though the sampling continued until \(\tau _e = 2012\). Note also that the birth distribution is reasonably well identified, since the observed values of \(B_i\) are scattered over the support of \(F_B\) (see Fig. 5). The large widths of the CB and CI are possibly due to the small inclusion probability; see the information-loss formula under truncation (Emura and Konno 2012).
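When the NPMLE of the birth distribution consists of point masses \(\widehat{f}_{B_i}\) at the observed \(B_i\), the integral defining c reduces to a finite sum, which can be sketched as follows (an assumed form for illustration):

```python
# Sketch (assumed estimator form): inclusion probability as a sum over
# the point masses of the estimated birth distribution,
#   c_hat = sum_j f_Bj * { F(tau_e - b_j) - F(tau_s - b_j) }.
def inclusion_probability(b_points, b_masses, F, tau_s, tau_e):
    """F: lifetime cdf (callable); b_points/b_masses: birth NPMLE support."""
    return sum(fB * (F(tau_e - b) - F(tau_s - b))
               for b, fB in zip(b_points, b_masses))
```

In the Equipment-S analysis this quantity was about 0.20, i.e., roughly 80% of installed units fell outside the observation window.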
To confirm our conjecture, we analyzed the data ignoring double-truncation, by replacing \(u_i\) with a large negative value and \(v_i\) with a large positive value. The estimated cdf shows that the CB and CI become substantially narrower (shown in Supplementary Material). However, such a naïve analysis produces not only biased estimates but also an overly optimistic account of the statistical variation.
Another way to reduce the widths of the CB and CI is to increase the sample size. To gauge the required sample size, we conducted a simulation study. We assume that B follows the estimated distribution \(\widehat{F}_B^{\mathrm {NPMLE}}\) and that T follows the lognormal distribution \(\mathrm {P}(T\le t) = F_{\widehat{\varvec{\theta }}}(t) = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }]\) estimated from the Equipment-S data. Following the same truncation scheme, we obtain simulated data \((u_i, y_i, v_i)\) subject to \(u_i \le y_i \le v_i\) for \(i=1, 2, \ldots , n\), where we set \(n=200\), 1000, and 5000.
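The simulation design amounts to rejection sampling: draw (B, T), keep the pair only if the failure date \(B + T\) falls in \([\tau _s, \tau _e]\), and record the truncation limits. In the sketch below the birth distribution is replaced by a uniform law for simplicity (an assumption for illustration; the study used \(\widehat{F}_B^{\mathrm {NPMLE}}\)), and the limits are kept on the raw time scale.

```python
# Sketch (hedged): rejection sampling of doubly-truncated lifetimes.
import random, math

def simulate(n, mu, sigma, tau0, tau_s, tau_e, seed=1):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        B = rng.uniform(tau0, tau_e)               # assumed uniform births
        T = math.exp(rng.gauss(mu, sigma))         # lognormal lifetime
        if tau_s <= B + T <= tau_e:                # double-truncation rule
            out.append((tau_s - B, T, tau_e - B))  # (u_i, y_i, v_i), raw scale
    return out
```

By construction every retained triple satisfies \(u_i \le y_i \le v_i\); a negative \(u_i\) simply means the unit was effectively not left-truncated.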
Table 9 shows the simulation results based on a single dataset of \(n=200\), 1000, or 5000. The estimates are reasonably close to the true values, and their SEs decrease as the sample size increases. Due to the small sample inclusion rates, the SEs are somewhat large for \(n=200\), confirming that the size \(n=174\) is not large enough to obtain precise parameter estimates.
Figure 6 depicts the 95% CI and 95% CB for \(n=200\) and 5000. As expected, the widths of the CI (and CB) are very large for \(n=200\), but become reasonably small for \(n=5000\).
7 Concluding remarks
This article proposes the log-location-scale model and the AFT model for analyzing doubly-truncated data. We developed a Newton–Raphson algorithm to obtain the MLE, with a randomization scheme for choosing the initial values. We also considered interval estimation schemes, including the CI, CR, and CB. We have implemented some of the proposed methods in the R package 'double.truncation' (Emura et al. 2018). For instance, one can use the PMLE.lognormal function to compute the MLE under the lognormal model.
The log-location-scale model includes models commonly used in reliability analysis, such as the Weibull, lognormal, and log-logistic models. When the three models are fitted to the Equipment-S data, the lognormal model is chosen as the best model according to the likelihood as well as other distance measures. However, the performance of this model selection method remains to be studied. It would also be relevant to develop a parametric bootstrap method to compute the p-value of goodness-of-fit tests.
Note that the technique of obtaining closed-form expressions for the CB for the cdf was developed under complete data (Cheng and Iles 1983) and censored data (Jeng and Meeker 2001). The CB derived in this article is regarded as an extension of these previous works to doubly-truncated data. If the NPMLE is used for estimating the cdf, the Kolmogorov–Smirnov or the Cramér–von Mises type CB can be derived (Emura et al. 2015). However, these nonparametric methods are not suitable under parametric models.
The simulations show that the proposed initial values, based on the moment-type and LSE-type methods, lead to quick and reliable convergence to the MLE, even though they are not consistent. On the other hand, the simulations also show that some initial values tend to diverge for small sample sizes; hence, the randomization algorithm for searching initial values plays a crucial role in guaranteeing eventual convergence. An alternative is to apply the moment-based method of Frank and Dörre (2017) and Chapter 5 of Dörre and Emura (2019) to find consistent initial estimates. However, that method involves nonparametric estimation of relevant quantities, which is not always consistent unless certain support conditions are satisfied.
An important finding in our real data analysis is the large width of the CI and CB. This issue arises because doubly-truncated data lose a substantial amount of information by their sampling design, even though the sample size \(n=174\) appears to be large. If double-truncation is simply ignored, the width of the CI and CB becomes much smaller. However, such a naïve analysis is biased and improperly accounts for the sampling variation.
It would be of great interest to consider a Bayesian approach to analyzing doubly-truncated data, as in Dörre (2020). That model imposes parametric assumptions on the birth process of the units as well as on the lifetime. In addition, the MCMC-type algorithm may stabilize the convergence by avoiding maximization of the complex likelihood. These stronger parametric assumptions are one possible way to narrow the large widths of the CI and CB observed in our real data analysis. However, we expect that the expression of the CB would be fairly difficult to derive analytically under that approach; hence, the proposed CB, which can be calculated analytically, remains useful. Throughout the article, we have assumed independence between the truncation limits and the lifetime of interest. One possible topic for future work is to develop statistical methods that relax the independence assumption using copulas. Emura and Pan (2020) developed likelihood-based methods for dependent left-truncation under a bivariate copula model. To extend this approach to accommodate double-truncation, one may consider trivariate copula models.
References
Cheng RCH, Iles TC (1983) Confidence bands for cumulative distribution functions of continuous random variables. Technometrics 25(1):77–86
Cochran WG (1968) Errors of measurement in statistics. Technometrics 10(4):637–666
Dörre A (2020) Bayesian estimation of a lifetime distribution under double truncation caused by time-restricted data collection. Stat Pap 61(3):945–965
Dörre A, Emura T (2019) Analysis of doubly truncated data: an introduction. Springer, Berlin
Efron B, Petrosian V (1999) Nonparametric methods for doubly truncated data. J Am Stat Assoc 94(447):824–834
Emura T, Konno Y (2012) Multivariate normal distribution approaches for dependently truncated data. Stat Pap 53(1):133–149
Emura T, Pan C-H (2020) Parametric likelihood inference and goodness-of-fit for dependently left-truncated data, a copula-based approach. Stat Pap 61(1):479–501
Emura T, Konno Y, Michimae H (2015) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal 21(3):397–418
Emura T, Hu Y, Konno Y (2017) Asymptotic inference for maximum likelihood estimators under the special exponential family with double-truncation. Stat Pap 58(3):877–909
Emura T, Hu Y, Huang C (2018) double.truncation: analysis of doubly-truncated data. R package, CRAN
Escobar LA, Hong Y, Meeker WQ (2008) Simultaneous confidence bands and regions for log-location-scale distributions with censored data. Statistical Preprints, Paper 32, Iowa State University Digital Repository
Frank G, Dörre A (2017) Linear regression with randomly double-truncated data. S Afr Stat J 51(1):1–18
Härtler G (1987) Estimation and test for the parameters of the Arrhenius model. Qual Reliab Eng Int 3(4):219–225
Hoadley B (1971) Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. Ann Math Stat 42(6):1977–1991
Hong Y, Meeker WQ, Escobar LA (2008) Avoiding problems with normal approximation confidence intervals for probabilities. Technometrics 50(1):64–68
Hu Y, Emura T (2015) Maximum likelihood estimation for a special exponential family under random double-truncation. Comput Stat 30(4):1199–1229
Jeng S, Meeker WQ (2001) Parametric simultaneous confidence bands for cumulative distributions from censored data. Technometrics 43(4):450–461
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Springer, New York
Lehmann EL, Casella G (1998) Theory of point estimation. Springer, Berlin
Menon MV (1963) Estimation of the shape and scale parameters of the Weibull distribution. Technometrics 5(2):175–182
Moreira C, de Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametr Stat 22:567–583
Moreira C, Van Keilegom I (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123
Moreira C, de Uña-Álvarez J, Van Keilegom I (2014) Goodness-of-fit Tests for a semiparametric model under random double truncation. Comput Stat 29(5):1365–1379
Nelson W (1982) Applied life data analysis. Addison-Wesley, Boston
Sankaran PG, Sunoj SM (2004) Identification of models using failure rate and mean residual life of double truncated random variables. Stat Pap 45(1):97–109
Scheike TH, Keiding N (2006) Design and analysis of time-to-pregnancy. Stat Methods Med Res 15:127–140
Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62(5):835–853
Shen PS (2011) Testing quasi-independence for doubly truncated data. J Nonparametr Stat 23(3):753–761
Woodroofe M (1985) Estimating a distribution function with truncated data. Ann Stat 13(1):163–177
Ye Z, Tang L (2016) Augmenting the unreturned for field data with information on returned failures only. Technometrics 58(4):513–523
Acknowledgements
The authors thank the associate editor and two anonymous reviewers for their helpful comments that greatly improved this work. Takeshi Emura is financially supported by the Ministry of Science and Technology, Taiwan (103-2118-M-008-MY2; 107-2118-M-008-003-MY3).
Appendices
Appendix A: Derivatives of the log-likelihood function
To obtain the MLE numerically, we need the first and second derivatives of the log-likelihood \(\ell (\varvec{\theta })\). Define \(Z_{s}(\varvec{\theta }) = (s - \mu )/\sigma , s \in \{u_i, y_i, v_i\}\), and
Note that \(H_{v_i}(\varvec{\theta })\) is the backward hazard rate at time \(v_i\) and \(H_{u_i}(\varvec{\theta })\) is the forward hazard rate at time \(u_i\) under double-truncation (Sankaran and Sunoj 2004).
The first-order derivatives of the log-likelihood are
The second-order derivatives of the log-likelihood are
For the AFT model, we define \(Z_s(\varvec{\theta }) = (s - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta })/\sigma , s \in \{u_i, y_i, v_i\}\) and define \(H_{v_i}(\varvec{\theta }), \ H_{u_i}(\varvec{\theta }), \ K_{v_i}(\varvec{\theta })\) and \(K_{u_i}(\varvec{\theta })\) by Equations (1) and (2).
The first-order derivatives of the log-likelihood are
The second-order derivatives of the log-likelihood are
Appendix B: Regularity conditions
We impose the following conditions to derive the asymptotic distribution of the MLEs.
Assumption (A) There exists a positive definite matrix \({\varvec{I}}(\varvec{\theta })\) such that, as \(n \rightarrow \infty \),
Assumption (B) The partial derivatives and integration can be exchangeable such that
Assumption (C) There is a measurable function \(M_{jsl}(\cdot )\) such that
with \(m_{i,jsl} \equiv \mathrm {E}_{\varvec{\theta }^0} \{ M_{jsl}(Y_i) \} < \infty \) and \(m_{i,jsl}^2 \equiv \mathrm {E}_{\varvec{\theta }^0} \{ M_{jsl}(Y_i)^2 \} < \infty \). For some \(m_{jsl}\) and \(m_{jsl}^2\), it holds that \(\sum _{i=1}^n m_{i,jsl}/n \rightarrow m_{jsl}\) and \(\sum _{i=1}^n m_{i,jsl}^2 /n \rightarrow m_{jsl}^2\) as \(n \rightarrow \infty \).
Assumption (D) There is a measurable function \(W_{js}(\cdot )\) such that
with \(w_{i,js} \equiv \mathrm {E}_{\varvec{\theta }^0} \{ W_{js}(Y_i) \} < \infty \) and \(w_{i,js}^2 \equiv \mathrm {E}_{\varvec{\theta }^0} \{ W_{js}(Y_i)^2 \} < \infty \). For some \(w_{js}\) and \(w_{js}^2\), it holds that \(\sum _{i=1}^n w_{i,js}/n \rightarrow w_{js}\) and \(\sum _{i=1}^n w_{i,js}^2 /n \rightarrow w_{js}^2\) as \(n \rightarrow \infty \).
Assumption (E) There is a measurable function \(A_j(\cdot )\) such that
with \(\sup _y A_j^2(y) < \infty \).
Assumption (A) requires the Fisher information matrix to be stable for large samples. Here, the matrix \({\varvec{I}}(\varvec{\theta })\) is regarded as the asymptotic information matrix. Assumptions (B)–(C) impose the smoothness and boundedness of \(f_{\varvec{\theta }}(y | Y \in [u_i, v_i])\), which are similar to those imposed for the i.i.d. models (pp. 462–463 of Lehmann and Casella 1998). Assumptions (D)–(E) require the boundedness of the score functions and Hessian matrix, as employed by Emura et al. (2017) for the i.n.i.d. models. While these assumptions are too strong without truncation (Hoadley 1971), they can be satisfied under double-truncation since the density is truncated from below and above. See Lemma 4 of Emura et al. (2017).
Appendix C: The derivation of the transformed CI for \(F_{\varvec{\theta }}(t)\)
The derivation is based on inverting the CI for the quantile. Define the pth quantile function \(g(\varvec{\theta }) = y_p = \log (t_p) = \mu + \sigma w_p\), where \(w_p = \varPhi ^{-1}(p)\) for \(0 \le p \le 1\). The SE of \(\widehat{y}_p = \widehat{\mu }+ \widehat{\sigma }w_p\) is
The \((1-\alpha ) 100\%\) CI for the pth quantile is \([\widehat{y}_p(\min ), \widehat{y}_p(\max )]\), where
Let \(w_p = a, \varOmega = Z_{\alpha /2}/\widehat{\sigma }\) and \(\widehat{\xi }= \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\). Inverting the CI for the quantile means that (1) the solution to \(\widehat{y}_p(\min ) = \log (t)\) with respect to p gives the upper CI for \(F_{\varvec{\theta }}(t)\) and (2) the solution to \(\widehat{y}_p(\max ) = \log (t)\) with respect to p gives the lower CI for \(F_{\varvec{\theta }}(t)\). Thus we need to solve
Manipulating these equations, we have
The solutions to the above equation are
where
Therefore, the transformed CI for \(F_{\varvec{\theta }}(t)\) is \([\varPhi (a_{\min }), \varPhi (a_{\max })]\) with the restriction \(1 - \varOmega ^2 \lambda ^*_{22} > 0\).
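As an informal cross-check of the closed-form solution, the transformed CI can also be obtained by numerically inverting the quantile CI with bisection over p. The sketch below is our own illustration (not the paper's algorithm); it assumes estimated variances and covariance of \((\widehat{\mu }, \widehat{\sigma })\) are available, and treats the quantile bounds as increasing in p.

```python
# Sketch (hedged): numerically invert the quantile CI to obtain the
# transformed CI for F(t).  Solving y_p(max) = log t gives the lower
# limit for F(t); solving y_p(min) = log t gives the upper limit.
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def ci_F(t, mu, sigma, var_mu, cov, var_sigma, alpha=0.05):
    z = Z.inv_cdf(1 - alpha / 2)
    logt = math.log(t)

    def y_bound(p, sign):
        w = Z.inv_cdf(p)
        se = math.sqrt(var_mu + 2.0 * w * cov + w * w * var_sigma)
        return mu + sigma * w + sign * z * se     # quantile CI endpoint

    def solve(sign):
        lo, hi = 1e-9, 1 - 1e-9
        for _ in range(200):                       # bisection on p
            mid = 0.5 * (lo + hi)
            if y_bound(mid, sign) < logt:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return solve(+1), solve(-1)                    # (lower, upper) for F(t)
```

With all variance terms set to zero the two limits collapse to the point estimate \(\varPhi \{(\log t - \mu )/\sigma \}\), as they should.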
Appendix D: The derivation of the CB for \(F_{\varvec{\theta }}(t)\)
Here we assume the regularity conditions given in “Appendix C”. For the location-scale model, the observed information matrix can be written as
Define the scaled information matrix as \({\varvec{i}}_S(\widehat{\varvec{\theta }}) = \begin{bmatrix} i_{11} &{} i_{12} \\ i_{12} &{} i_{22} \end{bmatrix}\), and the inverse of the scaled information matrix \(\varLambda = \begin{bmatrix} i_{11} &{} i_{12} \\ i_{12} &{} i_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \lambda _{11} &{} \lambda _{12} \\ \lambda _{12} &{} \lambda _{22} \end{bmatrix}\). Letting \(\varvec{\delta }^{\mathrm T} = (0, 1)\), we find \(\min \varvec{\delta }^{\mathrm T} \varvec{\theta }\) over the constrained region \((\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \gamma \) such that
subject to the constraint \((\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}_S(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \widehat{\sigma }^2 (\gamma /n) \equiv \widehat{\sigma }^2 \gamma _0\).
Letting \({\varvec{d}}^{\mathrm T} = \left( \sqrt{{\varvec{i}}_S^{-1}(\widehat{\varvec{\theta }})} \varvec{\delta }\right) ^{\mathrm T}\) and \({\varvec{k}}= \sqrt{{\varvec{i}}_S(\widehat{\varvec{\theta }})} (\varvec{\theta }- \widehat{\varvec{\theta }})\), we have \({\varvec{k}}^{\mathrm T} {\varvec{k}}= \gamma _0 \widehat{\sigma }^2\). From the Cauchy–Schwarz inequality, \({\varvec{d}}^{\mathrm T} {\varvec{k}}\le \sqrt{{\varvec{d}}^{\mathrm T} {\varvec{d}}} \sqrt{{\varvec{k}}^{\mathrm T} {\varvec{k}}} = \sqrt{\gamma _0 \widehat{\sigma }^2 \, {\varvec{d}}^{\mathrm T} {\varvec{d}}}\). The equality holds when \({\varvec{k}}= a {\varvec{d}}\), where a is a constant. Since \((a {\varvec{d}})^{\mathrm T} (a {\varvec{d}}) = \gamma _0 \widehat{\sigma }^2\), the minimum is attained when \(a = -\sqrt{\gamma _0 \widehat{\sigma }^2 / {\varvec{d}}^{\mathrm T} {\varvec{d}}}\) at
This minimum is attained when
and
This gives the restriction \((1 - \sqrt{\gamma _0 \lambda _{22}}) > 0\), that is, \(i_{11}(i_{22} - \gamma _0) - i_{12}^2 > 0\). Similarly, under \({\varvec{c}}^{\mathrm T} = (1, w_p)\) for the pth quantile,
The maximum and minimum hold when
The CBs for the quantile function are
The CBs for \(F_{\varvec{\theta }}(t)\) are derived by inverting the CB for the quantile function, that is,
where \(\widehat{\xi }= (y - \widehat{\mu })/\widehat{\sigma }= \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\) and \(w_p = q\). This equation yields the solutions
where \(\widehat{p} = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }], w_{\widehat{p}} = \varPhi ^{-1}(\widehat{p}) = \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\),
Therefore, the CB for \(F_{\varvec{\theta }}(t)\) is \([\varPhi (q_{\min }), \varPhi (q_{\max })]\).
Dörre, A., Huang, CY., Tseng, YK. et al. Likelihood-based analysis of doubly-truncated data under the location-scale and AFT model. Comput Stat 36, 375–408 (2021). https://doi.org/10.1007/s00180-020-01027-6