Estimation of extremes for Weibull-tail distributions in the presence of random censoring

Abstract

The Weibull-tail class of distributions is a sub-class of the Gumbel extreme domain of attraction, and it has caught the attention of a number of researchers in the last decade, particularly concerning the estimation of the so-called Weibull-tail coefficient. In this paper, we propose an estimator of this Weibull-tail coefficient when the Weibull-tail distribution of interest is censored from the right by another Weibull-tail distribution: to the best of our knowledge, this is the first one proposed in this context. A corresponding estimator of extreme quantiles is also proposed. In both mild censoring and heavy censoring (in the tail) settings, asymptotic normality of these estimators is proved, and their finite sample behavior is presented via some simulations.

References

  • Brahimi, B., Meraghni, D., Necir, A.: Approximations to the tail index estimator of a heavy-tailed distribution under random censoring and application. Math. Methods Statist. 24, 266–279 (2015)

  • Brahimi, B., Meraghni, D., Necir, A.: Nelson-Aalen tail product-limit process and extreme value index estimation under random censorship. Unpublished manuscript, available on the arXiv archive: arXiv:1502.03955v2 (2016)

  • Brahimi, B., Meraghni, D., Necir, A., Soltane, L.: Tail empirical process and a weighted extreme value index estimator for randomly right-censored data. Unpublished manuscript, available on the arXiv archive: arXiv:1801.00572 (2018)

  • Beirlant, J., Dierckx, G., Guillou, A., Fils-Villetard, A.: Estimation of the extreme value index and extreme quantiles under random censoring. Extremes 10, 151–174 (2007)

  • Beirlant, J., Broniatowski, M., Teugels, J., Vynckier, P.: The mean residual life function at great age: applications to tail estimation. Journal of Statistical Planning and Inference 45, 21–48 (1995)

  • Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.: Statistics of extremes: theory and applications. Wiley (2004)

  • Beirlant, J., Guillou, A., Toulemonde, G.: Peaks-over-threshold modeling under random censoring. Communications in Statistics - Theory and Methods 39, 1158–1179 (2010)

  • Beirlant, J., Bardoutsos, A., de Wet, T., Gijbels, I.: Bias reduced tail estimation for censored Pareto type distributions. Stat. Prob. Lett. 109, 78–88 (2016)

  • Beirlant, J., Maribe, G., Verster, A.: Penalized bias reduction in extreme value estimation for censored Pareto-type data, and long-tailed insurance applications. Insurance Math. Econom. 78, 114–122 (2018)

  • Beirlant, J., Worms, J., Worms, R.: Asymptotic distribution for an extreme value index estimator in a censorship framework. Journal of Statistical Planning and Inference 202, 31–56 (2019)

  • Bingham, N.H., Goldie, C.M., Teugels, J.L.: Regular variation. Cambridge University Press, Cambridge (1987)

  • Csörgő, S.: Universal Gaussian approximations under random censorship. Ann. Statist. 24(6), 2744–2778 (1996)

  • de Haan, L., Ferreira, A.: Extreme value theory: an introduction. Springer Science+Business Media (2006)

  • Diebolt, J., Gardes, L., Girard, S., Guillou, A.: Bias-reduced Estimators of the Weibull tail-Coefficient. Test 17, 311–331 (2008)

  • Dierckx, G., Beirlant, J., De Waal, D., Guillou, A.: A new estimation method for Weibull-type tails based on the mean excess function. Journal of Statistical Planning and Inference 139, 1905–1920 (2009)

  • Einmahl, J., Fils-Villetard, A., Guillou, A.: Statistics of Extremes under Random Censoring. Bernoulli 14, 207–227 (2008)

  • Gardes, L., Girard, S.: Estimating extreme quantiles of Weibull-tail distributions. Communications in Statistics - Theory and Methods 34, 1065–1080 (2005)

  • Girard, S.: A Hill type estimator of the Weibull-tail coefficient. Communications in Statistics - Theory and Methods 33(2), 205–234 (2004a)

  • Girard, S.: A Hill type estimator of the Weibull-tail coefficient. HAL archive version: hal-00724602 (2004b)

  • Goegebeur, Y., Guillou, A.: Goodness-of-fit testing for Weibull-type behavior. Journal of Statistical Planning and Inference 140, 1417–1436 (2010)

  • Goegebeur, Y., Beirlant, J., de Wet, T.: Generalized kernel estimators for the Weibull-tail coefficient. Communications in Statistics - Theory and Methods 39, 3695–3716 (2010)

  • Gomes, M.I., Neves, M.M.: Estimation of the extreme value index for randomly censored data. Biometrical Letters 48(1), 1–22 (2011)

  • Klein, J.P., Moeschberger, M.L.: Data sets for Survival Analysis - Techniques for Censored and Truncated Data, 2nd edn. Springer (2005)

  • Ndao, P., Diop, A., Dupuy, J.-F.: Nonparametric estimation of the conditional tail index and extreme quantiles under random censoring. Comput. Stat. Data Anal. 79, 63–79 (2014)

  • Ndao, P., Diop, A., Dupuy, J.-F.: Nonparametric estimation of the conditional extreme-value index with random covariates and censoring. Journal of Statistical Planning and Inference 168, 20–37 (2016)

  • Reiss, R.-D.: Approximate distributions of order statistics. Springer-Verlag (1989)

  • Reynkens, T., Verbelen, R., Beirlant, J., Antonio, K.: Modelling censored losses using splicing: a global fit strategy with mixed Erlang and extreme value distributions. Insurance Math. Econom. 77, 65–77 (2017)

  • Sayah, A., Yahia, D., Brahimi, B.: On robust tail index estimation under random censorship. Afrika Statistika 9, 671–683 (2014)

  • Stupfler, G.: Estimating the conditional extreme-value index in presence of random right-censoring. J. Multivar. Anal. 144, 1–24 (2016)

  • Stupfler, G.: On the study of extremes with dependent random right-censoring. Extremes 22, 97–129 (2019)

  • Worms, J., Worms, R.: New estimators of the extreme value index under random right censoring, for heavy-tailed distributions. Extremes 17(2), 337–358 (2014)

  • Worms, J., Worms, R.: Moment estimators of the extreme value index for randomly censored data in the Weibull domain of attraction. Unpublished manuscript, available on the ArXiv archive, arXiv:1506.03765 (2015)

  • Worms, J., Worms, R.: Extreme value statistics for censored data with heavy tails under competing risks. Metrika 81(7), 849–889 (2018)

  • Zhou, M.: Some properties of the Kaplan-Meier estimator for independent non identically distributed random variables. Ann. Statist. 19(4), 2266–2274 (1991)

Corresponding author

Correspondence to Rym Worms.

Appendix

Let us first summarize the contents of the Appendix. It is composed of three main parts.

Part A contains the proof of Theorem 1: after showing that the statistic Δn (defined in formula (A.3)) is the main contributor to the behavior of \(\hat \theta _{X,k}\), three propositions are stated and proved. Two important lemmas are also stated in the proof of the first and main proposition (which describes the asymptotic distribution of Δn): the first one (Lemma 1) handles all the “remainder” terms, and the second one (Lemma 2) deals with the asymptotic distribution of the proportion \(\hat p_{k}\) of uncensored observations in the tail, depending on the position of 𝜃X with respect to 𝜃C. These two lemmas are proved in parts C.2 and C.3 of the Appendix.

Part B is then devoted to the proof of Theorem 2.

Part C finally contains other lemmas which are used repeatedly in the first two parts. In Appendix C.1, the important Lemmas 3 and 4 describe sharp second order properties of the different slowly varying functions handled in this work, and of the theoretical probability function p(⋅) of being uncensored in the tail. In Appendix C.4, the useful Lemmas 5, 6 and 7 are stated (they are taken from the literature, but are provided for ease of reference).

1.1 Appendix A. Proof of Theorem 1

Recall that

$$ \hat{\theta}_{X,k} = \frac{ \frac{1}{k} {\sum}_{j=1}^{k} \left( \log Z_{n-j+1,n} - \log Z_{n-k,n} \right)} { \frac{1}{k} {\sum}_{j=1}^{k} \left( \log \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - \log \hat{{\Lambda} }_{nF}(Z_{n-k,n}) \right)}. $$
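Before entering the proof, here is a minimal numerical sketch of this estimator (ours, not taken from the paper; it assumes that \(\hat{{\Lambda} }_{nF}\) of Eq. 4 is the Nelson-Aalen cumulative hazard estimator, consistently with the telescoping identity used in the proof of Proposition 1 below; the function name and its arguments are ours):

    import numpy as np

    def weibull_tail_coef(z, delta, k):
        # z     : numpy array of observations Z_i = min(X_i, C_i)
        # delta : numpy array of indicators (1 if Z_i = X_i, i.e. uncensored)
        # k     : number of top order statistics used
        n = len(z)
        order = np.argsort(z)
        z_s, d_s = z[order], delta[order]
        # Nelson-Aalen estimator at each order statistic Z_{i,n}:
        # hat{Lambda}_nF(Z_{i,n}) = sum_{m <= i} delta_{m,n} / (n - m + 1)
        lam = np.cumsum(d_s / (n - np.arange(n)))
        top = np.arange(n - k, n)        # positions of Z_{n-j+1,n}, j = 1, ..., k
        num = np.mean(np.log(z_s[top]) - np.log(z_s[n - k - 1]))
        den = np.mean(np.log(lam[top]) - np.log(lam[n - k - 1]))
        return num / den

The sketch implicitly assumes positive observations and at least one uncensored observation up to the threshold Z_{n-k,n}, so that all logarithms are well defined.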

Introducing E1,…, En, n independent standard exponential random variables, such that \(Z_{i}={\Lambda }^{-}_{H}(E_{i})\), we have, since \({\Lambda }^{-}_{H}(x)= x^{\theta _{Z}} l(x)\) and \({\Lambda }_{F} \circ {\Lambda }^{-}_{H} (x)= x^{a} \tilde {l}(x)\) with l and \(\tilde {l}\) slowly varying at infinity,

$$ \begin{array}{@{}rcl@{}} \log Z_{n-j+1,n} - \log Z_{n-k,n} & =& \theta_{Z} \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right) + \log \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} \right) \end{array} $$
(A.1)
$$ \begin{array}{@{}rcl@{}} \log {\Lambda}_{F}(Z_{n-j+1,n}) - \log {\Lambda}_{F}(Z_{n-k,n}) & =& a \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right) + \log \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} \right), \end{array} $$
(A.2)

Now, let

$$ M_{n}= \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right), $$

and

$$ {\Delta}_n= \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda} _F(Z_{n-j+1,n})} \frac{ {\Lambda} _F(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \right). $$
(A.3)

Since the denominator in the expression for \(\hat \theta _{X,k}\) above equals

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{k} \sum\limits_{j=1}^{k} \left( \log \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - \log \hat{{\Lambda} }_{nF}(Z_{n-k,n}) \right) \\&&= \frac{1}{k} \sum\limits_{j=1}^{k} \log {\Lambda}_{F}(Z_{n-j+1,n}) - \log {\Lambda} _{F}(Z_{n-k,n}) + {\Delta}_{n}, \end{array} $$

we obtain, using (A.1), (A.2) and relation 𝜃X = 𝜃Z/a,

$$ \begin{array}{@{}rcl@{}} \hat{\theta}_{X,k} -\theta_{X} & = & \displaystyle \frac{\theta_{Z} M_{n} + R_{n,l}}{a M_{n} + R_{n,\tilde{l}} +{\Delta}_{n} } -\theta_{X}\\ & =& \displaystyle \theta_{X} \frac{\theta^{-1}_{X} R_{n,l} - R_{n,\tilde{l}} - {\Delta}_{n}}{a M_{n} + R_{n,\tilde{l}} +{\Delta}_{n}}\\ &= & \displaystyle -\frac{\theta_{X}}{a} {\Delta}_{n} \left( M_{n} + a^{-1} R_{n,\tilde{l}} +a^{-1} {\Delta}_{n} \right)^{-1} + \frac{R_{n,l} - \theta_{X} R_{n,\tilde{l}}}{a M_{n} + R_{n,\tilde{l}} +{\Delta}_{n}}, \end{array} $$

where

$$ R_{n,l} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} \right) \quad \text{ and } \quad R_{n,\tilde{l}} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} \right). $$
(A.4)

We thus have the following representation, which shows that the behavior of the estimation error is essentially driven by the behavior of the statistic Δn:

$$ \sqrt{k}L_{nk}^{-b}\left( \hat{\theta}_{X,k} -\theta_{X}\right) = \left( -\frac{\theta_{X}}{a}\right) \sqrt{k} L_{nk}^{1-b} {\Delta}_{n} D_{n}^{-1} + \left( \sqrt{k} L_{nk}^{1-b} R_{n,l} - \theta_{X} \sqrt{k} L_{nk}^{1-b} R_{n,\tilde{l}} \right) (aD_{n})^{-1} $$

where the denominator \(D_{n}=L_{nk} M_{n} + a^{-1}L_{nk} R_{n,\tilde {l}} + a^{-1}L_{nk} {\Delta }_{n}\) will turn out to converge to 1. It is now clear that the proof of Theorem 1 follows from the combination of the following three propositions, the first one being the most important and the longest to establish. These propositions are proved in the next three subsections.

Proposition 1

Under the conditions of Theorem 1 we have, as n tends to infinity,

$$ \begin{array}{@{}rcl@{}} {\Delta}_n \overset{d}{=} \frac{1+o_{\mathbb{P}}(1)}{L_{nk}} \left( \left( L_{nk}^{1-a} \frac{\hat{p}_k}{\tilde{c}} -a \right) - a \left( \bar{E}_n -1 \right) \right) - k^{-1/2} L_{nk}^{b-1}\tilde{\alpha}\left( 1+\frac{1}{\rho}\right) (1+o_{\mathbb{P}}(1)) \end{array} $$
(A.5)

and

$$ \sqrt{k} L_{nk}^{1-b} {\Delta}_{n} \overset{d}{\longrightarrow} N\left( m_{\Delta},\frac{a}{\tilde{c}}\right), $$

where \(\bar E_{n}= \frac {1}{k} {\sum }_{i=1}^{k} E_{i}\) (sample mean of standard exponential variables), and

$$ \hat{p}_{k} := \frac{1}{k} \sum\limits_{j=1}^{k} \delta_{n-j+1,n} \quad \text{and}\quad m_{\Delta} = \left\{ \begin{array}{ll} \displaystyle - \tilde{\alpha} \left( 1+\frac{1}{\rho}\right) -\frac{\theta_{X}}{\theta_{C}} \frac{c_{G}}{{c_{F}^{d}}} \alpha^{\prime} & \text{ if } \theta_{X} < \theta_{C}, \\ 0 & \text{ if } \theta_{X} \geqslant \theta_{C} . \end{array} \right. $$

Please note that the exponential variables Ei appearing in the statement of Proposition 1 above are not the same as those introduced at the beginning of this Section.

Proposition 2

Under the conditions of Theorem 1 we have, as n tends to infinity,

$$ \sqrt{k} L_{nk}^{1-b} R_{n,l} \overset{\mathbb{P}}{\longrightarrow} \left\{ \begin{array}{ll} \alpha & \text{ if } \theta_{X} < \theta_{C} , \\ 0 & \text{ if } \theta_{X} \geqslant \theta_{C}, \end{array} \right. \quad \text{and} \quad \sqrt{k} L_{nk}^{1-b} R_{n,\tilde{l}} \overset{\mathbb{P}}{\longrightarrow} \left\{ \begin{array}{ll} \tilde{\alpha} & \text{ if } \theta_{X} < \theta_{C} , \\ 0 & \text{ if } \theta_{X} \geqslant \theta_{C} . \end{array} \right. $$

Proposition 3

Under condition H1, we have \( L_{nk} M_{n} \overset {\mathbb {P}}{\longrightarrow } 1 \), as n tends to infinity.

Remark 1

First, recall that a = 1 and \(\tilde c=1\) when 𝜃X < 𝜃C. Let us highlight that the convergence in distribution of \(\sqrt {k} L_{nk}^{1-b}{\Delta }_{n}\) stated in Proposition 1 comes from the interplay between the two terms appearing in the representation (A.5) of Δn: the term in \(\hat {p}_{k}\) and the term involving the exponential sample mean. The convergence in distribution of the term involving \(\hat {p}_{k}\) is detailed in Lemma 2 in Appendix A.1; this will be the leading term only when 𝜃X > 𝜃C (in this setting, the constant b is positive and thus the exponential term vanishes). When 𝜃X < 𝜃C, it will only generate a possible bias, and when 𝜃X = 𝜃C it contributes to the asymptotic normality along with the exponential term.

The following corollary is then stated, concerning the statistic RLn defined in Eq. 9 and discussed thereafter. Note that this corollary probably holds under weaker conditions.

Corollary 1

Under the conditions of Theorem 1, as n → ∞, we have \(RL_{n} \overset {\mathbb {P}}{\longrightarrow } a\).

Its proof is short, so we provide it here. With the same notation as above, we readily have

$$ RL_{n} = \left( \frac{1}{k} \sum\limits_{j=1}^{k} \log\log(n/j) -\log\log(n/k) \right)^{-1} \frac 1{L_{nk}} (aL_{nk} M_{n} + L_{nk} R_{n,\tilde{l}} + L_{nk}{\Delta}_{n} ), $$

where the mean inside the large brackets is equivalent to 1/Lnk (see Girard 2004b, formula (15), for a proof). The proof of Corollary 1 thus follows from Propositions 1, 2 and 3.

1.1.1 A.1. Proof of Proposition 1

Starting from the definition of Δn in Eq. A.3, we introduce the first remainder term \( R_{1,k}^{({\Delta })}\) by writing

$$ \begin{array}{@{}rcl@{}} {\Delta}_{n} & = & \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \right)\\ & = & \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} - 1\right) + R_{1,k}^{({\Delta})}. \end{array} $$

Now, using the definition of \(\hat {{\Lambda } }_{nF}\) in Eq. 4, we obtain

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{k} \sum\limits_{j=1}^{k} \left( \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - \hat{{\Lambda} }_{nF}(Z_{n-k,n}) \right) \\&&\quad= \frac{1}{k} \sum\limits_{j=1}^{k} \sum\limits_{i=j}^{k} \frac{\delta_{n-i+1,n}}{i} = \frac{1}{k} \sum\limits_{i=1}^{k} \delta_{n-i+1,n} = \hat{p}_{k}, \end{array} $$

the second equality coming from an interchange of the two sums.

Hence, it can easily be checked that

$$ \begin{array}{@{}rcl@{}} \displaystyle \frac{ \hat{{\Lambda} }_{nF}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-k,n})} \left( {\Delta}_{n} - R_{1,k}^{({\Delta})} \right) = \displaystyle \frac{ \hat{p}_{k}}{{\Lambda}_{F}(Z_{n-k,n})} - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{{\Lambda}_{F}(Z_{n-j+1,n})}{{\Lambda}_{F}(Z_{n-k,n})} -1 \right) + R_{2,k}^{({\Delta})}, \end{array} $$

where

$$ R_{2,k}^{({\Delta})} = \frac{1}{{\Lambda}_{F}(Z_{n-k,n})} \frac{1}{k} \sum\limits_{j=1}^{k} \left( \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - {\Lambda}_{F}(Z_{n-j+1,n}) \right) \left( \frac{{\Lambda}_{F}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-j+1,n})} -1 \right). $$

Since, for all 1 ≤ j ≤ k + 1, \({\Lambda }_{F}(Z_{n-j+1,n}) = ({\Lambda }_{F} \circ {\Lambda }^{-}_{H}) (E_{n-j+1,n}) = E_{n-j+1,n}^{a} \tilde {l}(E_{n-j+1,n})\), where \(\tilde {l}\) is slowly varying and tends to \(\tilde {c}\) at infinity (cf. Lemma 3 in Appendix C.1), we have

$$ \begin{array}{@{}rcl@{}} \frac{{\Lambda}_{F}(Z_{n-j+1,n})}{{\Lambda}_{F}(Z_{n-k,n})} -1 &=& \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1 = \left( \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} -1 \right) \\&&+ \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1 \right), \end{array} $$

and, introducing \((\tilde {E}_{1}, \ldots , \tilde {E}_{k})\), k independent standard exponential random variables such that, according to Lemma 5, \((E_{n-j+1,n}-E_{n-k,n})_{1 \leqslant j \leqslant k} \overset {d}{=} (\tilde {E}_{k,k}, {\ldots } , \tilde {E}_{1,k})\), we can write

$$ \frac{ \hat{{\Lambda} }_{nF}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-k,n})} \left( {\Delta}_{n} - R_{1,k}^{({\Delta})} \right) \overset{d}{=} \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}^{a}} + R_{3,k}^{({\Delta})} - \frac{1}{k} \sum\limits_{j=1}^{k} \left( a \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right) + R_{4,k}^{({\Delta})} + R_{5,k}^{({\Delta})} + R_{2,k}^{({\Delta})}, $$

where

$$ \begin{array}{@{}rcl@{}} R_{3,k}^{({\Delta})} & = & \displaystyle \frac{ \hat{p}_{k}}{E_{n-k,n}^{a}} \left( \frac{1}{\tilde{l}(E_{n-k,n})} - \frac{1}{\tilde{c}} \right)\\ R_{4,k}^{({\Delta})} & = & \displaystyle - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1\right)\\ R_{5,k}^{({\Delta})} & = & \displaystyle - \frac{1}{k} \sum\limits_{j=1}^{k} \left\{ \left( \left( 1+ \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right)^{a}-1 \right) - a \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right\}. \end{array} $$

Let us summarize:

$$ {\Delta}_{n} \overset{d}{=} \frac{{\Lambda}_{F}(Z_{n-k,n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \left( \left( \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}^{a}} - \frac{a}{E_{n-k,n}} \frac{1}{k} \sum\limits_{j=1}^{k} \tilde{E}_{j} \right) + \sum\limits_{i=2}^{5} R_{i,k}^{({\Delta})} \right) + R_{1,k}^{({\Delta})}. $$

But

$$ \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}^{a}} - \frac{a}{E_{n-k,n}} \frac{1}{k} \sum\limits_{j=1}^{k} \tilde{E}_{j} = \frac{1}{E_{n-k,n}} \left( \left( L_{nk}^{1-a} \frac{\hat{p}_{k}}{\tilde{c}} -a \right) - a \left( \bar{E}_{n} -1 \right) \right) + R_{6,k}^{({\Delta})}, $$

where \( \bar {E}_{n} = \frac {1}{k} {\sum }_{j=1}^{k} \tilde {E}_{j}\) and

$$ R_{6,k}^{({\Delta})}= \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}} \left( E_{n-k,n}^{1-a} - L_{nk}^{1-a} \right). $$

Finally,

$$ \begin{array}{@{}rcl@{}} {\Delta}_n \overset{d}{=} \frac{{\Lambda} _F(Z_{n-k,n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \left( \frac{1}{E_{n-k,n}} \left( \left( L_{nk}^{1-a} \frac{\hat{p}_k}{\tilde{c}} -a \right) - a \left( \bar{E}_n -1 \right) \right) + \sum\limits_{i=2}^6 R_{i,k}^{({\Delta})} \right) + R_{1,k}^{({\Delta})}. \end{array} $$
(A.6)

The following lemma, proved in Appendix C.2, shows that \(\sqrt {k} L_{nk}^{1-b} {\sum }_{i=1}^{6} R_{i,k}^{({\Delta })}\) tends to a constant.

Lemma 1

Under the assumptions of Theorem 1, as n tends to infinity,

$$ \sqrt{k} L_{nk}^{1-b} R_{i,k}^{({\Delta})} \overset{\mathbb{P}}{\longrightarrow} 0, \text{ for } i \in \{ 1,2,5,6 \} $$
$$ \sqrt{k} L_{nk}^{1-b} R_{3,k}^{({\Delta})} \overset{\mathbb{P}}{\longrightarrow} -\frac{\tilde{\alpha}}{\rho} \text{ if } \theta_{X} < \theta_{C} \text{ and } 0 \text{ if } \theta_{X} \geqslant \theta_{C}. $$
$$ \sqrt{k} L_{nk}^{1-b} R_{4,k}^{({\Delta})} \overset{\mathbb{P}}{\longrightarrow} -\tilde{\alpha} \text{ if } \theta_{X} < \theta_{C} \text{ and } 0 \text{ if } \theta_{X} \geqslant \theta_{C}. $$

Moreover, we have \(\sqrt {k} \left (\bar {E}_{n} -1 \right ) \overset {d}{\longrightarrow } N(0,1)\), and, according to Lemmas 6 and 7, both \(\frac {L_{nk}}{E_{n-k,n}} \) and \(\frac {{\Lambda }_{F}(Z_{n-k,n})}{\hat {{\Lambda } }_{nF}(Z_{n-k,n})} \) tend to 1 as n → +∞. Hence

$$ \begin{array}{@{}rcl@{}} \sqrt{k} L_{nk}^{1-b} {\Delta}_n \overset{d}{=} (1 + o_{\mathbb{P}}(1)) \left( D_n - a \sqrt{k} L_{nk}^{-b}\left( \bar{E}_n - 1 \right) \right) + (1 + o_{\mathbb{P}}(1)) \sum\limits_{i=1}^6 \sqrt{k} L_{nk}^{1-b} R_{i,k}^{({\Delta})}, \end{array} $$
(A.7)

where

$$ D_{n}= \sqrt{k}L_{nk}^{-b}\left( L_{nk}^{1-a} \frac{\hat{p}_{k}}{\tilde{c}} -a \right), \text{ with } b= (1-a)/2. $$

It remains to study the behavior of Dn, which is done in the following lemma, proved in Appendix C.3.

Lemma 2

Under the assumptions of Theorem 1, we have, as n → +∞:

  1. If 𝜃X < 𝜃C, then \(D_{n}=\sqrt {k} (\hat {p}_{k}-1) \displaystyle \overset {\mathbb {P}}{\longrightarrow } - \frac {\theta _{X}}{\theta _{C}} \frac {c_{G}}{{c_{F}^{d}}} \alpha ^{\prime }\).

  2. If 𝜃X = 𝜃C, then \(D_{n} = \displaystyle \sqrt {k} \left (\frac {\hat {p}_{k}}{p} -1 \right ) \overset {d}{\longrightarrow } N\left (0,\frac {1-p}{p}\right )\), where \(p= \displaystyle \frac {c_{F}}{c_{F}+c_{G}}\).

  3. If 𝜃X > 𝜃C (hence a < 1 and b ∈ ]0, 1/2[), then \(D_{n} \displaystyle \overset {d}{\longrightarrow } N\left (0,\frac {a}{\tilde {c}}\right )\).

Remark 2

Lemma 2 shows, in particular, that the proportion of non-censored data in the tail \(\hat {p}_{k}\) tends to p = 1 if 𝜃X < 𝜃C, to \(p=\frac {c_{F}}{c_{F}+c_{G}}\) if 𝜃X = 𝜃C (in this case, p equals \(\tilde {c}\)) and to p = 0 (with rate \(L_{nk}^{a-1}\)) if 𝜃X > 𝜃C. This has to be linked to the result of Lemma 4 (see Appendix C.1) concerning the limit of the theoretical function \(p(x)=\mathbb {P}(\delta =1|Z=x)\) as x → +∞.
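As an illustration of these three regimes, here is a quick simulation sketch (ours, not the simulation study of the paper), in the exact Weibull special case ΛF(x) = x^{1/𝜃X} and ΛG(x) = x^{1/𝜃C}, for which cF = cG = 1:

    import numpy as np

    rng = np.random.default_rng(0)

    def phat_k(theta_X, theta_C, n=100_000, k=500):
        # bar{F}(x) = exp(-x**(1/theta_X)), hence X = E**theta_X with E
        # standard exponential (and similarly for the censoring variable C).
        x = rng.exponential(size=n) ** theta_X
        c = rng.exponential(size=n) ** theta_C
        z = np.minimum(x, c)
        delta = x <= c
        top = np.argsort(z)[-k:]      # indices of the k largest observations
        return delta[top].mean()      # proportion of uncensored ones in the tail

    print(phat_k(0.5, 1.0))   # theta_X < theta_C : close to 1
    print(phat_k(1.0, 1.0))   # theta_X = theta_C : close to c_F/(c_F+c_G) = 1/2
    print(phat_k(2.0, 1.0))   # theta_X > theta_C : close to 0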

When 𝜃X < 𝜃C, Lemma 2 states that Dn converges to a constant: hence, via Lemma 1, the leading term in Eq. A.7 is \(\sqrt {k} L_{nk}^{-b}\left (\bar {E}_{n} -1 \right ) = \sqrt {k} \left (\bar {E}_{n} -1 \right ) \overset {d}{\longrightarrow } N(0,1)\), and we thus obtain as desired \( \sqrt {k} L_{nk}^{1-b} {\Delta }_{n} \overset {d}{\longrightarrow } N(m_{\Delta },1)\), where mΔ is defined in the statement of Proposition 1.

When 𝜃X = 𝜃C, the constant b is still equal to 0 and both Dn and \(\sqrt {k} \left (\bar {E}_{n} -1 \right )\) (which are independent) contribute to the asymptotic normality of Δn, with \(D_{n} - a \sqrt {k} \left (\bar {E}_{n} -1 \right ) \overset {d}{\longrightarrow } N(0,\sigma ^{2}_{\Delta })\) in relation (A.7), where \(\sigma ^{2}_{\Delta } = \frac {1-p}{p} +a^{2} = \frac {1}{\tilde {c}}\). Thus, we obtain \(\sqrt {k} L_{nk}^{1-b} {\Delta }_{n} \overset {d}{\longrightarrow } N(0, \frac {1}{\tilde {c}})\).

Finally, when 𝜃X > 𝜃C, \(\sqrt {k} L_{nk}^{-b}\left (\bar {E}_{n} -1 \right ) \) tends to 0 and Dn is thus the leading term: we obtain \(\sqrt {k} L_{nk}^{1-b} {\Delta }_{n} \overset {d}{\longrightarrow } N(0, \frac {a}{\tilde {c}})\) as desired.

This ends the proof of Proposition 1.

1.1.2 A.2. Proof of Proposition 2

Recall from Eq. A.4 that

$$ R_{n,l} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} \right) \text{ and } R_{n,\tilde{l}} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} \right). $$

Let A > 1. Under condition Rl(B, ρ), we have for all 𝜖 > 0 and t sufficiently large

$$ (1-\epsilon) B(t) K_{\rho} (x)\leqslant \frac{l(tx)}{l(t)} -1\leqslant (1+\epsilon) B(t) K_{\rho} (x) \hspace{0.3cm} (\forall 1 \leqslant x \leqslant A). $$

We only prove the result for Rn, l, the proof for \( R_{n,\tilde {l}}\) being very similar, using \(R_{\tilde {l}}(\tilde {B}, \tilde {\rho })\) instead of Rl(B, ρ). Note that

$$ R_{n,l} = \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( 1+ \xi_{j,n} \right), $$

where \( \xi _{j,n}= \frac {l(E_{n-j+1,n})}{l(E_{n-k,n})} -1\) tends to 0 uniformly in j, because l is slowly varying and \(\frac {E_{n-j+1,n}}{E_{n-k,n}}\) tends to 1 uniformly in j, according to Lemma 6 stated in Appendix C.4. Hence, using the following inequality,

$$ x- x^{2}/2 \leqslant \log(1+x) \leqslant x \quad (\forall x \geqslant -1/2) $$

and the fact that \(x_{j,n} := \frac {E_{n-j+1,n}}{E_{n-k,n}} \geqslant 1\) tends to 1 uniformly in j, we obtain that for all 𝜖 > 0 and n sufficiently large,

$$ R_{n,l} \leqslant \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} -1 \right) \leqslant (1 + \epsilon) B(E_{n-k,n}) \frac{1}{k} \sum\limits_{j=1}^{k} K_{\rho}(x_{j,n}), $$

omitting the lower bound, which is treated similarly. Since Kρ(1 + x) ∼ x when x tends to 0, we have \(K_{\rho }(x_{j,n}) \sim \frac {E_{n-j+1,n}-E_{n-k,n}}{E_{n-k,n}}\), uniformly in j. By Lemma 5 (also stated in Appendix C.4), \(\frac {E_{n-j+1,n}-E_{n-k,n}}{E_{n-k,n}} \overset {d}{=} \frac {\tilde {E}_{k-j+1,k}}{E_{n-k,n}} \). Hence, it is easy to prove that

$$ E_{n-k,n} \frac{1}{k} \sum\limits_{j=1}^{k} K_{\rho}(x_{j,n}) \overset{\mathbb{P}}{\longrightarrow} 1. $$

Since B is regularly varying and \(\frac {E_{n-k,n}}{L_{nk}} \rightarrow 1\), we have \(\frac {B(E_{n-k,n})}{E_{n-k,n}} \sim \frac {B(L_{nk})}{L_{nk}}\) and consequently

$$ \begin{array}{@{}rcl@{}} \sqrt{k}L_{nk}^{-b}B(L_{nk}) (1+o_{\mathbb{P}}(1)) &\leqslant& \liminf \sqrt{k}L_{nk}^{1-b} R_{n,l} \leqslant \limsup \sqrt{k}L_{nk}^{1-b} R_{n,l} \\ &&\leqslant \sqrt{k}L_{nk}^{-b}B(L_{nk}) (1+o_{\mathbb{P}}(1)). \end{array} $$

We conclude using assumption Rl(B, ρ) and conditions H2(i), H3(i) or H4(ii), because |B| is regularly varying of order ρ, and we have \(\rho =\tilde \rho \) when 𝜃X ≤ 𝜃C, and \(\rho \leqslant \tilde \rho \) when 𝜃X > 𝜃C (see Lemma 3 in Appendix C.1).

1.1.3 A.3. Proof of Proposition 3

Recall that

$$ M_{n}= \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right). $$

Since \(\frac {E_{n-j+1,n}}{\log (n/j)} \overset {\mathbb {P}}{\longrightarrow } 1 \) and \(\frac {L_{nk}}{\log (n/j)} \overset {\mathbb {P}}{\longrightarrow } 1 \), uniformly in j = 1,…, k (see Lemma 6), it follows that \(\frac {E_{n-j+1,n}}{E_{n-k,n}}\overset {\mathbb {P}}{\longrightarrow } 1 \), uniformly in j = 1,…, k. By Lemma 5, \((E_{n-j+1,n}-E_{n-k,n})_{1 \leqslant j \leqslant k} \overset {d}{=} (\tilde {E}_{k,k}, {\ldots } , \tilde {E}_{1,k}) \). Therefore

$$ M_{n} \overset{d}{=} \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( 1+ \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right) = (1+o_{\mathbb{P}}(1)) \frac{1}{E_{n-k,n}} \frac{1}{k} \sum\limits_{j=1}^{k} \tilde{E}_{j}, $$

with \( \frac {1}{k} {\sum }_{j=1}^{k} \tilde {E}_{j} \rightarrow 1\), a.s. Hence, LnkMn also tends to 1, in probability, as desired.

Appendix B: Proof of Theorem 2

Starting from \(x_{p_{n}} =\overline {F}^{-1}(p_{n}) \) and the definition of \(\hat {x}_{p_{n}}\) in Eq. 5, we obtain

$$ \begin{array}{@{}rcl@{}} \log(x_{p_{n}}) & = & \theta_{X} \log\log(1/p_{n}) + \log(\bar{l}_{F}(-\log(p_{n}))),\\ \log(\hat{x}_{p_{n}}) & =& \hat{\theta}_{X,k} \log\log(1/p_{n}) - \hat{\theta}_{X,k} \log(\hat{{\Lambda} }_{nF}(Z_{n-k,n})) + \log(Z_{n-k,n}). \end{array} $$
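Exponentiating the second display shows that the quantile estimator can be written in the plug-in form (this reformulation is ours, but it follows directly from the display above):

$$ \hat{x}_{p_{n}} = Z_{n-k,n} \left( \frac{\log(1/p_{n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \right)^{\hat{\theta}_{X,k}}, $$

which mimics the exact relation \(x_{p_{n}} = {\Lambda}_{F}^{-}(\log(1/p_{n}))\) by extrapolating from the threshold Z_{n-k,n} with the estimated Weibull-tail coefficient.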

Hence

$$ \begin{array}{@{}rcl@{}} \log(\hat{x}_{p_{n}}/x_{p_{n}}) & = & (\hat{\theta}_{X,k}-\theta_{X}) \log\log(1/p_{n}) - \hat{\theta}_{X,k} \log \left( \frac{\hat{{\Lambda} }_{nF}}{{\Lambda}_{F}}(Z_{n-k,n}) \right) \\&&- (\hat{\theta}_{X,k}-\theta_{X}) \log({\Lambda}_{F}(Z_{n-k,n})) + \left\{ - \log(\bar{l}_{F}(\log(1/p_{n}))) \right.\\ &&\left.- \theta_{X} \log(l_{F}(Z_{n-k,n})) \right\}, \\&=:& Q_{1,n} + Q_{2,n} +Q_{3,n} +Q_{4,n} . \end{array} $$

First of all, the result of Theorem 1 implies that

$$ \frac{\sqrt{k}L_{nk}^{-b}}{\log\log(1/p_{n})} Q_{1,n} = \sqrt{k}L_{nk}^{-b} (\hat{\theta}_{X,k}-\theta_{X}) \overset{d}{\longrightarrow} N \left( m,\frac{{\theta_{X}^{2}}}{a \tilde{c}} \right). $$

Then, Lemma 7 (stated in Appendix C.4) implies that \((\hat {{\Lambda } }_{nF}/{\Lambda }_{F})(Z_{n-k,n}) -1 = O_{\mathbb {P}} \left (1/(\sqrt {k}{\Lambda }_{F} (Z_{n-k,n}))\right )\). Hence

$$ \frac{\sqrt{k}L_{nk}^{-b}}{\log\log(1/p_{n})} Q_{2,n} = O_{\mathbb{P}}(1) \frac{1}{L_{nk}^{b} \log\log(1/p_{n}) {\Lambda}_{F}(Z_{n-k,n})} \overset{\mathbb{P}}{\longrightarrow} 0. $$

Now, recall that \( {\Lambda }_{F}(Z_{n-k,n}) = {\Lambda }_{F} \circ {\Lambda }_{H}^{-} (E_{n-k,n}) = E_{n-k,n}^{a} \tilde {l}(E_{n-k,n})\). Hence, the asymptotic normality of \((\hat {\theta }_{X,k}-\theta _{X}) \) yields

$$ \frac{\sqrt{k}L_{nk}^{-b}}{\log\log(1/p_{n})} Q_{3,n} = O_{\mathbb{P}}(1) \frac{\log(L_{nk})}{\log\log(1/p_{n})} \left( a\frac{\log(E_{n-k,n})}{\log(L_{nk})} + \frac{\log(\tilde{l}(E_{n-k,n}))}{\log(L_{nk})} \right). $$

The additional condition \(H^{\prime }_{1}\) of Theorem 2, along with Lemma 6, implies that this term tends to 0 in probability.

Finally, Lemma 3 implies that

$$ Q_{4,n} =- \log\left( 1- \log(1/p_{n})^{\theta_{X} \rho_{F}} \bar{v}(\log(1/p_{n}))\right) - \theta_{X} \log\left( 1- Z_{n-k,n}^{\rho_{F}} v(Z_{n-k,n})\right), $$

where v and \(\bar {v}\) are slowly varying. Hence, \( \frac {\sqrt {k}L_{nk}^{-b}}{\log \log (1/p_{n})} Q_{4,n} \) tends to 0 as soon as there exists some 0 < δ < 1 such that \(\frac {\sqrt {k}L_{nk}^{-b}}{\log \log (1/p_{n})} (\log {1/p_{n}})^{\theta _{X} \rho _{F}+\delta } = O(1)\) and \(\frac {\sqrt {k}L_{nk}^{-b}}{\log \log (1/p_{n})} Z_{n-k,n}^{\rho _{F} + \delta } =O_{\mathbb {P}}(1)\). Recall that \(Z_{n-k,n} = E^{\theta _{Z}}_{n-k,n} l(E_{n-k,n})\). Hence, condition \(H^{\prime }_{1}\) guarantees that we only need to show that \(\sqrt {k}L_{nk}^{-b+ \theta _{X} \rho _{F}} = O(1)\) and \(\sqrt {k}L_{nk}^{-b+ \theta _{Z} \rho _{F}} = O(1)\). When 𝜃X = 𝜃Z < 𝜃C, this is due to the additional condition H2(iv). When 𝜃X = 𝜃Z = 𝜃C, it is due to condition H3(i). Finally, when 𝜃X > 𝜃Z = 𝜃C, it is due to H4(ii).

Appendix C: More technical aspects

3.1 C.1. Details on the second order properties

Recall that the starting assumption of this paper is relation (6),

$$ {\Lambda}_{F}(x) = x^{1/\theta_{X}} l_{F}(x) \ \text{ and } \ {\Lambda}_{G}(x) = x^{1/\theta_{C}} l_{G}(x), $$

where lF and lG are slowly varying. It is then easy to prove that

$$ \begin{array}{@{}rcl@{}} {\Lambda}_{F}^{-}(x) &=& x^{\theta_{X}} \bar{l}_{F}(x), \ {\Lambda}_{G}^{-}(x) = x^{\theta_{C}} \bar{l}_{G}(x), \ {\Lambda}_{H}(x) = x^{1/\theta_{Z}} l_{H}(x) , {\Lambda}_{H}^{-}(x) \\&=& x^{\theta_{Z}} l(x) \text{ and } {\Lambda}_{F} \circ {\Lambda}_{H}^{-}(x) = x^{a} \tilde{l} (x), \end{array} $$

where 𝜃Z = min(𝜃X, 𝜃C), a = 𝜃Z/𝜃X, and \(\bar {l}_{F}\), \(\bar {l}_{G}\), l and \(\tilde {l}\) are slowly varying.

More precisely, we have the following lemma, under the second order condition (7), which is called upon on several occasions in this paper.

Lemma 3

Under Assumptions (A1) and (A2), we have,

$$ \begin{array}{ccrcl} l_{F}(x) = c_{F}(1-x^{\rho_{F}}v(x)) & \text{ and } & l_{G}(x) & = & c_{G}(1-x^{\rho_{G}} v(x)),\\ \bar{l}_{F}(x) = c_{F}^{-\theta_{X}}(1-x^{\theta_{X} \rho_{F}} v(x)) & \text{ and } & \bar{l}_{G}(x) & = & c_{G}^{-\theta_{C}}(1-x^{\theta_{C} \rho_{G}} v(x)),\\ l_{H}(x) =c_{H}(1-x^{\rho_{H}} v(x)), \ l(x) = c_{H}^{-{\theta_{Z}}}(1-x^{\rho} v(x)) & \text{ and } & \tilde{l}(x) & = & \tilde{c}(1-x^{\tilde{\rho}} v(x)), \end{array} $$

for different slowly varying functions generically noted v, with

$$ c_{H}= \left\{ \begin{array}{ll} c_{F} & \text{ if } \theta_{X} < \theta_{C} \\ c_{F}+c_{G} & \text{ if } \theta_{X} = \theta_{C} \\ c_{G} & \text{ if } \theta_{X} > \theta_{C} \ \end{array} \right. , \ \ \ \tilde{c}= c_{H}^{-a} c_{F} = \left\{ \begin{array}{ll} 1 & \text{ if } \theta_{X} < \theta_{C} \\ c_{F}/(c_{F}+c_{G}) & \text{ if } \theta_{X} = \theta_{C} \\ c^{-a}_{G} c_{F} & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. , $$
$$ \rho_{H}=\left\{ \begin{array}{ll} \max(\rho_{F}, 1/\theta_{C}-1/\theta_{X}) & \text{ if } \theta_{X} < \theta_{C} \\ \max(\rho_{F}, \rho_{G}) & \text{ if } \theta_{X} = \theta_{C} \\ \max(\rho_{G}, 1/\theta_{X}-1/\theta_{C}) & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. , \ \ \rho= \theta_{Z} \rho_{H} = \left\{ \begin{array}{ll} \max(\theta_{X} \rho_{F}, d-1) & \text{ if } \theta_{X} < \theta_{C} \\ \max(\theta_{X} \rho_{F}, \theta_{X} \rho_{G}) & \text{ if } \theta_{X} = \theta_{C} \\ \max(\theta_{C} \rho_{G}, a-1) & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. , $$

and

$$ \tilde{\rho}=\left\{ \begin{array}{ll} \rho & \text{ if } \theta_{X} \leqslant \theta_{C} \\ \max(\theta_{C} \rho_{G}, \theta_{C} \rho_{F}, a-1) & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. . $$

The proof of this Lemma is based on Theorem B.2.2 in de Haan and Ferreira (2006) as well as the concept of de Bruyn conjugate (see Proposition 2.5 in Beirlant et al. 2004). Details are omitted for brevity.
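As a quick check of the first case (our computation, which uses the standard fact that, X and C being independent, \(\bar{H} = \bar{F} \bar{G}\) and hence \({\Lambda}_{H} = {\Lambda}_{F} + {\Lambda}_{G}\)), take lF ≡ cF and lG ≡ cG with 𝜃X < 𝜃C; then

$$ {\Lambda}_{H}(x) = c_{F} x^{1/\theta_{X}} + c_{G} x^{1/\theta_{C}} = x^{1/\theta_{X}} c_{F} \left( 1 + \frac{c_{G}}{c_{F}} x^{1/\theta_{C}-1/\theta_{X}} \right), $$

so that cH = cF and the second order term is of order \(x^{1/\theta_{C}-1/\theta_{X}}\), in accordance with the expressions of cH and ρH given in the Lemma.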

Remark 3

It is clear that all the aforementioned slowly varying functions satisfy the second order condition SR2 with the corresponding second order parameters defined in the previous Lemma. In particular, the rate functions B and \(\tilde {B}\) associated, respectively, to l and \(\tilde {l}\) satisfy \(x^{\tilde {\rho }}v(x)/\tilde {B}(x) \rightarrow -1/\tilde {\rho }\) and \(x^{\rho} v(x)/B(x) \rightarrow -1/\rho\), as x → +∞, with v the appropriate slowly varying function (see again Theorem B.2.2 in de Haan and Ferreira 2006).

Let us introduce, as in Einmahl et al. (2008), the function p(⋅) defined by

$$ p(x) = \mathbb{P}(\delta=1 | Z=x). $$

The following lemma provides useful developments of the functions p and \(p \circ {\Lambda }_{H}^{-}\). In particular, it provides details about the rate of convergence of p(x), as x → +∞ (to a limit which was denoted by p in the statement of Lemma 2, as the limit of the sequence \(\hat p_{k}\)). Its proof is based on the fact that

$$p(x)=\frac{\bar{G}(x) f(x)}{\bar{G}(x) f(x) +\bar{F}(x) g(x)},$$

where f and g are respectively the derivatives of F and G, as well as on the results of Lemma 3. It is omitted for brevity.
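For completeness, this identity is the elementary competing risks relation: by independence of X and C, \(\mathbb{P}(\delta=1, Z\in dx) = \bar{G}(x) f(x) dx\) and \(\mathbb{P}(\delta=0, Z\in dx) = \bar{F}(x) g(x) dx\), so that

$$ p(x) = \frac{\mathbb{P}(\delta=1, Z\in dx)}{\mathbb{P}(Z\in dx)} = \frac{\bar{G}(x) f(x)}{\bar{G}(x) f(x) + \bar{F}(x) g(x)}. $$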

Lemma 4

Under assumptions (A1) and (A2), we have

$$ \frac{1}{p(x)} = 1 + \frac{\theta_{X}}{\theta_{C}} x^{\frac{1}{\theta_{C}}-\frac{1}{\theta_{X}}} \frac{l_{G}(x)}{l_{F}(x)} (1+o(1)). $$

In particular, as x → +∞,

$$ p(x) \rightarrow p:=\left\{ \begin{array}{ll} 1 & \text{ if } \theta_{X} < \theta_{C}, \\ \tilde{c}= c_{F}/(c_{F}+c_{G}) & \text{ if } \theta_{X} = \theta_{C}, \\ 0 & \text{ if } \theta_{X} > \theta_{C} . \end{array} \right. $$

Moreover, we have

$$ \begin{array}{@{}rcl@{}} &&\text{ if } \theta_{X} < \theta_{C}, 1/ (p \circ {\Lambda}_{H}^{-})(x) = \ 1 + d \frac{c_{G}}{{c^{d}_{F}}} x^{d-1} (1-x^{-\beta} v(x)),\\ &&\text{ if } \theta_{X} = \theta_{C}, (p \circ {\Lambda}_{H}^{-})(x) = \ \tilde{c} (1-x^{\rho} v(x)),\\ &&\text{ if } \theta_{X} > \theta_{C}, 1/ (p \circ {\Lambda}_{H}^{-})(x) = \ 1 + \frac{1}{a \tilde{c}} x^{1-a} (1-x^{\tilde{\rho}} v(x)), \end{array} $$

where d = 𝜃X/𝜃C, v is a generic notation for a slowly varying function and

$$ -\beta = \max(\theta_{X} \rho_{F}, \theta_{X} \rho_{G}, d-1). $$

3.2 C.2. Proof of Lemma 1

  • Recall that

    $$ \begin{array}{@{}rcl@{}} R_{1,k}^{({\Delta})} & = & \displaystyle {\Delta}_{n} - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} - 1\right)\\ & = & \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \left( \log(1+\xi_{j,n}) - \xi_{j,n} \right), \end{array} $$

    where

    $$ \xi_{j,n} = \displaystyle \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} - 1 $$

    Introducing \( {\Delta }_{j} = \hat {{\Lambda } }_{nF}(Z_{n-j+1,n}) - {\Lambda }_{F}(Z_{n-j+1,n})\), for j = 1,…, k + 1 (which must not be confused with the Δn defined earlier in relation (A.3)), we readily have

    $$ \xi_{j,n} = \displaystyle \frac{{\Lambda}_{F}(Z_{n-k,n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \left( {\Delta}_{j} \frac{{\Lambda}_{F}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-j+1,n})} - {\Delta}_{k+1} \right) \frac{1}{{\Lambda}_{F}(Z_{n-k,n})}. $$

    Lemma 7 (in Appendix C.4) implies that \( |{\Delta }_{j}| = O_{\mathbb {P}} (1/\sqrt {j-1})\) for all j = 2,…, k + 1, \(|{\Delta }_{1}| = O_{\mathbb {P}} (1)\) and \( \frac {{\Lambda }_{F}(Z_{n-k,n})}{\hat {{\Lambda } }_{nF}(Z_{n-k,n})}\) tends to 1, in probability.

    Let now E1,…, En be n independent standard exponential random variables such that \( \frac {1}{{\Lambda }_{F}(Z_{n-k,n})} = \frac {E_{n-k,n}^{-a}}{\tilde {l}(E_{n-k,n})}\), where \(\tilde {l}\) tends to \(\tilde {c}\) at infinity. Moreover, \(\frac {{\Lambda }_{F}(Z_{n-k,n})}{{\Lambda }_{F}(Z_{n-j+1,n})} \leqslant 1\) and \(\frac {E_{n-k,n}}{L_{nk}}\) tends to 1 (see Lemma 6). Thus, we obtain \(|\xi _{1,n} | \leqslant (1 + o_{\mathbb {P}}(1)) \left (O_{\mathbb {P}} (1) + O_{\mathbb {P}} (1/\sqrt {k}) \right ) L_{nk}^{-a} (1/\tilde {c} + o_{\mathbb {P}}(1))\) and

    $$ |\xi_{j,n} | \leqslant (1 + o_{\mathbb{P}}(1)) \left( O_{\mathbb{P}} (1/\sqrt{j-1}) + O_{\mathbb{P}} (1/\sqrt{k}) \right) L_{nk}^{-a} (1/\tilde{c} + o_{\mathbb{P}}(1)), \text{ for } j=2, \ldots, k . $$

    Therefore \(\xi _{1,n}^{2} \leqslant O_{\mathbb {P}}(1) L_{nk}^{-2a}\) and

    $$ \xi_{j,n}^{2} \leqslant O_{\mathbb{P}}(1) \frac{L_{nk}^{-2a}}{j-1} \text{ for } j=2 \ldots, k. $$

    Consequently, since a > 0, \(\sup_{1 \leqslant j \leqslant k} |\xi_{j,n}|\) tends to 0, in probability, and thus, using the inequality \(0 \leqslant x - \log(1+x) \leqslant x^{2}\) (\(\forall x \geqslant -1/2\)), we obtain,

    $$ 0 \leqslant -R_{1,k}^{({\Delta})} \leqslant \frac{1}{k} \sum\limits_{j=1}^{k} \xi_{j,n}^{2}. $$

    But \( \frac {1}{k} {\sum }_{j=1}^{k} 1/j \sim \frac {\log k}{k}\). Hence

    $$ 0 \leqslant -\sqrt{k} L_{nk}^{1-b} R_{1,k}^{({\Delta})} \leqslant O_{\mathbb{P}}(1) \frac{\log k}{\sqrt{k}} L_{nk}^{1-b-2a}. $$

    Let 𝜖 > 0. We have 1 − b − 2a = 3b − 1 (since a = 1 − 2b), and so we want

    $$ \sqrt{k}(\log k)^{-1} L_{nk}^{1-3b} = (k^{\epsilon}/\log k) \left( \sqrt{k} L_{nk}^{(1-3b)/(1-2\epsilon)}\right)^{1-2\epsilon} $$

    to go to +∞. This is automatic when 0 ≤ b ≤ 1/3. If b > 1/3 (i.e. when 𝜃X > 3𝜃C), we can write (1 − 3b)/(1 − 2𝜖) = 1 − 3b − δ for some positive δ and small enough 𝜖, and we have \(\sqrt {k} L_{nk}^{1-3b-\delta } = \sqrt {k}L_{nk}^{-b} \times L_{nk}^{-2b+1-\delta }\): the first factor goes to infinity (it is the CLT rate, assumption H4(i)), and the second factor as well for δ (i.e. 𝜖) small enough, because b is always smaller than 1/2.

  • Recall that

    $$ R_{2,k}^{({\Delta})} = \frac{1}{{\Lambda}_{F}(Z_{n-k,n})} \frac{1}{k} \sum\limits_{j=1}^{k} \left( \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - {\Lambda}_{F}(Z_{n-j+1,n}) \right) \left( \frac{{\Lambda}_{F}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-j+1,n})} -1 \right) $$

    and that \( \frac {{\Lambda }_{F}(Z_{n-k,n})}{{\Lambda }_{F}(Z_{n-j+1,n})} = x_{j,n}^{-a} \frac {\tilde {l}(E_{n-k,n})}{\tilde {l}(E_{n-j+1,n})}\), where \(x_{j,n} = \frac {E_{n-k,n}}{E_{n-j+1,n}} \rightarrow 1\), uniformly in j (see Lemma 6). Hence, using the fact that \(\sup _{1\leqslant j \leqslant k} |\hat {{\Lambda } }_{nF}(Z_{n-j+1,n}) - {\Lambda }_{F}(Z_{n-j+1,n}) | = O_{\mathbb {P}}(1)\) (see Lemma 7), we obtain

    $$ | R_{2,k}^{({\Delta})} | \leqslant O_{\mathbb{P}}(1) \frac{E_{n-k,n}^{-a}}{\tilde{l}(E_{n-k,n})} \left( \frac{1}{k} \sum\limits_{j=1}^{k} |x_{j,n}^{-a} -1| + \frac{1}{k} \sum\limits_{j=1}^{k} x_{j,n}^{-a} \left| \frac{\tilde{l}(E_{n-k,n})}{\tilde{l}(E_{n-j+1,n})} - 1 \right| \right). $$

    Introducing, once again, \(\tilde {E_{1}}, \ldots , \tilde {E_{k}}\), k independent standard exponential random variables, such that, \(\frac {E_{n-j+1,n}-E_{n-k,n}}{E_{n-k,n}} \overset {d}{=} \frac {\tilde {E}_{k-j,k}}{E_{n-k,n}} \) (see Lemma 5), and using a Taylor expansion, we have

    $$ | R_{2,k}^{({\Delta})}| \leqslant O_{\mathbb{P}}(1) E_{n-k,n}^{-a} \left( \frac{1}{k} \sum\limits_{j=1}^{k} \frac{\tilde{E}_{k-j,k}}{E_{n-k,n}} + \frac{1}{k} \sum\limits_{j=1}^{k} \left| \frac{\tilde{l}(E_{n-k,n})}{\tilde{l}(E_{n-j+1,n})}-1 \right| \right). $$

    Since \(\bar {E}_{n} = \frac {1}{k} {\sum }_{j=1}^{k} \tilde {E}_{j}\) and \(\frac {E_{n-k,n}}{L_{nk}}\) tend to 1, in probability, the first term of the right hand side multiplied by \(\sqrt {k} L_{nk}^{1-b} \) tends to 0, by the fact that \(\sqrt {k} L_{nk}^{-a-b} \) tends to 0 under condition H2(iii), H3(ii) or H4(iv). For the second term of the right hand side, we proceed as for \(R_{n,\tilde {l}}\) (see the proof of Proposition 2), by using the fact that condition \(R_{\tilde {l}}(\tilde {B}, \tilde {\rho })\) implies \(R_{1/\tilde {l}}(-\tilde {B}, \tilde {\rho })\) and again that \(\sqrt {k} L_{nk}^{-a-b} \) tends to 0.

  • Recall that

    $$ R_{3,k}^{({\Delta})}= \frac{ \hat{p}_{k}}{E_{n-k,n}^{a}} \left( \frac{1}{\tilde{l}(E_{n-k,n})} - \frac{1}{\tilde{c}} \right), $$

    where, according to Lemma 3, we have \(1-\frac {\tilde {l}(x)}{\tilde {c}} = x^{\tilde {\rho }} v(x) \), with v slowly varying. Hence,

    $$ R_{3,k}^{({\Delta})}= (1+o_{\mathbb{P}}(1)) E_{n-k,n}^{-a} \frac{ \hat{p}_{k}}{\tilde{c}} E_{n-k,n}^{\tilde{\rho}} v(E_{n-k,n}). $$

    We prove, in Lemma 2 (in Appendix A.1), that \(L_{nk}^{1-a} \frac {\hat {p}_{k}}{\tilde {c}}\) tends to a. Moreover, since v is slowly varying and \(\frac {E_{n-k,n}}{L_{nk}}\) tends to 1 (see Lemma 6), we obtain

    $$ \sqrt{k} L_{nk}^{1-b} R_{3,k}^{({\Delta})} = a (1+o_{\mathbb{P}}(1)) \sqrt{k} L_{nk}^{-b+ \tilde{\rho}} v(L_{nk}). $$

    This term tends to 0 in the case \(\theta _{X} \geqslant \theta _{C}\), under condition H3(i) or H4(ii). In the case 𝜃X < 𝜃C, we use the fact that \(\frac {x^{\tilde {\rho }} v(x)}{\tilde {B}(x)} \rightarrow -\frac {1}{\tilde {\rho }}\) (see Remark 3 in Appendix C.1). Therefore,

    $$ \sqrt{k} L_{nk}^{1-b} R_{3,k}^{({\Delta})} = -\frac{1}{\tilde{\rho}} (1+o_{\mathbb{P}}(1)) \sqrt{k} L_{nk}^{-b} \tilde{B}(L_{nk}), $$

    which tends to \(-\frac { \tilde {\alpha }}{\rho }\) under condition H2(ii), since \(\rho =\tilde {\rho }\), in this case.

  • Recall that

    $$ R_{4,k}^{({\Delta})} = - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1 \right). $$

    The treatment of this term is very similar to that of \(R_{n,\tilde {l}}\) (see the proof of Proposition 2). It relies on condition \(R_{\tilde {l}}(\tilde {B}, \tilde {\rho })\), as well as H2(ii), H3(i) or H4(ii). It is thus omitted.

  • Recall that

    $$ R_{5,k}^{({\Delta})} = - \frac{1}{k} \sum\limits_{j=1}^{k} \left\{ \left( \left( 1+ \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right)^{a}-1 \right) - a \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right\}. $$

    This term is 0 in the case 𝜃X ≤ 𝜃C (a = 1). So, we only consider the case 𝜃X > 𝜃C (where 0 < a < 1). It is clear (see Lemmas 5 and 6) that \(\xi _{j,n} = \frac {\tilde {E}_{k-j+1,k}}{E_{n-k,n}} \overset {d}{=} \frac {E_{n-j+1,n}}{E_{n-k,n}} -1\) tends to 0, uniformly in j. Hence, by a Taylor expansion, we obtain

    $$ \begin{array}{@{}rcl@{}} R_{5,k}^{({\Delta})} & = & - (1+ o_{\mathbb{P}}(1)) \frac{1}{k} {\sum}_{j=1}^{k} \frac{a(a-1)}{2} \xi_{j,n}^{2}\\ & \overset{d}{=} & (1+ o_{\mathbb{P}}(1)) \frac{a(1-a)}{2} \frac{1}{E_{n-k,n}^{2}} \frac{1}{k} {\sum}_{j=1}^{k} \tilde{E_{j}}^{2} \sim \frac{a(1-a)}{2} L_{nk}^{-2}, \text{ (in probability)}, \end{array} $$

    and we conclude using H4(iv).

  • Finally, recall that

    $$ R_{6,k}^{({\Delta})}= \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}} \left( E_{n-k,n}^{1-a} - L_{nk}^{1-a} \right). $$

    This term is 0 in the case 𝜃X ≤ 𝜃C (a = 1). So, we only consider the case 𝜃X > 𝜃C, where 0 < a < 1 and \( \hat {p}_{k}\) tends to 0 (see Lemma 2 in Appendix A.1). By the mean value theorem,

    $$ E_{n-k,n}^{1-a} - L_{nk}^{1-a} = (1-a) L_{nk}^{-a} \left( \frac{\widetilde{L}_{nk}}{L_{nk}} \right)^{-a}(E_{n-k,n} - L_{nk}), $$

    where \(\widetilde {L}_{nk}\) is between \(L_{nk}\) and \(E_{n-k,n}\). Hence \(\frac {\widetilde {L}_{nk}}{L_{nk}}\) tends to 1 and, since \(\sqrt {k}(E_{n-k,n} - L_{nk}) \overset {d}{\longrightarrow } N(0,1)\) (see Lemma 6), we have

    $$ \sqrt{k} L_{nk}^{1-b} |R_{6,k}^{({\Delta})}| \leqslant o_{\mathbb{P}}(1)L_{nk}^{-b-a}=o_{\mathbb{P}}(1). $$

3.3 C.3. Proof of Lemma 2

The function p(⋅) below has already been defined in Appendix C.1,

$$ p(x)= \mathbb{P}(\delta=1 | Z=x). $$

Proceeding as in Einmahl et al. (2008), we carry on the proof by considering now that δi is related to Zi by

$$ \delta_{i}=\mathbb{I}_{U_{i}\leqslant p(Z_{i})}, $$

where \((U_{i})_{i \leqslant n}\) denotes an independent sequence of standard uniform variables, independent of the sequence \((Z_{i})_{i \leqslant n}\). We denote by U[1, n],…, U[n, n] the (unordered) values of the uniform sample pertaining to the order statistics Z1,n ≤ … ≤ Zn,n of the observed sample Z1,…, Zn.

Recall that \(Z_{i}= {\Lambda }^{-}_{H}(E_{i})\), where E1,…, En are independent standard exponential random variables. We introduce, for every 1 ≤ i ≤ n, the standard uniform random variables Vi = 1 − exp(−Ei) such that \(Z_{i}= {\Lambda }^{-}_{H}(-\log (1-V_{i}))\), and define the function

$$ r(t):=(p \circ {\Lambda}_{H}^{-})(-\log t). $$

Lemma 4 (in Appendix C.1) provides valuable information about the behavior of r(⋅) at infinity. We now write

$$ \begin{array}{@{}rcl@{}} D_{n}= \sqrt{k} L_{nk}^{-b} \left( L_{nk}^{1-a} \frac{\hat{p}_{k}}{\tilde{c}} -a \right) &= & \displaystyle \frac{L_{nk}^{-b}}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \frac{L_{nk}^{1-a}} {\tilde{c}} \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(1-V_{n-j+1,n})}-a \right)\\ & = & \displaystyle\frac{L_{nk}^{b}}{\tilde{c} \sqrt{k}} \sum\limits_{j=1}^{k} \left( \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(1-V_{n-j+1,n})} - \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(j/n)} \right)\\ & &\displaystyle + \frac{L_{nk}^{-b}}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \frac{L_{nk}^{1-a}} {\tilde{c}} \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(j/n)}-a \right)\\ & =: & T_{1,k} + T_{2,k}. \end{array} $$

Whatever the position of 𝜃X versus 𝜃C, we will prove below that the term T1, k above converges to 0 in probability. It turns out that this amounts to proving that, for some positive sequence vn = o(1/n) (to be chosen later) and some constant c > 0,

$$ \begin{array}{@{}rcl@{}} \sqrt{k} L_{nk}^b S_{n,k} \overset{n\rightarrow\infty}{\longrightarrow} 0 \ \text{ where } \ S_{n,k} := \sup \left\{ |r(s) - r(t)| ; \ \frac 1 n \leqslant t \leqslant \frac k n \ , \ |s - t| \leqslant c\sqrt{k}/n \ , \ s\geqslant v_n \right\}. \end{array} $$
(C.1)

As a matter of fact, if we introduce the events

$$ \textstyle A_{n,c} = \left\{ \sup_{1\leqslant j\leqslant k} |(1-V_{n-j+1,n}) - j/n | \leqslant c\sqrt{k}/n \right\} \ \text{ and } \ B_{n} = \left\{ 1-V_{n,n} \geqslant v_{n} \right\}, $$

then, since \(|\mathbb {I}_{U\leqslant a}-\mathbb {I}_{U\leqslant b}| \overset {d}{=} \mathbb {I}_{U\leqslant |a-b|}\) for any standard uniform U and constants a, b in [0, 1], it follows that

$$ \begin{array}{@{}rcl@{}} \mathbb{P}(|T_{1,k}| > \delta ) & \leqslant & \mathbb{P} \left( \frac{1}{k} \sum\limits_{j=1}^{k} \mathbb{I}_{U_{j} \leqslant | r(1-V_{n-j+1,n}) - r(j/n) | } > \tilde c \delta / (\sqrt{k}L_{nk}^{b}) \right) \\ & \leqslant & \mathbb{P} \left( \sqrt{k} L_{nk}^{b} S_{n,k} > \eta \right) + \mathbb{P} \left( \frac{1}{k} \sum\limits_{j=1}^{k} \mathbb{I}_{U_{j} \leqslant \eta/(\sqrt{k}L_{nk}^{b})} > \tilde c \delta / (\sqrt{k}L_{nk}^{b}) \right) \\ &&+ \mathbb{P}({B_{n}^{c}}) + \mathbb{P}(A_{n,c}^{c}) \end{array} $$

for any given δ > 0 and η > 0. The second term in the right-hand side is (by Markov’s inequality) smaller than \(\eta/(\tilde c \delta)\) (which is arbitrarily small), the third term is equal to nvn(1 + o(1)) = o(1), and the fourth term is arbitrarily small (for c large enough) by the weak convergence of the uniform tail quantile process. Therefore, we are left to prove that \(\sqrt {k}L_{nk}^{b} S_{n,k}=o(1)\) (i.e. relation (C.1)), so that \(T_{1,k}=o_{\mathbb {P}}(1)\) will be proved. This is done in the different cases distinguished below, along with the treatment of the main term T2,k.

The whole proof heavily relies on the first and second order developments stated in Lemma 4 of Appendix C.1, concerning the function \(p\circ {\Lambda }_{H}^{-}\).

1. Case 𝜃X < 𝜃C

In this situation, we have a = 1, b = 0, \(\tilde {c}=1\) and \(p=\lim _{z \rightarrow + \infty } p(z) = \lim _{t\searrow 0} r(t)=1\) via Lemma 4. Hence

$$ \begin{array}{@{}rcl@{}} T_{2,k} & = & \frac{1}{\sqrt{k}} {\sum}_{j=1}^{k} \left( \mathbb{I}_{U_{n-j+1,n} \leqslant r(j/n)}- 1 \right)\\ & \overset{d}{=} & - \frac{1}{\sqrt{k}} {\sum}_{j=1}^{k} \left( \mathbb{I}_{U_{j}>r(j/n)} -(1-r(j/n)) \right) - \frac{1}{\sqrt{k}} {\sum}_{j=1}^{k} (1-r(j/n))\\ & =: & - T^{\prime}_{2,k} - T^{\prime\prime}_{2,k}, \end{array} $$

where \(T^{\prime }_{2,k}\) turns out to be a sum of centered independent random variables. Let us now prove that \(T^{\prime }_{2,k}=o_{\mathbb {P}}(1)\), that \(T^{\prime \prime }_{2,k}\) tends to Aα′ (here \(A=\frac {\theta _{X}}{\theta _{C}} \frac {c_{G}}{{c_{F}^{d}}}\), where α′ is defined in condition H2(iii)), and that \(\sqrt {k}S_{n,k}\to 0\) (hence, as explained above, \(T_{1,k}=o_{\mathbb {P}}(1)\)).

Concerning \(T^{\prime }_{2,k}\), by definition of r(⋅) and thanks to Lemma 4, we have

$$ 1-r(x)= A (-\log x)^{d-1} (1 + o(1)) \ \text{ as } x \searrow 0, \ \text{ where } \ d=\theta_{X}/\theta_{C} \in ]0,1[ . $$

Therefore, since log(n/j)/Lnk tends to 1 uniformly in j under condition H1 (Lemma 6), we obtain

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{1}{k} \sum\limits_{j=1}^{k} r(j/n) (1-r(j/n)) \leqslant \frac{1}{k}\sum\limits_{j=1}^{k} (1-r(j/n)) \leqslant L_{nk}^{d-1} A (1 + o(1)), $$

which implies that \( \mathbb {V}(T^{\prime }_{2,k})\) tends to 0, since d < 1.

Concerning \(T^{\prime \prime }_{2,k}\), we have similarly, using now assumption H2(iii) and Lemma 6 (\(\log(n/j) \sim L_{nk}\)),

$$ T^{\prime\prime}_{2,k}= A (1+o(1)) \sqrt{k} (L_{nk})^{d-1} \overset{n\rightarrow\infty}{\longrightarrow} A \alpha^{\prime}. $$

Let us now deal with \(\sqrt {k}S_{n,k}\). From now on, let cst denote some generic positive constant. Since r(t) converges to 1 as t ↘ 0, and thanks to Lemma 4, we have, for s and t small,

$$ \begin{array}{@{}rcl@{}} | r(s) - r(t) | & = & \left| \frac 1{r(s)} - \frac 1 {r(t)} \right| r(s)r(t) \\ & \leqslant & cst \left\{ |(-\log t)^{d-1} - (-\log s)^{d-1}| + |(-\log t)^{d-1-\beta}v(-\log t)\right. \\&&\left.- (-\log s)^{d-1-\beta}v(-\log s)| \right\} \end{array} $$

Introducing the set \(Z_{n}=\{ (s,t) ; 1/n \leqslant t \leqslant k/n , |t-s|\leqslant c\sqrt {k}/n , s\geqslant v_{n} \}\) and recalling that vn = o(1/n) (an appropriate sequence will be chosen in a few lines), it can be checked that applying the mean value theorem to the function h(t) = (− log t)d− 1, of positive derivative h′(t) = (1 − d)t− 1(− log t)d− 2, yields for large n (below, u = u(s, t) denotes some appropriate value between s and t)

$$ \textstyle \sqrt{k} \sup_{(s,t)\in Z_{n}} |h(t)-h(s)| \leqslant \sqrt{k} \sup_{(s,t)\in Z_{n}} |h^{\prime}(u)|.|t-s| \leqslant cst \sqrt{k} \frac 1{v_{n}} L_{nk}^{d-2} c\sqrt{k}/n = cst \frac{k}{nv_{n}} L_{nk}^{d-2}. $$

This is the first step towards the proof of \(\sqrt {k}S_{n,k}=o(1)\). The second step requires doing the same job with the function \(\tilde h(t)=(-\log t)^{d-1-\beta }v(-\log t)\), where v(⋅) is slowly varying at infinity. It is known (cf. Bingham et al. 1987, page 15) that we have \(xv^{\prime}(x)/v(x) \rightarrow 0\) and \(x^{-\beta}v(x) \rightarrow 0\) as x → +∞, so that

$$ | \tilde h^{\prime}(t) | = |1-d+\beta| \frac 1 t (-\log t)^{d-2} \left| 1- cst \frac {xv^{\prime}(x)}{v(x)} \right| x^{-\beta} |v(x)| \leqslant cst |h^{\prime}(t)| $$

where x denotes (− log t), which is large when t is close to 0. Therefore, taking into account all the previous findings, and considering the choice \(v_{n} = k^{-\epsilon}/n = o(1/n)\), we have proved that for n large

$$ \textstyle \sqrt{k} S_{n,k} \leqslant cst \frac{k}{nv_{n}} L_{nk}^{d-2} = cst \ k^{1+\epsilon} L_{nk}^{d-2} = cst \left( \sqrt{k} L_{nk}^{(d-2)/2 + \delta}\right)^{2(1+\epsilon)} $$

which turns out to be o(1) as soon as 0 < δ < d/2 thanks to assumption H2(iii). This ends the proof of Lemma 2 in the mild censoring case 𝜃X < 𝜃C.

2. Case 𝜃X = 𝜃C

In this case, we also have a = 1, b = 0 but now \(\tilde {c}=\frac {c_{F}}{c_{F}+c_{G}} =p=\lim _{z \rightarrow \infty } p(z) = \lim _{t\searrow 0} r(t)\) via Lemma 4. It is then clear that

$$ \begin{array}{@{}rcl@{}} T_{2,k} & \overset{d}{=} & \displaystyle \frac 1 p \frac{1}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \mathbb{I}_{U_{j} \leqslant r(j/n)}- r(j/n) \right) + \frac 1 p \frac{1}{\sqrt{k}} \sum\limits_{j=1}^{k} (r(j/n)-p)\\ & =: & T^{\prime}_{2,k} + T^{\prime\prime}_{2,k} \end{array} $$

Let us prove that \(T^{\prime }_{2,k} \overset {d}{\longrightarrow } N(0,\frac {1-p}{p})\), while \(T^{\prime \prime }_{2,k}\) and \(\sqrt {k} S_{n,k}\) are both o(1).

Concerning \(T^{\prime }_{2,k}\), we have

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{1}{p^{2}} \frac{1}{k}\sum\limits_{j=1}^{k} r(j/n) (1-r(j/n)), $$

which tends to \(\frac {1-p}{p}\), since r(j/n) tends to p, uniformly in j (see Lemma 4). We conclude, for this term, using Lyapunov’s theorem (details are omitted, here r(j/n) ≤ 1).

Concerning \(T^{\prime \prime }_{2,k}\), since Lemma 4 yields r(t) = p (1 − (− log t)ρv(− log t)), we have (for some δ > 0)

$$ T^{\prime\prime}_{2,k} = -\frac {1}{\sqrt{k}} \sum\limits_{j=1}^{k} (\log(n/j))^{\rho} v(\log(n/j)) = -\sqrt{k} (L_{nk})^{\rho+\delta} L_{nk}^{-\delta}v(L_{nk}) \frac{1}{k} \sum\limits_{j=1}^{k} u_{n,j}^{\rho} $$

where we noted un,j = log(n/j)/Lnk, which tends to 1 uniformly in j thanks to condition H1, and used the fact that v(log(n/j)) ∼ v(Lnk) because \(v \in RV_{0}\). The Riemann sum on the right-hand side converges to 1, so for a choice of δ satisfying assumption H3(i), we have proved that \(T^{\prime \prime }_{2,k}=o(1)\).

Concerning now \(\sqrt {k}S_{n,k}\), we proceed similarly as in the first case. Introducing \(\tilde h(t)=(-\log t)^{\rho }v(-\log t)\), where v(⋅) is slowly varying at infinity, we have as previously \(|\tilde h^{\prime }(t)|=\frac 1 t (-\log t)^{\rho -1+\epsilon }o(1)\) for t ↘ 0 and any small 𝜖 > 0. Therefore, Lemma 4, the definitions of Sn,k and of the set Zn, along with the mean value theorem, yield

$$ \sqrt{k}S_{n,k} = \tilde c \sqrt{k} \sup_{(s,t)\in Z_{n}} |\tilde h(t)-\tilde h(s)| \leqslant cst \sqrt{k} \sup_{(s,t)\in Z_{n}} \{ |\tilde h^{\prime}(u)|.|t-s| \} \leqslant cst \sqrt{k} \frac 1{v_{n}} L_{nk}^{\rho-1+\epsilon} \frac{c\sqrt{k}}{n}. $$

Choosing, in the definition of Sn,k, the sequence \(v_{n} = k^{-\epsilon}/n = o(1/n)\) for some small 𝜖 > 0, we have

$$ \textstyle \sqrt{k}S_{n,k} = cst \left( \sqrt{k} L_{nk}^{(\rho-1+\epsilon)/(2(1+\epsilon))} \right)^{2(1+\epsilon)} = cst \left( \sqrt{k} L_{nk}^{(\rho-1)/2+\delta} \right)^{2(1+\epsilon)} $$

which turns out to be o(1) according to assumption H3(i) (if \(\rho \geqslant 1\)) or H3(ii) (if ρ < 0), as soon as δ is sufficiently small. This ends the proof of Lemma 2 in the semi-strong censoring case 𝜃X = 𝜃C.

3. Case \(\theta_{X} > \theta_{C}\)

Now we are in the situation where a < 1, b = (1 − a)/2 ∈ ]0, 1/2[, and \(\tilde {c}=\frac {c_{F}}{{c_{G}^{a}}}\) is different from \(p=\lim _{z \rightarrow \infty } p(z) = \lim _{t\searrow 0} r(t)=0\). Since 1 − a − b = b, we readily have

$$ \begin{array}{@{}rcl@{}} T_{2,k} & \overset{d}{=} & \displaystyle \frac{L_{nk}^{b}}{\tilde{c}} \frac{1}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \mathbb{I}_{U_{j} \leqslant r(j/n)}- r(j/n) \right) + \frac{a L_{nk}^{-b}}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n)-1 \right)\\ & =: & T^{\prime}_{2,k} + T^{\prime\prime}_{2,k} \end{array} $$

Let us prove that \(T^{\prime }_{2,k} \overset {d}{\longrightarrow } N(0,\frac {a}{\tilde {c}})\), while \(T^{\prime \prime }_{2,k}\) and \(\sqrt {k} L_{nk}^{b} S_{n,k}\) are both o(1) (the latter will guarantee that \(T_{1,k}=o_{\mathbb {P}}(1)\)).

Concerning \(T^{\prime }_{2,k}\), we have

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{L_{nk}^{2b}}{\tilde{c}^{2}} \frac{1}{k} \sum\limits_{j=1}^{k} r(j/n) (1-r(j/n)) $$

Lemma 4 yields the following first order development, as t ↘ 0,

$$ r(t)= a \tilde{c} (-\log t)^{a-1} (1 + o(1)) = a \tilde{c} (-\log t)^{-2b} (1 + o(1)). $$
(A.2)

Since \(u_{n,j} = \log(n/j)/L_{nk}\) tends to 1 uniformly in j under condition H1 (see Lemma 6), it is then easy to see that \(\mathbb {V}(T^{\prime }_{2,k})\) tends to \(\frac {a}{\tilde {c}}\). We conclude for \(T^{\prime }_{2,k}\) using Lyapunov’s theorem (again, details are easy and omitted).
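In detail, (A.2) combined with \(1-r(j/n) \rightarrow 1\) (recall that r(t) → 0 in this case) gives \(r(j/n)(1-r(j/n)) = a \tilde{c}\, L_{nk}^{-2b}\, u_{n,j}^{-2b}\, (1+o(1))\) uniformly in j, whence

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{L_{nk}^{2b}}{\tilde{c}^{2}}\, \frac 1 k \sum\limits_{j=1}^{k} a \tilde{c}\, L_{nk}^{-2b}\, u_{n,j}^{-2b}\, (1+o(1)) = \frac{a}{\tilde{c}}\, \frac 1 k \sum\limits_{j=1}^{k} u_{n,j}^{-2b}\, (1+o(1)) \longrightarrow \frac{a}{\tilde{c}}. $$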

Concerning \(T^{\prime \prime }_{2,k}\), we write

$$ \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n)-1 = \left( \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n) - \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} \right) + \left( \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} -1 \right) $$

and treat these two terms separately. Using the second order formula stated in Lemma 4, we have

$$ \begin{array}{@{}rcl@{}} \frac{1}{r(t)} = 1+ \frac{(-\log t)^{1-a}}{a \tilde{c} } \left( 1- (-\log t)^{\tilde{\rho}} v(-\log t) \right). \end{array} $$
(A.3)

and consequently, for some small δ > 0,

$$ \begin{array}{@{}rcl@{}} \frac{a\tilde c}{L_{nk}^{1-a}r(j/n)} & = & \left( \frac{\log(n/j)}{L_{nk}}\right)^{1-a} \left( 1 - (\log(n/j))^{\tilde \rho} v(\log(n/j)) + a\tilde c (\log(n/j))^{a-1} \right) \\ & = & \left( \frac{\log(n/j)}{L_{nk}}\right)^{1-a} \left( 1 - L_{nk}^{\tilde\rho+\delta} o(1) + a\tilde c L_{nk}^{a-1} (1+o(1))\right) \\ \end{array} $$

where we used condition H1 and the slow variation of v, which guarantee that \(v(\log(n/j)) \sim v(L_{nk})\) and \(x^{-\delta}v(x) \rightarrow 0\) as \(x \rightarrow \infty\). Now, since \( \tilde {\rho }= \max (\theta _{Z} \rho _{F}, \theta _{Z} \rho _{G},a-1) \geqslant a-1\), it follows that

$$ \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n) - \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} = (1+o(1)) L_{nk}^{\tilde\rho + \delta} o(1) $$

and therefore the first term of \(T^{\prime \prime }_{2,k}\) is equal to \(a \sqrt {k} L_{nk}^{-b+\tilde {\rho }+\delta } o(1)\), which tends to 0 under condition H4(ii). The second term of \(T^{\prime \prime }_{2,k}\) is

$$ a \sqrt{k} L_{nk}^{-b} \frac{1}{k} \sum\limits_{j=1}^{k} \left( \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} -1 \right). $$

But \( \left (\frac {L_{nk}}{\log (n/j)} \right )^{1-a} -1= (a-1) \frac {\log (k/j)}{L_{nk}} (1+o(1))\) with \( \frac {1}{k} {\sum }_{j=1}^{k} \log (k/j)\) tending to 1. So the second term of \(T^{\prime \prime }_{2,k}\) is equal to

$$ a(a-1) \sqrt{k} L_{nk}^{-1-b} (1+o(1)), $$

and this quantity tends to 0 under condition H4(iv).
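The fact, used above, that \(\frac 1 k {\sum}_{j=1}^{k} \log(k/j)\) tends to 1 is a direct consequence of Stirling’s formula \(\log(k!) = k \log k - k + O(\log k)\):

$$ \frac 1 k \sum\limits_{j=1}^{k} \log(k/j) = \log k - \frac{\log(k!)}{k} = 1 + O \left( \frac{\log k}{k} \right) \longrightarrow 1. $$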

Concerning now \(\sqrt {k}L_{nk}^{b} S_{n,k}\), we have

$$ S_{n,k} = \sup\limits_{(s,t)\in Z_{n}} |r(t)-r(s)| \leqslant \sup\limits_{(s,t)\in Z_{n}} \left| \frac 1{r(t)} - \frac 1 {r(s)} \right| \sup_{(s,t)\in Z_{n}} \{ r(t)r(s) \}. $$
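The inequality in this display is simply the identity

$$ r(t)-r(s) = r(t)\, r(s) \left( \frac{1}{r(s)} - \frac{1}{r(t)} \right), $$

legitimate since r does not vanish at small arguments, combined with the bound \(\sup (fg) \leqslant \sup f \cdot \sup g\) for nonnegative f and g.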

Thanks to the first order relation (A.2), the second supremum in the bound on \(S_{n,k}\) is smaller than a constant times \(L_{nk}^{2(a-1)}\). The first supremum will be handled with the more precise second order development (A.3), which yields

$$ \sup\limits_{(s,t)\in Z_{n}} \left| \frac 1{r(t)} - \frac 1 {r(s)} \right| \leqslant cst \left\{ \sup\limits_{(s,t)\in Z_{n}} |h(t)-h(s)| + \sup\limits_{(s,t)\in Z_{n}} |\tilde h(t)-\tilde h(s)| \right\} $$

where we define \(h(t) = (-\log t)^{1-a}\) and \(\tilde h(t)=(-\log t)^{1-a+\tilde {\rho }}v(-\log t)\). Contrary to the functions arising in case 1, the functions h and \(\tilde h\) tend to infinity instead of vanishing when t ↘ 0: this will be counterbalanced by the second supremum. Studying the derivatives of h and \(\tilde h\), and again using a first order Taylor expansion, we obtain via computations similar to the previous cases, for n large and any ϵ > 0 (with the choice \(v_{n} = k^{-\epsilon}/n\)),

$$ \sup\limits_{(s,t)\in Z_{n}} \left| \frac 1{r(t)} - \frac 1 {r(s)} \right| \leqslant cst\, k^{1/2+\epsilon} L_{nk}^{-a}. $$
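To make this explicit for the h-part (the \(\tilde h\)-part is handled analogously): since \(h^{\prime}(t) = -\frac{1-a}{t}\, (-\log t)^{-a}\), the mean value theorem, together with the bounds on \(Z_{n}\) already used in case 2 (namely \(u \geqslant v_{n}\) for the intermediate point, \(|t-s| \leqslant \tilde{c}\, \sqrt{k}/n\), and \((-\log u)^{-a} \leqslant cst\, L_{nk}^{-a}\)), gives

$$ \sup\limits_{(s,t)\in Z_{n}} |h(t)-h(s)| \leqslant \frac{1-a}{v_{n}}\, cst\, L_{nk}^{-a}\, \tilde{c}\, \frac{\sqrt{k}}{n} = cst\, k^{1/2+\epsilon}\, L_{nk}^{-a}, $$

using \(v_{n} = k^{-\epsilon}/n\).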

Therefore, gathering the two suprema, we have (for some small value of δ > 0 depending on 𝜖)

$$ \sqrt{k}L_{nk}^{b} S_{n,k} \leqslant cst\, k^{1+\epsilon} L_{nk}^{b-a}L_{nk}^{2(a-1)} = cst\, k^{1+\epsilon} L_{nk}^{-1-b} = cst \left( \sqrt{k} L_{nk}^{-(1+b)/2+\delta} \right)^{2(1+\epsilon)} $$

which, by assumption H4(iii), converges to 0 as n → ∞.

C.4. Additional useful lemmas

Let \(E_{1}, \ldots, E_{n}\) be n i.i.d. standard exponential random variables.

Lemma 5

According to Lemma 1.4.3 in Reiss (1989), we have

$$ (E_{n-j+1,n} -E_{n-k,n})_{1\leqslant j \leqslant k} \overset{d}{=} (\tilde{E}_{k-j+1,k})_{1\leqslant j \leqslant k}, $$

where \(\tilde {E}_{1}, \ldots , \tilde {E}_{k}\) are k independent standard exponential random variables.

Lemma 6

Under condition H1, we have, as n → +∞,

$$ \frac{E_{n-k,n}}{L_{nk}} \overset{\mathbb{P}}{\longrightarrow} 1, \quad \frac{E_{n-j+1,n}}{\log(n/j)} \overset{\mathbb{P}}{\longrightarrow} 1 \text{ uniformly in } j=1, \ldots, k, \quad \text{and} \quad \sqrt{k} (E_{n-k,n} - L_{nk}) \overset{d}{\longrightarrow} N(0,1). $$

We refer to Girard (2004b) for the proof of this Lemma.

Lemma 7

If we consider the classical random censoring model (1) with continuous distribution functions F and G of the variables X and C, then the following in-probability results hold:

$$ \begin{array}{@{}rcl@{}} &&\left| \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - {\Lambda}_{F}(Z_{n-j+1,n}) \right| =O_{\mathbb{P}} (1/\sqrt{j-1}), \text{ for } j=2, \ldots, k+1,\\ &&\left| \hat{{\Lambda} }_{nF}(Z_{n,n}) - {\Lambda}_{F}(Z_{n,n}) \right| =O_{\mathbb{P}} (1). \end{array} $$

The first statement is part of Theorem 1 in Csorgo (1996). For the second statement, one has to examine Theorem 2.1 in Zhou (1991) carefully, in a narrower context, since the samples \((X_{i})\) and \((C_{i})\) we consider are i.i.d., whereas Zhou considers possibly non-identically distributed censoring variables \(C_{i}\). On pages 2269–2270 of that paper, one can check that the maximum observed value (denoted \(T_{n}\) there) does not have to be excluded from the probability bound (2.3): it can indeed be proved, by following the steps of the proof of (2.3), that for every n,

$$ \forall \epsilon>0, \hspace{0.5cm} \textstyle{ \mathbb{P} \left[ \sup_{t\leqslant Z_{n,n}} \left| \hat{{\Lambda} }_{nF}(t) - {\Lambda}_{F}(t) \right| > \epsilon \right] } \leqslant 6 \epsilon^{-2/3}. $$

Since this bound tends to 0 as ϵ → ∞, uniformly in n, the second statement of Lemma 7 follows.
