Robust doubly protected estimators for quantiles with missing data

  • Original Paper, published in TEST

Abstract

Doubly protected methods are widely used for estimating the population mean of an outcome Y from a sample where the response is missing in some individuals. To compensate for the missing responses, a vector \(\mathbf {X}\) of covariates is observed at each individual, and the missing mechanism is assumed to be independent of the response, conditioned on \(\mathbf {X}\) (missing at random). In recent years, many authors have turned from the estimation of the mean to that of the median, and more generally, doubly protected estimators of the quantiles have been proposed. In this work, we present doubly protected estimators for the quantiles in semiparametric models that are also robust, in the sense that they are resistant to the presence of outliers in the sample.

References

  • Agostinelli C, Bianco AM, Boente G (2017) Robust estimation in single index models when the errors have a unimodal density with unknown nuisance parameter. arXiv preprint arXiv:1709.05422

  • Bianco AM, Spano PM (2019) Robust inference for nonlinear regression models. TEST 28(2):369–398

  • Bianco A, Boente G (2004) Robust estimators in semiparametric partly linear regression models. J Stat Plan Inference 122(1–2):229–252

  • Bianco A, Boente G, González-Manteiga W, Pérez-González A (2010) Estimation of the marginal location under a partially linear model with missing responses. Comput Stat Data Anal 54(2):546–564

  • Bianco AM, Boente G, González-Manteiga W, Pérez-González A (2011) Asymptotic behavior of robust estimators in partially linear models with missing responses: the effect of estimating the missing probability on the simplified marginal estimators. TEST 20(3):524–548

  • Bianco AM, Boente G, González-Manteiga W, Pérez-González A (2018) Plug-in marginal estimation under a general regression model with missing responses and covariates. TEST 28(1):1–41

  • Boente G, Fraiman R (1989) Robust nonparametric regression estimation. J Multivar Anal 29(2):180–198

  • Boente G, Rodriguez D (2008) Robust bandwidth selection in semiparametric partly linear regression models: Monte Carlo study and influential analysis. Comput Stat Data Anal 52(5):2808–2828

  • Boente G, Rodriguez D (2012) Robust estimates in generalized partially linear single-index models. TEST 21(2):386–411

  • Cheng PE (1994) Nonparametric estimation of mean functionals with data missing at random. J Am Stat Assoc 89(425):81–87

  • Díaz I (2017) Efficient estimation of quantiles in missing data models. J Stat Plan Inference 141(2):711–724

  • Fasano MV, Maronna RA, Sued M, Yohai VJ (2012) Continuity and differentiability of regression M functionals. Bernoulli 18(4):1284–1309

  • Hirano K, Imbens GW, Ridder G (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71(4):1161–1189

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47(260):663–685

  • Kang JDY, Schafer JL (2007) Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci 22(4):523–539

  • Little RJA, Rubin DB (2014) Statistical analysis with missing data. Wiley, New York

  • Marazzi A, Yohai VJ (2004) Adaptively truncated maximum likelihood regression with asymmetric errors. J Stat Plan Inference 122(1–2):271–291

  • Maronna RA, Martin RD, Yohai VJ, Salibián-Barrera M (2018) Robust statistics: theory and methods (with R). Wiley, New York

  • Porter KE, Gruber S, Van Der Laan MJ, Sekhon JS (2011) The relative performance of targeted maximum likelihood estimators. Int J Biostat 7(1):1–34

  • Robins JM, Rotnitzky A, Zhao LP (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89(427):846–866

  • Robins JM, Rotnitzky A, Zhao LP (1995) Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc 90(429):106–121

  • Rotnitzky A, Robins J, Babino L (2017) On the multiply robust estimation of the mean of the g-functional. arXiv preprint arXiv:1705.08582

  • Rubin DB (1976) Inference and missing data. Biometrika 63(3):581–592

  • Statti F, Sued M, Yohai VJ (2018) High breakdown point robust estimators with missing data. Commun Stat Theory Methods 47(21):5145–5162

  • Sued M, Yohai VJ (2013) Robust location estimation with missing data. Can J Stat 41(1):111–132

  • Van der Vaart AW (2000) Asymptotic statistics, vol 3. Cambridge University Press, Cambridge

  • Wang Q, Linton O, Härdle W (2004) Semiparametric regression analysis with missing response at random. J Am Stat Assoc 99(466):334–345

  • Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15(2):642–656

  • Zhang Z, Chen Z, Troendle JF, Zhang J (2012) Causal inference on quantiles with an obstetric application. Biometrics 68(3):697–706

Acknowledgements

We would like to thank Dr. Alfio Marazzi for the data set in the example and the editor and referees for their comments and suggestions which have helped us to improve this paper.

Funding

This work was supported by Secretaria de Ciencia y Tecnica, Universidad de Buenos Aires (Grant Nos. 20020150200110BA, 20020130100279BA), Agencia Nacional de Promoción Científica y Tecnológica (Grant No. pict 2014-0351).

Author information

Corresponding author

Correspondence to Marina Valdora.

Additional information

This research was partially supported by Grant pict 2014-0351 from anpcyt and Grants 20020150200110BA and 20020130100279BA from the Universidad de Buenos Aires at Buenos Aires, Argentina.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 121 KB)

Appendix

Proof of Theorem 1

If \(\pi (\mathbf {X})=\pi _{\infty }(\mathbf {X})\), then \(\pi (\mathbf {X})/\pi _{\infty }(\mathbf {X})=1\) and \(C_\infty =1\). Therefore, \(F_1=F_0\), \(F_{2a}=F_{3a}\) and \(F_{\infty }=F_0\).

If \(Y=g(\mathbf {X})+u\), with u independent of \((A, \mathbf {X})\) and \(g(\mathbf {X})=g_{\infty }(\mathbf {X})\), then \(F_{3a}\) is the distribution function of \(g(\mathbf {X})\) and G is the distribution function of u. Therefore, \(F_{3a}*G\) is the distribution function of \(g(\mathbf {X})+u=Y\), that is to say \(F_{3a}*G=F_0\). On the other hand, let Z be a random variable, independent of u, with distribution function \(F_{2a}\), then \(F_{2a}*G\) is the distribution function of \(Z+u\), which, by definition, is equal to

$$\begin{aligned} P(Z+u\le y)&=P(Z\le y-u) = \frac{1}{C_\infty } \mathbb {E}\left\{ \frac{\pi (\mathbf {X})}{\pi _{\infty }(\mathbf {X})}\mathrm I_{ \{g(\mathbf {X})\le y-u\}} \right\} \\&= \frac{1}{C_\infty } \mathbb {E}\left\{ \frac{\pi (\mathbf {X})}{\pi _{\infty }(\mathbf {X})}\mathrm I_{ \{g(\mathbf {X})+u\le y \}} \right\} =F_1(y). \ \; \end{aligned}$$

Thus, \(F_{\infty }=F_0\) also in this case. \(\square \)
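The convolution used in this proof can be made concrete for discrete distributions. The following is a minimal sketch, not the authors' implementation: the function name `convolution_cdf` and the toy inputs are illustrative assumptions. It computes the distribution function of \(Z+u\) for independent discrete \(Z\) and \(u\), which is the operation denoted by \(*\) above.

```python
from bisect import bisect_right
from itertools import accumulate


def convolution_cdf(atoms_f, weights_f, atoms_g, weights_g):
    """Return the cdf of Z + u for independent discrete Z ~ F and u ~ G.

    F puts mass weights_f[i] on atoms_f[i]; G puts mass weights_g[j] on
    atoms_g[j]. Both weight vectors are assumed to sum to one.
    """
    # Every pair (i, j) contributes mass w_i * v_j at the point a_i + b_j.
    pairs = sorted(
        (a + b, w * v)
        for a, w in zip(atoms_f, weights_f)
        for b, v in zip(atoms_g, weights_g)
    )
    support = [s for s, _ in pairs]
    cumulative = list(accumulate(m for _, m in pairs))

    def cdf(y):
        # Count the support points <= y; the cdf is the mass they carry.
        k = bisect_right(support, y)
        return cumulative[k - 1] if k > 0 else 0.0

    return cdf


# Toy example (hypothetical data): F and G both uniform on {0, 1}.
conv = convolution_cdf([0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5])
print([conv(y) for y in (-0.5, 0.0, 1.0, 2.0)])  # [0.0, 0.25, 0.75, 1.0]
```

When F is built from fitted values and G from residuals, the same routine gives the distribution function of "fitted value plus independent residual", which is how the convolution enters the argument.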

The following lemmas will be used to prove Theorem 2. Recall that \(\widetilde{F}_{1}\) and \(\widetilde{F}_{2a}\), defined in (6), are random sequences of cumulative distribution functions based on a sample of size n (which we omit in the notation).

Lemma 1

Consider \(\widetilde{F}_1\) and \(F_1\), defined in (6) and (7), respectively. Under assumptions A1 and A2, \(\widetilde{F}_1\) converges to \(F_1\) uniformly, a.s., that is, \( \mathbb {P}\left( \sup _y\vert \widetilde{F}_1(y)- F_1(y)\vert \rightarrow 0 \right) =1 \).

Proof

We show first that \(C_n/n\rightarrow C_\infty \) a.s. To do so, note that we can write

$$\begin{aligned} \frac{C_n}{n} =\frac{1}{n}\sum _{i=1}^n \left\{ \frac{A_i }{\widehat{\pi }_n(\mathbf {X}_i)}-\frac{A_i }{ \pi _\infty (\mathbf {X}_i)}\right\} +\frac{1}{n}\sum _{i=1}^n\frac{A_i }{\pi _\infty (\mathbf {X}_i)}. \end{aligned}$$
(12)

By the law of large numbers, the second term in (12) converges a.s. to

$$\begin{aligned} \mathbb E \left\{ \frac{A}{\pi _\infty (\mathbf {X})} \right\} =\mathbb E\left\{ \frac{1}{\pi _\infty (\mathbf {X})} \mathbb E \left( \left. A \right| \mathbf {X} \right) \right\} =\mathbb E \left\{ \frac{\pi (\mathbf {X})}{\pi _\infty (\mathbf {X})} \right\} =C_\infty . \end{aligned}$$

It remains to prove that the first term in (12) converges to zero a.s. Now, under conditions A1 and A2, given \(\varepsilon \in (0,1)\) there exists \(n_0\) such that \(\left| \pi _\infty (\mathbf {X})-\widehat{\pi }_n(\mathbf {X})\right| <\varepsilon i_\infty \) for all \(n\ge n_0\), and therefore, \((1-\varepsilon )i_\infty \le \widehat{\pi }_n(\mathbf {X})\) for such n, implying that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n A_i \frac{\left| \pi _\infty (\mathbf {X}_i)-\widehat{\pi }_n(\mathbf {X}_i) \right| }{\widehat{\pi }_n(\mathbf {X}_i) \pi _\infty (\mathbf {X}_i)}<\frac{1}{n}\frac{1}{(1-\varepsilon )i_\infty ^2}\sum _{i=1}^n A_i \left| \pi _\infty (\mathbf {X}_i)-\widehat{\pi }_n(\mathbf {X}_i) \right| <\frac{\varepsilon }{(1-\varepsilon )i_\infty } . \end{aligned}$$
(13)

Since \(\varepsilon \) is arbitrary, the announced result follows.
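For convenience, the step linking the first term of the decomposition of \(C_n/n\) to the bound just displayed can be written out. This only rearranges quantities already introduced; it assumes, as the argument above suggests, that \(i_\infty \) denotes the lower bound on \(\pi _\infty \) provided by A2.

$$\begin{aligned} \left| \frac{1}{n}\sum _{i=1}^n \left\{ \frac{A_i }{\widehat{\pi }_n(\mathbf {X}_i)}-\frac{A_i }{ \pi _\infty (\mathbf {X}_i)}\right\} \right| =\left| \frac{1}{n}\sum _{i=1}^n A_i \frac{\pi _\infty (\mathbf {X}_i)-\widehat{\pi }_n(\mathbf {X}_i)}{\widehat{\pi }_n(\mathbf {X}_i) \pi _\infty (\mathbf {X}_i)}\right| \le \frac{1}{n}\sum _{i=1}^n A_i \frac{\left| \pi _\infty (\mathbf {X}_i)-\widehat{\pi }_n(\mathbf {X}_i) \right| }{\widehat{\pi }_n(\mathbf {X}_i) \pi _\infty (\mathbf {X}_i)}. \end{aligned}$$

The denominators are then bounded below using \(\widehat{\pi }_n(\mathbf {X}_i)\ge (1-\varepsilon )i_\infty \) and \(\pi _\infty (\mathbf {X}_i)\ge i_\infty \), and the numerators are bounded above by \(\varepsilon i_\infty \).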

Second, we prove that

$$\begin{aligned} \mathbb {P}\left\{ \lim _{n\rightarrow \infty }\sup _y \left| \widetilde{F}_1(y)- \frac{1}{C_\infty }\frac{1}{n}\sum _{i=1}^n \frac{A_i\mathrm I_{\{Y_i\le y\}} }{ \pi _{\infty }(\mathbf {X}_i)} \right| =0 \right\} =1. \end{aligned}$$
(14)

To prove (14), notice that adding and subtracting \((nC_\infty )^{-1}\sum _{i=1}^n {{A_i\mathrm I_{\{Y_i\le y\}} }/{ \widehat{\pi }_n(\mathbf {X}_i)}}\), we get

$$\begin{aligned} \left| \widetilde{F}_1(y)- \frac{1}{C_\infty }\frac{1}{n}\sum _{i=1}^n {\frac{A_i\mathrm I_{\{Y_i\le y\}} }{ \pi _{\infty }(\mathbf {X}_i)}} \right|&\le \vert \{C_{n}/n\}^{-1}-{C_\infty }^{-1}\vert \;C_{n}/n \nonumber \\&\quad + \frac{1}{C_\infty n}\sum _{i=1}^n A_i\vert \widehat{\pi }_n(\mathbf {X}_i)^{-1}- {\pi _{\infty }(\mathbf {X}_i)}^{-1}\vert . \end{aligned}$$
(15)

Neither of the two terms on the right-hand side of (15) depends on y, and both converge to zero a.s. under A1–A2; the convergence of the first term follows from the convergence of \(C_n/n \) to \(C_\infty \) a.s., while the convergence of the second one was established in (13). This proves (14).

Finally, using arguments similar to those in the proof of the Glivenko–Cantelli theorem (see, for instance, Theorem 19.1 in Van der Vaart 2000), it can be shown that

$$\begin{aligned} \mathbb {P}\left\{ \lim _{n\rightarrow \infty }\sup _y \left| \frac{1}{n}\sum _{i=1}^n \frac{A_i\mathrm I_{\{Y_i\le y\}} }{\pi _{\infty }(\mathbf {X}_i)} - \mathbb E\left\{ \frac{A\mathrm I_{\{Y\le y\}}}{\pi _{\infty }(\mathbf {X})} \right\} \right| =0 \right\} =1. \end{aligned}$$
(16)

The result follows by combining (14) and (16). \(\square \)
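The uniform convergence in Lemma 1 can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' code: the data-generating model (uniform covariate, Gaussian error, logistic missingness), the use of the true propensities instead of estimates \(\widehat{\pi }_n\), and all function names are hypothetical choices made for illustration. It compares the normalised inverse-probability-weighted empirical distribution with the empirical distribution of the full sample, which is available only in a simulation.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def ipw_ecdf(y_values, observed, propensities):
    """Normalised inverse-probability-weighted empirical cdf.

    Only units with observed[i] == 1 contribute, with weight 1 / propensity.
    """
    weights = [a / p for a, p in zip(observed, propensities)]
    total = sum(weights)  # plays the role of C_n

    def cdf(y):
        return sum(w for w, v in zip(weights, y_values) if v <= y) / total

    return cdf


def simulate(n, seed=0):
    """Hypothetical model: X ~ U(-1, 1), Y = X + N(0, 1), P(A=1|X) = expit(X)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    pis = [expit(x) for x in xs]
    obs = [1 if rng.random() < p else 0 for p in pis]
    return ys, obs, pis


def sup_distance(cdf_a, cdf_b, grid):
    """Largest absolute difference between two cdfs over a finite grid."""
    return max(abs(cdf_a(y) - cdf_b(y)) for y in grid)


ys, obs, pis = simulate(n=20000)
ones = [1.0] * len(ys)
grid = [-4.0 + 0.1 * k for k in range(81)]

full = ipw_ecdf(ys, [1] * len(ys), ones)  # oracle: uses every Y
naive = ipw_ecdf(ys, obs, ones)           # complete cases, unweighted
weighted = ipw_ecdf(ys, obs, pis)         # complete cases, weights 1 / pi(X_i)

ipw_gap = sup_distance(weighted, full, grid)
naive_gap = sup_distance(naive, full, grid)
print(ipw_gap, naive_gap)
```

In this setup the weighted estimate should track the full-sample distribution closely, while the unweighted complete-case estimate is shifted because units with larger covariate values are more likely to be observed.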

Henceforth, we use \(G_n \xrightarrow {w} G\) to denote weak convergence of cumulative distribution functions.

Lemma 2

Consider \(\widetilde{F}_{2a}\) and \(F_{2a}\), defined in (6) and (7), respectively. Under assumptions A1–A3, it holds that \(\widetilde{F}_{2a}\) converges weakly to \(F_{2a}\) a.s., i.e.,

$$\begin{aligned} \mathbb {P}\left( \widetilde{F}_{2a} \xrightarrow {w} F_{2a}\right) =1. \end{aligned}$$
(17)

Proof

Let \(\mathcal {C}_{\text {buc}}\) denote the set of functions \(f:\mathbb {R}\rightarrow \mathbb {R}\) bounded and uniformly continuous. In order to prove the lemma, we will show that

$$\begin{aligned} \mathbb {P}\left( \lim _{n\rightarrow \infty } \int f d \widetilde{F}_{2a} = \int f d F_{2a} , \, \forall f \in \mathcal {C}_{\text {buc}} \right) =1. \end{aligned}$$
(18)

Let

$$\begin{aligned} \widetilde{F}_3(y)= \frac{1}{C_n}\sum _{i=1}^n \frac{A_i\delta _{ \widehat{g}_n(\mathbf {X}_i)}(y)}{ \pi _{\infty }(\mathbf {X}_i)}\,\, \text{ and }\,\, \widetilde{F}_4(y)= \frac{1}{C_n}\sum _{i=1}^n \frac{A_i\delta _{g_{\infty }(\mathbf {X}_i)}(y)}{\pi _{\infty }(\mathbf {X}_i)}. \end{aligned}$$

Note that both \(\widetilde{F}_3\) and \(\widetilde{F}_4\) defined above are sequences of random functions; however, we omit n in the notation for simplicity.

Fix \( f \in \mathcal {C}_{\text {buc}}\). Defining \(I_1(f) = \left| \int f d \widetilde{F}_{2a} - \int f d \widetilde{F}_{3} \right| \), \(I_2(f) = \left| \int f d \widetilde{F}_{3} - \int f d \widetilde{F}_{4} \right| \) and \(I_3(f) = \left| \int f d \widetilde{F}_{4} - \int f d F_{2a} \right| \), we get that

$$\begin{aligned} \left| \int f d \widetilde{F}_{2a} - \int f d F_{2a} \right| \le I_1(f) + I_2(f)+I_3(f). \end{aligned}$$
(19)

Let us now consider each of these three terms. Since f is bounded, using arguments similar to those in the proof of Lemma 1, we have that under A1 and A2

$$\begin{aligned} \mathbb {P}\left( \lim _{n\rightarrow \infty } \left| \int f d \widetilde{F}_{2a} - \int f d \widetilde{F}_{3} \right| = 0, \, \forall f \in \mathcal C_{\text {buc}} \right) =1. \end{aligned}$$
(20)

To deal with \(I_2(f)\), notice that

$$\begin{aligned} I_2(f)&=\left| \frac{1}{C_n}\sum _{i=1}^n \frac{A_i f\{{\widehat{g}_n(\mathbf {X}_i)\}}}{\pi _{\infty }(\mathbf {X}_i)} - \frac{1}{C_n}\sum _{i=1}^n \frac{A_i f\{{g_{\infty }(\mathbf {X}_i)\}}}{\pi _{\infty }(\mathbf {X}_i)} \right| \nonumber \\&\quad \le \frac{n}{C_n} \frac{1}{ni_\infty } \sum _{i=1}^n \left| f\{{\widehat{g}_n(\mathbf {X}_i)\}} -f\{{g_{\infty }(\mathbf {X}_i)\}}\right| .\nonumber \end{aligned}$$
(21)

Since f is uniformly continuous, given \(\varepsilon >0\), there exists \(\delta \) such that \(\vert u_1-u_2\vert <\delta \) implies \(\vert f(u_1)-f(u_2)\vert <\varepsilon \). Take K large and consider the compact set \(\mathcal {K}=\{\vert \vert \mathbf {X}\vert \vert \le K\}\). For n large enough, invoking now A3, we get that \(\sup _{\mathbf {X}\in \mathcal {K}} \vert \widehat{g}_n(\mathbf {X})-g_{\infty }(\mathbf {X})\vert <\delta \) and therefore, the right-hand side of (21) is smaller than

$$\begin{aligned} \frac{n}{C_n}\left( \frac{\varepsilon }{i_\infty }+ \frac{1}{ni_\infty } \sum _{i=1}^n 2\vert \vert f\vert \vert _\infty I_{\{\vert \vert \mathbf {X}_i\vert \vert >K\}}\right) , \end{aligned}$$
(22)

which implies that

$$\begin{aligned} \mathbb {P}\left( \lim _{n\rightarrow \infty } \left| \int f d \widetilde{F}_{3} - \int f d \widetilde{F}_{4} \right| = 0, \, \forall f \in \mathcal {C}_{\text {buc}} \right) =1. \end{aligned}$$
(23)

It remains to show that

$$\begin{aligned} \mathbb {P}\left( \lim _{n\rightarrow \infty } \int f d \widetilde{F}_{4} = \int f d F_{2a}, \, \forall f \in \mathcal {C}_{\text {buc}} \right) =1 \end{aligned}$$
(24)

Notice that, as in Lemma 1, using arguments similar to those in the proof of the Glivenko–Cantelli theorem, we have that

$$\begin{aligned}&\mathbb {P}\left( \lim _{n\rightarrow \infty }\sup _y \left| \frac{1}{n} \sum _{i=1}^n \frac{A_i\delta _{g_{\infty }(\mathbf {X}_i)}(y)}{\pi _{\infty }(\mathbf {X}_i)} -\mathbb E\left\{ \frac{A\mathrm I_{\{ g_{\infty }(\mathbf {X}) \le y\}} }{ \pi _{\infty }(\mathbf {X})} \right\} \right| =0 \right) =1 \end{aligned}$$
(25)

and therefore

$$\begin{aligned}&\mathbb {P}\left( \lim _{n\rightarrow \infty } \frac{1}{\widetilde{C}_n} \sum _{i=1}^n \frac{A_i\delta _{g_{\infty }(\mathbf {X}_i)}(y)}{\pi _{\infty }(\mathbf {X}_i)} =\frac{1}{C_\infty }\mathbb E\left\{ \frac{A\mathrm I_{\{ g_{\infty }(\mathbf {X}) \le y\}} }{ \pi _{\infty }(\mathbf {X})} \right\} \;,\forall y\in \mathbb {R} \right) =1,\qquad \end{aligned}$$
(26)

where \(\widetilde{C}_n=\sum _{i=1}^n A_i/\pi _{\infty }(\mathbf {X}_i)\). Both the sequence of functions and the limit function in (26) are cumulative distribution functions. By the MAR assumption,

$$\begin{aligned} \frac{1}{C_\infty }\mathbb E\left\{ \frac{A\mathrm I_{\{ g_{\infty }(\mathbf {X}) \le y\}} }{ \pi _{\infty }(\mathbf {X})} \right\} =F_{2a}(y) \end{aligned}$$
(27)

and, therefore, (26) implies that

$$\begin{aligned}&\mathbb {P}\left( \lim _{n\rightarrow \infty } \frac{1}{\widetilde{C}_n} \sum _{i=1}^n \frac{A_i f(g_{\infty }(\mathbf {X}_i))}{\pi _{\infty }(\mathbf {X}_i)} =\int f d F_{2a}, \, \forall f \in \mathcal {C}_{\text {buc}} \right) =1. \end{aligned}$$
(28)

Finally, since \(\widetilde{C}_n/C_n\rightarrow 1\) a.s., we conclude that (24) holds. The result stated in the lemma follows from combining (19), (20), (23) and (24). \(\square \)

The following lemma was proved in Sued and Yohai (2013), as a part of Theorem 1.

Lemma 3

Consider \(\widetilde{F}_{3a}\) and \(\widetilde{G}\), defined in (6), and \(F_{3a}\) and G, defined in (8). Under assumption A3, \(\widetilde{F}_{3a}\) converges weakly to \(F_{3a}\) a.s. and also \(\widetilde{G}\) converges weakly to G a.s., i.e.,

$$\begin{aligned} \mathbb {P}\left( \widetilde{F}_{3a} \xrightarrow {w} F_{3a}\right) =1 \quad \hbox {and}\quad \mathbb {P}\left( \widetilde{G} \xrightarrow {w} G\right) =1. \end{aligned}$$

As announced in Sect. 3, we will now show that the functional \(T_p\), presented in (2), can be defined over an enlarged family of functions, which includes cumulative distribution functions, preserving its continuity.

Lemma 4

Consider a distribution function \(F:\mathbb {R}\rightarrow [0,1]\) and \(p\in (0,1)\) such that there exists a unique value \(y_p\) with \(F(y_p)=p\), and so \(T_p(F)=y_p\), for \(T_p\) defined in (2). Let \(F_{n}:\mathbb {R} \rightarrow \mathbb {R}, n\ge 1\), be a sequence of functions such that

  1. \(\lim _{y \rightarrow - \infty } F_{n}(y)=0\) and \(\lim _{y \rightarrow + \infty } F_{n}(y)=1\).

  2. \(F_{n}\) converges uniformly to F.

Then \(T_p\) can be defined at \(F_n\) and \(\lim _{n \rightarrow \infty } T_p(F_n)=T_p(F).\)

Proof

Let \( A_{n,p}=\left\{ y \in \mathbb {R}: F_n(y) \ge p \right\} .\) By the assumptions of the lemma, \(\lim _{y \rightarrow + \infty } F_{n}(y)=1\), and therefore, \(A_{n,p}\) is not empty. Since \(\lim _{y \rightarrow - \infty } F_{n}(y)=0\) we conclude that \(A_{n,p}\) is bounded from below, and therefore \(T_p(F_n)=\inf A_{n,p}\) is well defined.

Given \(\varepsilon >0\), let \(\delta =\min \left\{ \left( F(y_{p}+\varepsilon )-F(y_{p})\right) /2,\left( F(y_{p})-F(y_{p}-\varepsilon )\right) /2\right\} .\) By the assumptions of the lemma, \(\delta >0\). Now, the uniform convergence of \(F_n\) to F guarantees that there exists \(n_0\) such that \(\sup _{y \in \mathbb {R}}\vert F_n(y)-F(y)\vert \le \delta \) for all \(n\ge n_0\). In particular, the following inequalities hold

$$\begin{aligned} \sup _{y<y_{p}-\varepsilon }F_{n}(y)\le F(y_p-\varepsilon )+\delta \le F(y_p)-2\delta +\delta = p-\delta <p \end{aligned}$$
(29)
$$\begin{aligned} F_{n}(y_{p}+\varepsilon ) \ge F(y_{p}+\varepsilon )-\delta \ge F(y_{p})+2\delta -\delta =p+\delta >p. \end{aligned}$$
(30)

From (29) and (30) we conclude that, for all \(n\ge n_{0}\), \(y_{n}=T_p(F_n)\) satisfies \(|y_{n}-y_p|\le \varepsilon \), and therefore, \(y_{n}\rightarrow y_{p}\). This concludes the proof. \(\square \)
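The functional \(T_p(F)=\inf \{y: F(y)\ge p\}\) used in this proof is straightforward to evaluate when F is a weighted step function. The following is a minimal sketch; the function name `quantile_functional` and the example data are hypothetical, and the weights are assumed nonnegative with a positive sum.

```python
def quantile_functional(atoms, weights, p):
    """T_p(F) = inf{y : F(y) >= p} for F placing mass weights[i] on atoms[i].

    Weights must be nonnegative; they are normalised internally.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = sum(weights)
    running = 0.0
    for y, w in sorted(zip(atoms, weights)):
        running += w
        # Compare against p * total to avoid dividing every partial sum.
        if running >= p * total:
            return y
    return max(atoms)  # guards against rounding in the final partial sum


# Equal weights: the median of {1, 2, 3, 4} under the inf-definition is 2.
print(quantile_functional([3, 1, 4, 2], [1, 1, 1, 1], 0.5))  # 2
# Unequal weights shift the quantile towards the heavier atom.
print(quantile_functional([1, 2, 3, 4], [1, 1, 1, 3], 0.5))  # 3
```

Because the infimum is taken over the set where the function reaches level p, the routine does not require monotonicity of the partial sums beyond nonnegative weights, in line with the lemma's point that \(T_p\) extends beyond proper distribution functions.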

Proof of Theorem 2

The continuity of G implies that \(F_{2a}*G\) and \(F_{3a}*G\) are both continuous cumulative distribution functions. Since weak convergence to a continuous limit distribution function implies uniform convergence (see, for example, Lemma 2.11 in Van der Vaart (2000)), Lemmas 2 and 3 imply that \(\widetilde{F}_{2a}*\widetilde{G}\) and \(\widetilde{F}_{3a}*\widetilde{G}\) converge uniformly to \(F_{2a}*G\) and \(F_{3a}*G\), respectively, a.s.

Combining these results with Theorem 1, we obtain (9). From Lemma 4, we conclude that \(T_p(\widehat{F}_{\tiny {\hbox {RSDP}}})\) is well defined. Moreover, Lemma 4 and the uniform convergence proved above imply that \(T_p(\widehat{F}_{\tiny {\hbox {RSDP}}})\) converges to \(T_p(F_0)\) a.s. \(\square \)

Proof of Theorem 3

We will show that A1–A3 are satisfied, with \(\widehat{\pi }_n(\mathbf {X})=\hbox {expit}(\widehat{\varvec{\gamma }}_n^{\tiny {t}}\mathbf {X})\), \(\pi _{\infty }(\mathbf {X})=\hbox {expit}(\varvec{\gamma }_\infty ^{\tiny {t}}\mathbf {X})\), \(\widehat{g}_n (\mathbf {X})=\widehat{\varvec{\beta }}_n^{\tiny {t}} \mathbf {X}\) and \(g_{\infty }(\mathbf {X})= \varvec{\beta }_\infty ^{\tiny {t}}\mathbf {X}\). To prove A1, note that

$$\begin{aligned} \vert \widehat{\pi }_n(\mathbf {X})-\pi _{\infty }(\mathbf {X})\vert =\left| \pi (\mathbf {X}; \widehat{ \varvec{\gamma }}_n)-\pi (\mathbf {X}; \varvec{\gamma }_\infty )\right| =\left| \hbox {expit}^\prime ( \widetilde{\varvec{\gamma }}_n^{\tiny {t}}\mathbf {X}) \mathbf {X}^{\tiny {t}} (\widehat{\varvec{\gamma }}_n- \varvec{\gamma }_\infty )\right| , \end{aligned}$$

where \(\widetilde{\varvec{\gamma }}_n\) is an intermediate point between \(\varvec{{\widehat{\gamma }}}_n\) and \(\varvec{\gamma }_\infty \). The convergence of \(\widehat{\varvec{\gamma }}_n\) to \(\varvec{\gamma }_\infty \) a.s. combined with the assumed compactness for the support of \(\mathbf {X}\) implies the validity of A1.

A2 is satisfied since \(\hbox {expit}(\varvec{\gamma }_\infty ^{\tiny {t}}\mathbf {X})\) is continuous and \(\mathbf {X}\) has a compact support.

To prove the validity of A3, observe that \(\vert \widehat{g}_n(\mathbf {X})-g_{\infty }(\mathbf {X})\vert = \vert \{\varvec{\widehat{\beta }_n}-\varvec{\beta }_\infty \}^{\tiny {t}}\mathbf {X}\vert .\) The convergence of \(\widehat{\varvec{\beta }}_n\) to \(\varvec{\beta }_\infty \) a.s. guarantees that A3 is also satisfied.

Finally, note that if \(\mathbb {P}(A=1\mid \mathbf {X})=\hbox {expit}(\varvec{\gamma }_0^{\tiny {t}}\mathbf {X})\), then \(\varvec{\gamma }_\infty =\varvec{\gamma }_0\), and so \(\pi _{\infty }(\mathbf {X})=\mathbb {P}(A=1\mid \mathbf {X})\). Also, if \(g(\mathbf {X})=\varvec{\beta }_0^{\tiny {t}}\mathbf {X}\), then \(\varvec{\beta }_\infty =\varvec{\beta }_0\) implying that \(g_{\infty }(\mathbf {X})=g(\mathbf {X})\). We can now invoke Theorem 2 to conclude the proof of the theorem. \(\square \)
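The mean value argument used for A1 can be checked numerically. Since \(\hbox {expit}^\prime =\hbox {expit}\,(1-\hbox {expit})\le 1/4\), the Cauchy–Schwarz inequality gives \(\vert \widehat{\pi }_n(\mathbf {X})-\pi _{\infty }(\mathbf {X})\vert \le \Vert \mathbf {X}\Vert \,\Vert \widehat{\varvec{\gamma }}_n-\varvec{\gamma }_\infty \Vert /4\), which is uniform over a compact support. The following is a minimal sketch; the parameter values, the radius of the support and the function name `propensity_gap_bound` are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def propensity_gap_bound(gamma_hat, gamma_inf, radius):
    """Bound on sup over ||x|| <= radius of |expit(g_hat'x) - expit(g_inf'x)|."""
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(gamma_hat, gamma_inf)))
    # expit' = expit * (1 - expit) <= 1/4, then Cauchy-Schwarz on x'(g_hat - g_inf).
    return 0.25 * radius * diff_norm


# Hypothetical limit value and estimate of the propensity parameters.
gamma_inf = [0.5, -1.0]
gamma_hat = [0.52, -0.97]
radius = 2.0
bound = propensity_gap_bound(gamma_hat, gamma_inf, radius)

# Evaluate the actual gap on a grid of points inside the disc of that radius.
worst = 0.0
steps = 40
for i in range(-steps, steps + 1):
    for j in range(-steps, steps + 1):
        x = (radius * i / steps, radius * j / steps)
        if math.hypot(*x) <= radius:
            lin_hat = sum(g * v for g, v in zip(gamma_hat, x))
            lin_inf = sum(g * v for g, v in zip(gamma_inf, x))
            worst = max(worst, abs(expit(lin_hat) - expit(lin_inf)))

print(worst, bound)
```

The bound shrinks linearly with \(\Vert \widehat{\varvec{\gamma }}_n-\varvec{\gamma }_\infty \Vert \), so a.s. convergence of the parameter estimate yields uniform a.s. convergence of the fitted propensities on the compact support.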

Cite this article

Sued, M., Valdora, M. & Yohai, V. Robust doubly protected estimators for quantiles with missing data. TEST 29, 819–843 (2020). https://doi.org/10.1007/s11749-019-00689-9
