A simple approach to construct confidence bands for a regression function with incomplete data

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

A long-standing problem in the construction of asymptotically correct confidence bands for a regression function \(m(x)=E[Y|X=x]\), where Y is the response variable influenced by the covariate X, arises when the Y values may be missing at random and the selection probability, the density function f(x) of X, and the conditional variance of Y given X are all completely unknown. The problem is particularly difficult in nonparametric settings. In this paper, we propose a new kernel-type regression estimator and study the limiting distribution of properly normalized versions of the maximal deviation of the proposed estimator from the true regression curve. This limiting distribution is then used to construct uniform confidence bands for the underlying regression curve with asymptotically correct coverage. The focus of the current paper is on the case where \(X\in \mathbb {R}\). We also perform numerical studies to assess the finite-sample performance of the proposed method. Both the mechanics and the theoretical validity of our methods are discussed.
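
As a rough illustration only, the estimator analysed in the appendix can be sketched as follows. Judging from the proof, \(\widehat{m}_n(x)\) is a Nadaraya–Watson-type average of the pseudo-responses \(\Delta_i Y_i/\widehat{p}_n(X_i)+\epsilon_i\), where the \(\epsilon_i\) are added variables with bounded support (assumption (G)). Everything else here is an assumption made for the sketch and is not taken from the paper: the Epanechnikov kernel, estimating \(\widehat{p}_n\) by a Nadaraya–Watson regression of \(\Delta\) on X with bandwidth \(\lambda_n\), drawing the \(\epsilon_i\) uniform with mean zero, the toy model \(m(x)=\sin(2\pi x)\), \(p(x)=0.6+0.3x\), and all function names.

```python
import numpy as np


def epanechnikov(u):
    """Illustrative kernel K supported on [-1, 1] (an assumption)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_weights(x, X, bw):
    """Matrix with entries K((x_j - X_i) / bw); one row per evaluation point."""
    return epanechnikov((np.asarray(x)[:, None] - X[None, :]) / bw)


def p_hat(x, X, delta, lam):
    """Hypothetical Nadaraya-Watson estimate of p(x) = P(Delta = 1 | X = x)."""
    W = kernel_weights(x, X, lam)
    return (W @ delta) / W.sum(axis=1)


def m_hat(x_grid, X, Y, delta, eps, h, lam):
    """sum_i [Delta_i Y_i / p_hat(X_i) + eps_i] K_i(x) / sum_i K_i(x)."""
    ipw = np.zeros(len(X))
    obs = delta == 1                      # only observed responses are ever used
    ipw[obs] = Y[obs] / p_hat(X[obs], X, delta, lam)
    W = kernel_weights(x_grid, X, h)
    return (W @ (ipw + eps)) / W.sum(axis=1)


# Illustrative data: m(x) = sin(2 pi x), responses missing at random.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + rng.uniform(-0.3, 0.3, n)
delta = (rng.uniform(size=n) < 0.6 + 0.3 * X).astype(int)   # p(x) = 0.6 + 0.3x
Y[delta == 0] = np.nan                                       # hide missing responses
eps = rng.uniform(-0.1, 0.1, n)                              # bounded, mean-zero noise
grid = np.linspace(0.05, 0.95, 19)
estimate = m_hat(grid, X, Y, delta, eps, h=0.1, lam=0.15)
```

Because every data point \(X_i\) with \(\Delta_i=1\) contributes \(K(0)>0\) to its own \(\widehat{p}_n(X_i)\), the inverse weights are always finite in this sketch.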

Fig. 1
Fig. 2

References

  • Burke, M.: A Gaussian bootstrap approach to estimation and tests. In: Szyszkowicz, B. (ed.) Asymptotic Methods in Probability and Statistics, pp. 697–706. North-Holland, Amsterdam (1998)

  • Burke, M.: Multivariate tests-of-fit and uniform confidence bands using a weighted bootstrap. Stat. Probab. Lett. 46, 13–20 (2000)

  • Cai, T., Low, M., Ma, Z.: Adaptive confidence bands for nonparametric regression functions. J. Am. Stat. Assoc. 109, 1054–1070 (2014)

  • Claeskens, G., Van Keilegom, I.: Bootstrap confidence bands for regression curves and their derivatives. Ann. Stat. 31, 1852–1884 (2003)

  • Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer, New York (1996)

  • Eubank, R.L., Speckman, P.L.: Confidence bands in nonparametric regression. J. Am. Stat. Assoc. 88(424), 1287–1301 (1993)

  • Gu, L., Yang, L.: Oracally efficient estimation for single-index link function with simultaneous confidence band. Electron. J. Stat. 9, 1540–1561 (2015)

  • Györfi, L., Kohler, M., Krzyżak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer, New York (2002)

  • Härdle, W.: Asymptotic maximal deviation of M-smoothers. J. Multivar. Anal. 29, 163–179 (1989)

  • Härdle, W.: Applied Nonparametric Regression. Cambridge University Press, Cambridge (1990)

  • Härdle, W., Song, S.: Confidence bands in quantile regression. Econom. Theory 26(4), 1–22 (2010)

  • Hayfield, T., Racine, J.: Nonparametric econometrics: the np package. J. Stat. Softw. 27(5), 1–32 (2008)

  • Hollander, M., McKeague, I.W., Yang, J.: Likelihood ratio-based confidence bands for survival functions. J. Am. Stat. Assoc. 92, 215–227 (1997)

  • Horváth, L.: Approximations for hybrids of empirical and partial sums processes. J. Stat. Plan. Inference 88, 1–18 (2000)

  • Horváth, L., Kokoszka, P., Steinebach, J.: Approximations for weighted bootstrap processes with an application. Stat. Probab. Lett. 48, 59–70 (2000)

  • Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663–685 (1952)

  • Janssen, A.: Resampling Student’s t-type statistics. Ann. Inst. Stat. Math. 57, 507–529 (2005)

  • Janssen, A., Pauls, T.: How do bootstrap and permutation tests work? Ann. Stat. 31, 768–806 (2003)

  • Johnston, G.J.: Probabilities of maximal deviations for nonparametric regression function estimates. J. Multivar. Anal. 12, 402–414 (1982)

  • Kojadinovic, I., Yan, J.: Goodness-of-fit testing based on a weighted bootstrap: a fast large-sample alternative to the parametric bootstrap. Can. J. Stat. 40, 480–500 (2012)

  • Kojadinovic, I., Yan, J., Holmes, M.: Fast large-sample goodness-of-fit for copulas. Stat. Sinica 21, 841–871 (2011)

  • Konakov, V.D., Piterbarg, V.I.: On the convergence rate of maximal deviation distribution. J. Multivar. Anal. 15, 279–294 (1984)

  • Liero, H.: On the maximal deviation of the kernel regression function estimate. Ser. Stat. 13, 171–182 (1982)

  • Lei, Q., Qin, Y.: Confidence intervals for nonparametric regression functions with missing data: multiple design case. J. Syst. Sci. Complex. 24, 1204–1217 (2011)

  • Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2002)

  • Li, G., Van Keilegom, I.: Likelihood ratio confidence bands in nonparametric regression with censored data. Scand. J. Stat. 29, 547–562 (2002)

  • Mack, Y.P., Silverman, B.W.: Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Gebiete 61, 405–415 (1982)

  • Mason, D.M., Newton, M.A.: A rank statistics approach to the consistency of a general bootstrap. Ann. Stat. 20, 1611–1624 (1992)

  • Massé, P., Meiniel, W.: Adaptive confidence bands in the nonparametric fixed design regression model. J. Nonparametr. Stat. 26, 451–469 (2014)

  • Mojirsheibani, M., Pouliot, W.: Weighted bootstrapped kernel density estimators in two sample problems. J. Nonparametr. Stat. 29, 61–84 (2017)

  • Mondal, S., Subramanian, S.: Simultaneous confidence bands for Cox regression from semiparametric random censorship. Lifetime Data Anal. 22, 122–144 (2016)

  • Muminov, M.S.: On the limit distribution of the maximum deviation of the empirical distribution density and the regression function. I. Theory Probab. Appl. 55, 509–517 (2011)

  • Muminov, M.S.: On the limit distribution of the maximum deviation of the empirical distribution density and the regression function II. Theory Probab. Appl. 56, 155–166 (2012)

  • Nadaraya, E.A.: Remarks on nonparametric estimates for density functions and regression curves. Theory Probab. Appl. 15, 134–137 (1970)

  • Neumann, M.H., Polzehl, J.: Simultaneous bootstrap confidence bands in nonparametric regression. J. Nonparametr. Stat. 9, 307–333 (1998)

  • Praestgaard, J., Wellner, J.A.: Exchangeably weighted bootstraps of the general empirical process. Ann. Probab. 21, 2053–2086 (1993)

  • Proksch, K.: On confidence bands for multivariate nonparametric regression. Ann. Inst. Stat. Math. 68, 209–236 (2016)

  • Qin, Y., Qiu, T., Lei, Q.: Confidence intervals for nonparametric regression functions with missing data. Commun. Stat. Theory Methods 43, 4123–4142 (2014)

  • Racine, J., Li, Q.: Cross-validated local linear nonparametric regression. Stat. Sinica 14, 485–512 (2004)

  • Song, S., Ritov, Y., Härdle, W.: Bootstrap confidence bands and partial linear quantile regression. J. Multivar. Anal. 107, 244–262 (2012)

  • Wandl, H.: On kernel estimation of regression functions. Wissenschaftliche Sitzungen zur Stochastik, WSS-03, Berlin. (1980)

  • Wang, Q., Shen, J.: Estimation and confidence bands of a conditional survival function with censoring indicators missing at random. J. Multivar. Anal. 99, 928–948 (2008)

  • Wang, Q., Qin, Y.: Empirical likelihood confidence bands for distribution functions with missing responses. J. Stat. Plan. Inference 140, 2778–2789 (2010)

  • Watson, G.S.: Smooth regression analysis. Sankhya Ser. A 26, 359–372 (1964)

  • Xia, Y.: Bias-corrected confidence bands in nonparametric regression. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60, 797–811 (1998)

Author information

Correspondence to Majid Mojirsheibani.

Additional information

This work was supported by NSF Grant DMS-1407400 awarded to Majid Mojirsheibani.

Appendix: Proof of Theorem 2

We prove Theorem 2 in a number of steps.

STEP 1. Let \(\widehat{m}_n(x)\) and \(\widetilde{m}_n(x)\) be as in (12) and (11), respectively. Then, as \(n\rightarrow \infty \), we have

$$\begin{aligned} \sqrt{n h_n \log n}\, \sup _{x\in [0,1]} \,\left| \widehat{m}_n(x) - \widetilde{m}_n(x)\right| \longrightarrow ^p 0. \end{aligned}$$
(18)

To show this, put \(r_n(x)=\sum _{i=1}^n |\Delta _i Y_i|\, K((x-X_i)/h_n) / \sum _{i=1}^n K((x-X_i)/h_n)\) and observe that in view of assumption (E)

$$\begin{aligned} \left| \widehat{m}_n(x) - \widetilde{m}_n(x)\right|= & {} \left| \sum _{i=1}^n \left[ \Delta _i Y_i\,\left( \frac{1}{\widehat{p}_n( X_i)} -\frac{1}{p(X_i)}\right) K\left( \frac{x-X_i}{h_n}\right) \right] \right. \\&\left. \div \sum _{i=1}^n K\left( \frac{x-X_i}{h_n}\right) \right| \\\le & {} \max _{1\le i \le n}\left[ \left| \frac{\widehat{p}_n(X_i)-p( X_i)}{\widehat{p}_n(X_i)p(X_i)}\right| \, \mathbb {I}_{\big \{X_i\in [x-Ah_n,\,\, x+Ah_n]\big \}}\right] \cdot r_n(x), \end{aligned}$$

where \(\mathbb {I}_B\) denotes the indicator of a set B. But \(r_n(x)\le \max (|B_1|, |B_2|)\) by the second part of assumption (A). Furthermore, in view of assumptions (A), (B), (C), \((\hbox {E}^{'})\), and Theorem B of Mack and Silverman (1982), one has \(\sup _{x\in [0,1]} |\widehat{p}_n(x)-p(x)|=O_p\big (\sqrt{\log n/(n\lambda _n)}\,\big )\). Therefore,

$$\begin{aligned} \sup _{x\in [0,1]}\left| \widehat{m}_n(x) - \widetilde{m}_n(x)\right|\le & {} \sup _{x\in [0,1]} \max _{1\le i \le n}\left[ \left| \frac{\widehat{p}_n(X_i)-p( X_i)}{\widehat{p}_n(X_i)p(X_i)}\right| \mathbb {I}_{\big \{X_i\in [x-Ah_n,\,\, x+Ah_n]\big \}}\right] \nonumber \\&\cdot r_n(x)\nonumber \\\le & {} \frac{1}{p_0} \max _{1\le i \le n}\left| \frac{\widehat{p}_n(X_i)-p( X_i)}{\widehat{p}_n(X_i)} \right| \mathbb {I}_{\big \{X_i\in [-Ah_n,\,\, 1+Ah_n]\big \}}\nonumber \\&\cdot \sup _{x\in [0,1]}r_n(x)\nonumber \\\le & {} \frac{1}{p_0}\, \sup _{-A h_n\le x\le 1+Ah_n} \left| \frac{\widehat{p}_n(x)- p(x)}{\widehat{p}_n(x)}\right| \cdot \max (|B_1|, |B_2|)\nonumber \\\le & {} \frac{1}{p_0^2}\, O_p\left( \sqrt{\frac{\log n}{n\lambda _n}}\right) \cdot \max (|B_1|, |B_2|) \, =\,O_p\left( \sqrt{\frac{\log n}{n\lambda _n}}\right) \nonumber \\&\left( \text{since } \lim \nolimits _{n\rightarrow \infty }\mathbb {I}_{B_n}=\mathbb {I}_{\lim _{n\rightarrow \infty } B_n} \text{ for monotone sets } B_n\right) , \end{aligned}$$
(19)

where we have used the fact that \(p_0 \le \lim _{n\rightarrow \infty }\inf _{x}\, \widehat{p}_n(x) \le 1\) (which follows by noticing that \(-\sup _{x} |\widehat{p}_n(x)-p(x)|+p_0 \le \inf _x \widehat{p}_n(x) \le \sup _x |\widehat{p}_n(x)-p(x)|+1\) and then taking the limit as \(n\rightarrow \infty \)); here, the infima are taken over the set \([-Ah_n \,, \, 1+A h_n]\). Now, (18) follows from (19): multiplying the bound in (19) by \(\sqrt{n h_n \log n}\) gives \(O_p\big (\sqrt{(\log n)^2 h_n/\lambda _n}\,\big )\), and \((\log n)^2 h_n/\lambda _n = (\log n)^2 \,n^{\beta -\delta }\rightarrow 0\) as \(n\rightarrow \infty \) (because \(\beta <\delta \)).
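
The rate condition at the end of Step 1 holds only asymptotically, and a quick numerical check shows how slowly it can take effect. The sketch assumes \(h_n=n^{-\delta}\) and \(\lambda_n=n^{-\beta}\), which is one choice consistent with \(h_n/\lambda_n=n^{\beta-\delta}\); the exponents 0.2 and 0.3 are purely illustrative. Writing \(L=\log n\), the term \(L^2 e^{-(\delta-\beta)L}\) increases until \(L=2/(\delta-\beta)\) and only then decays to zero.

```python
import numpy as np


def step1_rate_term(log_n, beta, delta):
    """(log n)^2 * h_n / lambda_n, assuming h_n = n**(-delta), lambda_n = n**(-beta).

    Evaluated through log n so that astronomically large n cannot overflow.
    """
    log_n = np.asarray(log_n, dtype=float)
    return np.exp(2.0 * np.log(log_n) + (beta - delta) * log_n)


beta, delta = 0.2, 0.3                     # illustrative exponents with beta < delta
log_n = np.linspace(1.0, 400.0, 4000)      # n from e^1 up to e^400
values = step1_rate_term(log_n, beta, delta)
peak_log_n = log_n[np.argmax(values)]      # theory: 2 / (delta - beta) = 20
print(f"peak at log n = {peak_log_n:.2f}, value there = {values.max():.1f}")
```

For these exponents the term peaks near \(n\approx e^{20}\) before it starts to shrink, which suggests that the asymptotic negligibility in (18) may say little about moderate sample sizes.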

STEP 2. Define the quantity

$$\begin{aligned} \widetilde{\sigma }_n^2(x)= & {} \left\{ \sum _{i=1}^n \, \left[ \frac{\Delta _i Y_i}{p(X_i)}+\epsilon _i\right] ^2\, K\left( \frac{x-X_i}{h_n}\right) \, \div \,\sum _{i=1}^n K\left( \frac{x-X_i}{h_n}\right) \right\} - \left( \widetilde{m}_n(x)\right) ^2\,, \end{aligned}$$
(20)

where \(\widetilde{m}_n(x)\) is as in (11), and put

$$\begin{aligned} \sigma ^2(x):= & {} E\left[ \left( \frac{\Delta Y}{p(X)}+\epsilon \right) ^2 \bigg |\, X=x\right] -\left( E\left[ \frac{\Delta Y}{p(X)}+ \epsilon \,\bigg |\, X=x \right] \right) ^2 \nonumber \\= & {} E\left[ \left( \frac{\Delta Y}{p(X)}+\epsilon \right) ^2 \bigg |\, X=x\right] \nonumber \\&-\,m^2(x) \quad (\text{by } (2), (10), \text{ and assumption (G)}). \end{aligned}$$
(21)
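
The variance identity \(\sigma ^2(x)=[(p(x))^{-1}-1]\, E(Y^2|X=x)+E(\epsilon ^2)+\sigma _0^2(x)\), which Step 3 uses, can be checked exactly on a small discrete example. The sketch assumes, as one reading of the missing-at-random condition and assumption (G), that given \(X=x\) the indicator \(\Delta\) is Bernoulli(\(p(x)\)) and independent of Y, that \(\epsilon\) is independent of \((\Delta, Y)\) with mean zero, and that \(\sigma_0^2(x)=\mathrm{Var}(Y|X=x)\). The distributions and function names are made up for illustration.

```python
import itertools

import numpy as np


def sigma2_direct(y_vals, y_probs, p, e_vals, e_probs):
    """Var(Delta * Y / p + eps | X = x) and its mean, by exhaustive enumeration."""
    mean = second = 0.0
    for (y, qy), d, (e, qe) in itertools.product(
        zip(y_vals, y_probs), (0, 1), zip(e_vals, e_probs)
    ):
        prob = qy * (p if d == 1 else 1.0 - p) * qe   # independence assumptions
        z = d * y / p + e
        mean += prob * z
        second += prob * z * z
    return second - mean**2, mean


def sigma2_formula(y_vals, y_probs, p, e_vals, e_probs):
    """[(1/p) - 1] E(Y^2 | X = x) + E(eps^2) + sigma_0^2(x)."""
    y, qy = np.asarray(y_vals), np.asarray(y_probs)
    e, qe = np.asarray(e_vals), np.asarray(e_probs)
    ey, ey2 = qy @ y, qy @ y**2
    return (1.0 / p - 1.0) * ey2 + qe @ e**2 + (ey2 - ey**2)


# Conditional distribution of Y at some fixed x, p(x) = 0.7, symmetric eps.
y_vals, y_probs = [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3]
e_vals, e_probs = [-0.1, 0.1], [0.5, 0.5]
direct, mean = sigma2_direct(y_vals, y_probs, 0.7, e_vals, e_probs)
formula = sigma2_formula(y_vals, y_probs, 0.7, e_vals, e_probs)
```

The enumerated mean also recovers \(m(x)=E(Y|X=x)\), which is the unbiasedness behind the second line of (21).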

Also, let \(\widehat{\sigma }^2_{\widehat{p}_n}(x)\) be as in (15). Then, we have

$$\begin{aligned} \sup _{x\in [0,1]}\left| \widehat{\sigma }^2_{\widehat{p}_n}(x)-\widetilde{\sigma }^2_n(x)\right|= & {} O_p\left( \sqrt{\frac{\log n}{n\lambda _n}}\,\right) + o_p\left( \frac{1}{\sqrt{n h_n \log n}}\right) , \end{aligned}$$
(22)
$$\begin{aligned} \sup _{x\in [0,1]} \left| \widetilde{\sigma }^2_n(x)-\sigma ^2(x)\right|= & {} O_p\left( \sqrt{\frac{\log n}{n h_n}}\,\right) , \end{aligned}$$
(23)

where \(\sigma ^2(x)\) is as in (21). To establish (22) and (23), first observe that

$$\begin{aligned} \left| \widehat{\sigma }^2_{\widehat{p}_n}(x)-\widetilde{\sigma }^2_n(x) \right|\le & {} \left| \left[ \sum _{i=1}^n\left( \frac{1}{\widehat{p}^{\,2}_n(X_i)}-\frac{1}{p^2(X_i)}\right) \Delta _i Y_i^2 \cdot K\left( \frac{x-X_i}{h_n}\right) \right] \right. \\&\left. \div \,\sum _{i=1}^n K\left( \frac{x-X_i}{h_n}\right) \right| \\&+\,2 \left| \left[ \sum _{i=1}^n\left( \frac{1}{\widehat{p}_n(X_i)}-\frac{1}{p(X_i)}\right) \epsilon _i \, \Delta _i Y_i\cdot K\left( \frac{x-X_i}{h_n}\right) \right] \right. \\&\left. \div \, \sum _{i=1}^n K\left( \frac{x-X_i}{h_n}\right) \right| \\&+\,\left| \Big (\widehat{m}_n(x)-\widetilde{m}_n(x)\Big ) \Big (\widehat{m}_n(x)+\widetilde{m}_n(x)\Big )\right| \\:= & {} \big |V_{n,1}(x)\big |+ 2\big |V_{n,2}(x)\big | + \big |V_{n,3}(x)\big |. \end{aligned}$$
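
The decomposition above rests on the pointwise identity \([\Delta Y/\widehat{p}+\epsilon]^2-[\Delta Y/p+\epsilon]^2=(\widehat{p}^{\,-2}-p^{-2})\Delta Y^2+2(\widehat{p}^{\,-1}-p^{-1})\epsilon\,\Delta Y\), which uses \(\Delta^2=\Delta\) for a 0/1 indicator. A quick numerical check on arbitrary simulated values (all distributions are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
delta = rng.integers(0, 2, n)                            # 0/1 response indicators
Y = rng.uniform(-2.0, 2.0, n)
eps = rng.uniform(-0.5, 0.5, n)
p = rng.uniform(0.3, 1.0, n)                             # "true" selection probabilities
p_hat = np.clip(p + rng.normal(0.0, 0.05, n), 0.2, 1.0)  # perturbed estimates of p

# Left side: difference of the squared pseudo-responses built with p_hat and with p.
lhs = (delta * Y / p_hat + eps) ** 2 - (delta * Y / p + eps) ** 2
# Right side: the V_{n,1} integrand plus twice the V_{n,2} integrand (before weighting).
rhs = (1 / p_hat**2 - 1 / p**2) * delta * Y**2 + 2 * (1 / p_hat - 1 / p) * eps * delta * Y
max_gap = np.max(np.abs(lhs - rhs))
```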

But, by the second part of assumption (A), the second part of assumption (F), and the boundedness of the support of the distribution of \(\epsilon \), we immediately find \(\sup _{x\in [0,1]} |\widetilde{m}_n(x)|= O_p(1)\). Thus, by (18), \(\sup _{x\in [0,1]} |\widehat{m}_n(x)| \le \sup _{x\in [0,1]} |\widehat{m}_n(x)-\widetilde{m}_n(x)| + \sup _{x\in [0,1]} |\widetilde{m}_n(x)|= o_p\left( (n h_n \log n)^{-1/2}\right) +O_p(1)=O_p(1)\). Therefore, by (18),

$$\begin{aligned} \sup _{x\in [0,1]}\big |V_{n,3}(x)\big | \le \sup _{x\in [0,1]} |\widehat{m}_n(x)-\widetilde{m}_n(x)|\cdot O_p(1) = o_p\left( (n h_n \log n)^{-1/2}\right) . \end{aligned}$$

Next, observe that

$$\begin{aligned}&\sup _{x\in [0,1]}\max _{1\le i \le n}\left| \frac{\widehat{p}^2_n(X_i)-p^2( X_i)}{\widehat{p}^2_n(X_i)p^2(X_i)}\right| \mathbb {I}_{\big \{X_i\in [x-Ah_n,\,\, x+Ah_n]\big \}} \\&\quad \le \,\frac{2}{p^2_0}\sup _{-A h_n\le x\le 1+A h_n} \left| \frac{\widehat{p}_n(x)- p(x)}{\widehat{p}^2_n(x)}\right| \\&\quad = O_p\big (\sqrt{\log n/(n\lambda _n)}\,\big ), \end{aligned}$$

which follows from the last part of assumption (F), the fact that \(p_0^2 \le \lim _{n\rightarrow \infty } \inf _{x} \,\widehat{p}_n^{\,2}(x) \le \sup _x \,p^2(x) \le 1\), and Theorem B of Mack and Silverman (1982); here, the infimum is taken over the set \([-Ah_n \,, \, 1+A h_n]\). Therefore,

$$\begin{aligned} \sup _{x\in [0,1]}\big |V_{n,1}(x)\big |\le & {} \sup _{x\in [0,1]} \left| \frac{\sum _{i=1}^n \Delta _i Y_i^2 \cdot K\left( \frac{x-X_i}{h_n}\right) }{\sum _{i=1}^n K\left( \frac{x-X_i}{h_n}\right) }\right| \\&\times \frac{2}{p^2_0}\sup _{-A h_n\le x\le 1+A h_n} \left| \frac{\widehat{p}_n(x)- p(x)}{\widehat{p}^2_n(x)}\right| \\= & {} O_p\big (\sqrt{\log n/(n\lambda _n)}\,\big ), \end{aligned}$$

which follows because, in view of assumption (A), the first supremum on the right side of the above inequality is bounded by \(\max (B_1^2 , B_2^2)\). Similarly, \(\sup _{x\in [0,1]}|V_{n,2}(x)| = O_p(\sqrt{\log n /(n\lambda _n)}\,)\); combining the bounds on \(V_{n,1}\), \(V_{n,2}\), and \(V_{n,3}\) yields (22). The proof of (23) is straightforward:

$$\begin{aligned} \sup _{x\in [0,1]} \left| \widetilde{\sigma }^2_n(x)-\sigma ^2(x)\right|\le & {} \sup _{x\in [0,1]} \left| \frac{\sum _{i=1}^n \, \left[ \frac{\Delta _i Y_i}{p(X_i)}+\epsilon _i\right] ^2\, K\left( \frac{x-X_i}{h_n}\right) }{\sum _{i=1}^n K\left( \frac{x-X_i}{h_n}\right) } \right. \\&\left. -\,E\left[ \left( \frac{\Delta Y}{p(X)}+\epsilon \right) ^2 \bigg |\, X=x\right] \right| \\&+\,\sup _{x\in [0,1]}\Big |\big (\widetilde{m}_n(x)-m(x)\big ) \big (\widetilde{m}_n(x)+m(x)\big )\Big |\\= & {} O_p\left( \sqrt{\frac{\log n}{n h_n}}\,\right) + O_p\left( \sqrt{\frac{\log n}{n h_n}}\,\right) \,=\, O_p\left( \sqrt{\frac{\log n}{n h_n}}\,\right) . \end{aligned}$$
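
For intuition about Step 2, the oracle quantities \(\widetilde{m}_n(x)\) and \(\widetilde{\sigma}^2_n(x)\) in (20) are the kernel-weighted mean and variance of the pseudo-responses \(Y^*_i\), which, judging from (21), are \(\Delta_iY_i/p(X_i)+\epsilon_i\) computed with the true \(p\). The sketch compares \(\widetilde{\sigma}^2_n(x)\) with \(\sigma^2(x)=[(p(x))^{-1}-1]E(Y^2|X=x)+E(\epsilon^2)+\sigma_0^2(x)\); the model, kernel, bandwidth, and noise distributions are illustrative assumptions. Because the normalized weights sum to one, the kernel variance is nonnegative by Jensen's inequality.

```python
import numpy as np


def epanechnikov(u):
    """Illustrative kernel supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_mean_var(x_grid, X, Z, h):
    """Kernel-weighted mean and variance of Z at each grid point, cf. (11) and (20)."""
    W = epanechnikov((np.asarray(x_grid)[:, None] - X[None, :]) / h)
    W = W / W.sum(axis=1, keepdims=True)      # each row now sums to one
    mean = W @ Z
    return mean, W @ Z**2 - mean**2


def m(x):
    return np.sin(2 * np.pi * x)              # illustrative regression function


def p(x):
    return 0.6 + 0.3 * x                      # illustrative selection probability


rng = np.random.default_rng(4)
n, h = 5000, 0.05
X = rng.uniform(0.0, 1.0, n)
Y = m(X) + rng.uniform(-0.3, 0.3, n)          # sigma_0^2(x) = 0.36 / 12 = 0.03
delta = (rng.uniform(size=n) < p(X)).astype(int)
eps = rng.uniform(-0.1, 0.1, n)               # E(eps^2) = 0.04 / 12 = 0.01 / 3
Y_star = np.where(delta == 1, Y / p(X), 0.0) + eps   # oracle pseudo-responses

grid = np.linspace(0.05, 0.95, 19)
m_tilde, s2_tilde = kernel_mean_var(grid, X, Y_star, h)
s2_true = (1 / p(grid) - 1) * (m(grid) ** 2 + 0.03) + 0.01 / 3 + 0.03
```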

STEP 3. Let \(\widehat{f}_n(x)\), \(\widehat{m}_n(x)\), \(\widetilde{m}_n(x)\), \(\widehat{\sigma }^2_{\widehat{p}_n}(x)\), and \(\widetilde{\sigma }^2_n(x)\) be as in (3), (12), (11), (15), and (20), respectively, and write

$$\begin{aligned} \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widehat{\sigma }^2_{\widehat{p}_n}(x)}}\, \Big |\widehat{m}_n(x)-m(x)\Big | = \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widetilde{\sigma }^2_n(x)}}\, \Big |\widetilde{m}_n(x)-m(x)\Big | + R_n\,, \end{aligned}$$
(24)

where

$$\begin{aligned} R_n= & {} \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widehat{\sigma }^2_{\widehat{p}_n}(x)}}\, \Big |\widehat{m}_n(x)-m(x)\Big | - \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widetilde{\sigma }^2_n(x)}}\, \Big |\widetilde{m}_n(x)-m(x)\Big | \nonumber \\\le & {} \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widehat{\sigma }^2_{\widehat{p}_n}(x)}}\, \Big |\widehat{m}_n(x)-\widetilde{m}_n(x)\Big | \nonumber \\&+\,\left( \sup _{x\in [0,1]} \sqrt{\frac{\widetilde{\sigma }^2_n(x)}{\widehat{\sigma }^2_{\widehat{p}_n}(x)}}-1\right) \cdot \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widetilde{\sigma }^2_n(x)}}\, \Big |\widetilde{m}_n(x)-m(x)\Big | \nonumber \\:= & {} R_n(1) + R_n(2). \end{aligned}$$
(25)
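
The bound \(R_n\le R_n(1)+R_n(2)\) in (25) is deterministic: it uses only the triangle inequality and \(\sup_x g(x)c(x)\le \sup_x g(x)\cdot\sup_x c(x)\) for nonnegative functions. The sketch checks it on arbitrary positive arrays that stand in for the estimators evaluated on a grid; the arrays are illustrative and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
violations = 0
for _ in range(500):
    k = 200                                             # grid points in [0, 1]
    m = np.sin(np.linspace(0.0, 3.0, k))
    f_hat = rng.uniform(0.5, 1.5, k)
    s2_hat = rng.uniform(0.2, 1.0, k)                   # stands in for sigma-hat^2
    s2_tilde = s2_hat * rng.uniform(0.8, 1.25, k)       # stands in for sigma-tilde^2
    m_tilde = m + rng.normal(0.0, 0.1, k)
    m_hat = m_tilde + rng.normal(0.0, 0.02, k)

    a = np.sqrt(f_hat / s2_hat)
    b = np.sqrt(f_hat / s2_tilde)
    R_n = np.max(a * np.abs(m_hat - m)) - np.max(b * np.abs(m_tilde - m))
    R_n1 = np.max(a * np.abs(m_hat - m_tilde))
    R_n2 = (np.max(np.sqrt(s2_tilde / s2_hat)) - 1.0) * np.max(b * np.abs(m_tilde - m))
    violations += bool(R_n > R_n1 + R_n2 + 1e-12)      # tolerance for rounding only
```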

To deal with the supremum on the right-hand side of (24), we note that \(\widetilde{m}_n(x)\) and \(\widetilde{\sigma }_n^2(x)\), which appear in this supremum term, are, respectively, the kernel regression estimator of \(E(Y^*|X=x)\) and the kernel estimator of the conditional variance of \(Y^*\), as given by (21), based on the iid “data” \((X_i, Y^*_i)\), \(i=1,\dots ,n\), where \(Y^*\) is given by (14). It is straightforward to see that when assumptions (A) and (F) hold, \(P\{B^*_1\le Y^* \le B^*_2\}=1\), where \(B_1^*=\min (0,B_1)+a_0\) and \(B_2^*=\frac{B_2}{p_0}+b_0\), with the constants \(B_1\) and \(B_2\) as in assumption (A), and where \(a_0\) and \(b_0\) are as in assumption (G). Also, in view of assumptions (A) and (G), the random vector \((X, Y^*)\) has a pdf. Therefore, when assumption (A) holds for the distribution of \((X,Y)\), it also holds, because of assumption (F), for the distribution of \((X,Y^*)\). Similarly, if \(\sigma _0^2(x)\) satisfies assumption (C), then so does \(\sigma ^2(x)\) (in view of assumption (F)); to see this, simply observe that, by (2) and (21), \(\sigma ^2(x)=[(p(x))^{-1}-1]\, E(Y^2|X=x)+E(\epsilon ^2)+\sigma _0^2(x)\). Therefore, as a consequence of Theorem 1, under assumptions (A), (B), (C), \((\hbox {D}^{'})\), (E), (F), and (G),

$$\begin{aligned} \sqrt{2 \delta \log n}\,\left\{ \sqrt{\frac{n h_n}{c_K}}\, \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widetilde{\sigma }^2_n(x)}}\, \Big |\widetilde{m}_n(x)-m(x)\Big |- d_n\right\} \longrightarrow ^{d}\, Y\,, \end{aligned}$$
(26)

where \(P(Y\le y)=\exp \left\{ -2 \exp (-y)\right\} \), \(y\in \mathbb {R}\), \(c_K=\int K^2(t)\,\mathrm{d}t\), and \(d_n\) is as in (5). Therefore, to prove Theorem 2, it is sufficient to show that \(\sqrt{n h_n \log n}\,\big (R_n(1)+R_n(2)\big )\rightarrow ^p 0\), as \(n\rightarrow \infty \). First, we show that \(\sqrt{n h_n \log n}\,|R_n(2)|\rightarrow ^p 0\). To show this, observe that by (26)

$$\begin{aligned} \sup _{x\in [0,1]} \sqrt{\frac{\widehat{f}_n(x)}{\widetilde{\sigma }^2_n(x)}}\, \Big |\widetilde{m}_n(x)-m(x)\Big | = O_p\left( \sqrt{\log n/(n h_n)}\right) . \end{aligned}$$
(27)

Furthermore, \( \big |\sup _x\sqrt{\widetilde{\sigma }^2_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)}-1\big | \le \sup _{x}\big |\sqrt{\widetilde{\sigma }^2_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)}- 1\big | \le \frac{\sup _{x}\big |\widetilde{\sigma }^2_n(x)- \widehat{\sigma }^2_{\widehat{p}_n}(x)\big |}{\inf _{x} \widehat{\sigma }^2_{\widehat{p}_n}(x)}\), where the second inequality uses \(|\sqrt{t}-1|\le |t-1|\) for \(t\ge 0\). But by (22), \(\sup _{x\in [0,1]}\left| \widehat{\sigma }^2_{\widehat{p}_n}(x)-\widetilde{\sigma }^2_n(x)\right| = O_p\left( \sqrt{\log n/(n \lambda _n)}\right) + o_p\left( (n h_n \log n)^{-1/2}\right) \). We also note that \(\,\inf _x\, \widehat{\sigma }^2_{\widehat{p}_n}(x) \ge \inf _x\, \left\{ \widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\right\} +\inf _x\, \sigma ^2(x) \ge - \sup _x \big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big | +\inf _x\,\sigma ^2(x) \), where \(\sigma ^2(x)\) is as in (21). Similarly, observe that \(\inf _x\,\widehat{\sigma }^2_{\widehat{p}_n}(x) \le \sup _x \big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big | +\sup _x\,\sigma ^2(x)\). Thus, we have

$$\begin{aligned} - \sup _x \big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big | +\inf _x\,\sigma ^2(x) \,\le \inf _x\,\widehat{\sigma }^2_{\widehat{p}_n}(x) \le \, \sup _x \big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big | +\sup _x\,\sigma ^2(x). \end{aligned}$$

Now, (22) and (23) imply that \(\sup _x\big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big |=o_p(1)\); upon taking the limit in the above chain of inequalities as \(n\rightarrow \infty \), we find \(0< \lim _{n\rightarrow \infty } \inf _x \, \widehat{\sigma }^2_{\widehat{p}_n}(x) <\infty \), which yields

$$\begin{aligned} \bigg |\sup _{x\in [0,1]}\sqrt{\widetilde{\sigma }^2_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)}-1\bigg | = O_p\left( \sqrt{\log n/(n \lambda _n)}\right) + o_p\left( (n h_n \log n)^{-1/2}\right) . \end{aligned}$$
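
The two elementary inequalities used in this step, \(|\sup_x g(x)-1|\le\sup_x|g(x)-1|\) and \(|\sqrt{t}-1|\le|t-1|\) for \(t\ge 0\) (so that \(\sup_x|\sqrt{a(x)/b(x)}-1|\le\sup_x|a(x)-b(x)|/\inf_x b(x)\)), can be sanity-checked numerically. The arrays below are arbitrary positive stand-ins for \(\widetilde{\sigma}^2_n\) and \(\widehat{\sigma}^2_{\widehat{p}_n}\) on a grid, chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
violations = 0
for _ in range(1000):
    s2_hat = rng.uniform(0.1, 2.0, 50)                            # stand-in for sigma-hat^2
    s2_tilde = np.abs(s2_hat + rng.normal(0.0, 0.2, 50)) + 1e-6   # stand-in for sigma-tilde^2
    g = np.sqrt(s2_tilde / s2_hat)
    lhs = abs(g.max() - 1.0)                                      # |sup g - 1|
    middle = np.max(np.abs(g - 1.0))                              # sup |g - 1|
    rhs = np.max(np.abs(s2_tilde - s2_hat)) / s2_hat.min()        # sup |a - b| / inf b
    violations += not (lhs <= middle + 1e-12 and middle <= rhs + 1e-12)
```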

This, in conjunction with (27), implies that \(\sqrt{n h_n \log n}\,|R_n(2)|\rightarrow ^p 0\), where \(R_n(2)\) is as in (25). Next, observe that \(\sup _x \big |\widehat{f}_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)\big | \le \big [\sup _x \big |\widehat{f}_n(x)-f(x)\big |+\sup _x f(x)\big ]/\inf _x \widehat{\sigma }^2_{\widehat{p}_n}(x) = O_p(1)\), which follows because \(\sup _x|\widehat{f}_n(x)-f(x)|=o_p(1)\) and \(0< \lim _{n\rightarrow \infty } \inf _x \, \widehat{\sigma }^2_{\widehat{p}_n}(x) <\infty \) (as shown above). Combining these results with (18), we have

$$\begin{aligned} \sqrt{n h_n \log n}\,|R_n(1)|\le & {} \, \sqrt{\sup _{x\in [0,1]}\left| \widehat{f}_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)\right| }\,\cdot \sqrt{n h_n \log n}\,\sup _{x\in [0,1]} \Big |\widehat{m}_n(x)-\widetilde{m}_n(x)\Big | \\= & {} O_p(1) \cdot o_p(1) \, = \, o_p(1). \end{aligned}$$

Putting the above results together, we have \(\sqrt{n h_n \log n}\,|R_n|=o_p(1)\). Theorem 2 now follows from this together with (24), (25), and (26). \(\square \)
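
To see how such a limit law is used, suppose Theorem 2 gives the limit in (26) with \(\widehat{m}_n\) and \(\widehat{\sigma}^2_{\widehat{p}_n}\) in place of \(\widetilde{m}_n\) and \(\widetilde{\sigma}^2_n\), which is what the argument above establishes. Solving \(\exp\{-2\exp(-c_\alpha)\}=1-\alpha\) gives \(c_\alpha=-\log\{-\tfrac12\log(1-\alpha)\}\), and inverting the event inside (26) gives the band \(\widehat{m}_n(x)\pm\sqrt{c_K\widehat{\sigma}^2_{\widehat{p}_n}(x)/(nh_n\widehat{f}_n(x))}\,\{d_n+c_\alpha/\sqrt{2\delta\log n}\}\). The following is a minimal sketch under those assumptions. The constant \(d_n\) from (5) is not reproduced here, so the usage example substitutes its leading term \(\sqrt{2\delta\log n}\) as a crude stand-in (an assumption); \(c_K=0.6\) is \(\int K^2\) for the Epanechnikov kernel, itself only an illustrative choice of K; \(h_n=n^{-\delta}\) and the input numbers are made up.

```python
import math


def gumbel_quantile(alpha):
    """c_alpha solving exp(-2 exp(-c)) = 1 - alpha (the limit law in (26))."""
    return -math.log(-0.5 * math.log(1.0 - alpha))


def band_half_width(sigma2_hat, f_hat, n, h, delta, d_n, alpha=0.05, c_K=0.6):
    """Half-width w(x) of the band m_hat(x) +/- w(x) obtained by inverting (26).

    d_n must come from (5) of the paper; c_K = 0.6 assumes the Epanechnikov kernel.
    """
    level = d_n + gumbel_quantile(alpha) / math.sqrt(2.0 * delta * math.log(n))
    return math.sqrt(c_K * sigma2_hat / (n * h * f_hat)) * level


# Made-up inputs at a single x; d_n replaced by its leading term (an assumption).
n, delta = 1000, 0.3
h = n ** (-delta)
d_n = math.sqrt(2.0 * delta * math.log(n))
w95 = band_half_width(sigma2_hat=0.5, f_hat=1.0, n=n, h=h, delta=delta, d_n=d_n)
w99 = band_half_width(0.5, 1.0, n, h, delta, d_n, alpha=0.01)
```

In practice \(\widehat{\sigma}^2_{\widehat{p}_n}(x)\) and \(\widehat{f}_n(x)\) would be evaluated on a grid, giving a band whose width adapts to the local variance and design density.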

About this article

Cite this article

Al-Sharadqah, A., Mojirsheibani, M. A simple approach to construct confidence bands for a regression function with incomplete data. AStA Adv Stat Anal 104, 81–99 (2020). https://doi.org/10.1007/s10182-019-00351-7

  • DOI: https://doi.org/10.1007/s10182-019-00351-7
