Abstract
A long-standing problem in the construction of asymptotically correct confidence bands for a regression function \(m(x)=E[Y|X=x]\), where Y is the response variable influenced by the covariate X, involves the situation where Y values may be missing at random, and where the selection probability, the density function f(x) of X, and the conditional variance of Y given X are all completely unknown. The problem is particularly complicated in nonparametric settings. In this paper, we propose a new kernel-type regression estimator and study the limiting distribution of properly normalized versions of the maximal deviation of the proposed estimator from the true regression curve. The resulting limiting distribution is used to construct uniform confidence bands for the underlying regression curve with asymptotically correct coverage. The focus of the current paper is on the case where \(X\in \mathbb {R}\). We also perform numerical studies to assess the finite-sample performance of the proposed method. Both the mechanics and the theoretical validity of our methods are discussed.
References
Burke, M.: A Gaussian bootstrap approach to estimation and tests. In: Szyszkowicz, B. (ed.) Asymptotic Methods in Probability and Statistics, pp. 697–706. North-Holland, Amsterdam (1998)
Burke, M.: Multivariate tests-of-fit and uniform confidence bands using a weighted bootstrap. Stat. Probab. Lett. 46, 13–20 (2000)
Cai, T., Low, M., Ma, Z.: Adaptive confidence bands for nonparametric regression functions. J. Am. Stat. Assoc. 109, 1054–1070 (2014)
Claeskens, G., Van Keilegom, I.: Bootstrap confidence bands for regression curves and their derivatives. Ann. Stat. 31, 1852–1884 (2003)
Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer, New York (1996)
Eubank, R.L., Speckman, P.L.: Confidence bands in nonparametric regression. J. Am. Stat. Assoc. 88(424), 1287–1301 (1993)
Gu, L., Yang, L.: Oracally efficient estimation for single-index link function with simultaneous confidence band. Electron. J. Stat. 9, 1540–1561 (2015)
Györfi, L., Kohler, M., Krzyżak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer, New York (2002)
Härdle, W.: Asymptotic maximal deviation of M-smoothers. J. Multivar. Anal. 29, 163–179 (1989)
Härdle, W.: Applied Nonparametric Regression. Cambridge University Press, Cambridge (1990)
Härdle, W., Song, S.: Confidence bands in quantile regression. Econom. Theory 26(4), 1–22 (2010)
Hayfield, T., Racine, J.: Nonparametric econometrics: the np package. J. Stat. Softw. 27(5), 1–32 (2008)
Hollander, M., McKeague, I.W., Yang, J.: Likelihood ratio-based confidence bands for survival functions. J. Am. Stat. Assoc. 92, 215–227 (1997)
Horváth, L.: Approximations for hybrids of empirical and partial sums processes. J. Stat. Plan. Inference 88, 1–18 (2000)
Horváth, L., Kokoszka, P., Steinebach, J.: Approximations for weighted bootstrap processes with an application. Stat. Probab. Lett. 48, 59–70 (2000)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663–685 (1952)
Janssen, A.: Resampling Student’s t-type statistics. Ann. Inst. Stat. Math. 57, 507–529 (2005)
Janssen, A., Pauls, T.: How do bootstrap and permutation tests work? Ann. Stat. 31, 768–806 (2003)
Johnston, G.J.: Probabilities of maximal deviations for nonparametric regression function estimates. J. Multivar. Anal. 12, 402–414 (1982)
Kojadinovic, I., Yan, J.: Goodness-of-fit testing based on a weighted bootstrap: a fast large-sample alternative to the parametric bootstrap. Can. J. Stat. 40, 480–500 (2012)
Kojadinovic, I., Yan, J., Holmes, M.: Fast large-sample goodness-of-fit for copulas. Stat. Sinica 21, 841–871 (2011)
Konakov, V.D., Piterbarg, V.I.: On the convergence rate of maximal deviation distribution. J. Multivar. Anal. 15, 279–294 (1984)
Liero, H.: On the maximal deviation of the kernel regression function estimate. Ser. Stat. 13, 171–182 (1982)
Lei, Q., Qin, Y.: Confidence intervals for nonparametric regression functions with missing data: multiple design case. J. Syst. Sci. Complex. 24, 1204–1217 (2011)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2002)
Li, G., Van Keilegom, I.: Likelihood ratio confidence bands in nonparametric regression with censored data. Scand. J. Stat. 29, 547–562 (2002)
Mack, Y.P., Silverman, B.W.: Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Gebiete 61, 405–415 (1982)
Mason, D.M., Newton, M.A.: A rank statistics approach to the consistency of a general bootstrap. Ann. Stat. 20, 1611–1624 (1992)
Massé, P., Meiniel, W.: Adaptive confidence bands in the nonparametric fixed design regression model. J. Nonparametr. Stat. 26, 451–469 (2014)
Mojirsheibani, M., Pouliot, W.: Weighted bootstrapped kernel density estimators in two sample problems. J. Nonparametr. Stat. 29, 61–84 (2017)
Mondal, S., Subramanian, S.: Simultaneous confidence bands for Cox regression from semiparametric random censorship. Lifetime Data Anal. 22, 122–144 (2016)
Muminov, M.S.: On the limit distribution of the maximum deviation of the empirical distribution density and the regression function. I. Theory Probab. Appl. 55, 509–517 (2011)
Muminov, M.S.: On the limit distribution of the maximum deviation of the empirical distribution density and the regression function II. Theory Probab. Appl. 56, 155–166 (2012)
Nadaraya, E.A.: Remarks on nonparametric estimates for density functions and regression curves. Theory Probab. Appl. 15, 134–137 (1970)
Neumann, M.H., Polzehl, J.: Simultaneous bootstrap confidence bands in nonparametric regression. J. Nonparametr. Stat. 9, 307–333 (1998)
Praestgaard, J., Wellner, J.A.: Exchangeably weighted bootstraps of the general empirical process. Ann. Probab. 21, 2053–2086 (1993)
Proksch, K.: On confidence bands for multivariate nonparametric regression. Ann. Inst. Stat. Math. 68, 209–236 (2016)
Qin, Y., Qiu, T., Lei, Q.: Confidence intervals for nonparametric regression functions with missing data. Commun. Stat. Theory Methods 43, 4123–4142 (2014)
Racine, J., Li, Q.: Cross-validated local linear nonparametric regression. Stat. Sinica 14, 485–512 (2004)
Song, S., Ritov, Y., Härdle, W.: Bootstrap confidence bands and partial linear quantile regression. J. Multivar. Anal. 107, 244–262 (2012)
Wandl, H.: On kernel estimation of regression functions. Wissenschaftliche Sitzungen zur Stochastik, WSS-03, Berlin. (1980)
Wang, Q., Shen, J.: Estimation and confidence bands of a conditional survival function with censoring indicators missing at random. J. Multivar. Anal. 99, 928–948 (2008)
Wang, Q., Qin, Y.: Empirical likelihood confidence bands for distribution functions with missing responses. J. Stat. Plann. Inference 140, 2778–2789 (2010)
Watson, G.S.: Smooth regression analysis. Sankhya Ser. A 26, 359–372 (1964)
Xia, Y.: Bias-corrected confidence bands in nonparametric regression. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60, 797–811 (1998)
This work is supported by the NSF Grant DMS-1407400 of Majid Mojirsheibani.
Appendix: Proof of Theorem 2
We prove Theorem 2 in a number of steps.
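STEP 1. Let \(\widehat{m}_n(x)\) and \(\widetilde{m}_n(x)\) be as in (12) and (11), respectively. Then, as \(n\rightarrow \infty \), we have
\[\sup _{x\in [0,1]} \big |\widehat{m}_n(x)-\widetilde{m}_n(x)\big | = o_p\left( (n h_n \log n)^{-1/2}\right) . \qquad (18)\]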
To show this, put \(r_n(x)=\sum _{i=1}^n |\Delta _i Y_i|\, K((x-X_i)/h_n) / \sum _{i=1}^n K((x-X_i)/h_n)\) and observe that in view of assumption (E)
where \(\mathbb {I}_B\) denotes the indicator of a set B. But, \(r_n(x)\le \max (|B_1|, |B_2|)\), by the second part of assumption (A). Furthermore, in view of assumptions (A), (B), (C), \((\hbox {E}^{'})\), and Theorem B of Mack and Silverman (1982) one has \(\sup _{x\in [0,1]} |\widehat{p}_n(x)-p(x)|=O_p\big (\sqrt{\log n/(n\lambda _n)}\,\big )\). Therefore,
where we have used the fact that \(p_0 \le \lim _{n\rightarrow \infty }\inf _{x}\, \widehat{p}_n(x) \le 1\) (which follows by noticing that \(-\sup _{x} |\widehat{p}_n(x)-p(x)|+p_0 \le \inf _x \widehat{p}_n(x) \le \sup _x |\widehat{p}_n(x)-p(x)|+1\) and then taking the limit, as \(n\rightarrow \infty \)); here, the infima are taken over the set \([-Ah_n \,, \, 1+A h_n]\). Now, (18) follows from (19) together with the fact that \((\log n)^2 h_n/\lambda _n = (\log n)^2 \,n^{\beta -\delta }\rightarrow 0\), as \(n\rightarrow \infty \) (because \(\beta <\delta \)).
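The two kernel ratios used in Step 1 can be sketched numerically. The block below is only an illustration under stated assumptions: it computes \(r_n(x)\) exactly as defined above, and it takes \(\widehat{p}_n(x)\) to be the analogous kernel average of the indicators \(\Delta _i\) with bandwidth \(\lambda _n\), a form inferred from the rate quoted from Mack and Silverman (1982) rather than from a definition reproduced here. The Epanechnikov kernel, the simulated model, the selection probability, and the exponents `delta_exp` and `beta_exp` (chosen so that \(\beta <\delta \)) are all hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported kernel; an illustrative choice, not taken from the paper."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_average(x_grid, X, V, bandwidth, kernel=epanechnikov):
    """Ratio sum_i V_i K((x - X_i)/h) / sum_i K((x - X_i)/h) on a grid of x values."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    W = kernel((x_grid[:, None] - X[None, :]) / bandwidth)  # shape (len(x_grid), n)
    denom = W.sum(axis=1)
    numer = W @ V
    # NaN is returned wherever no observation falls inside the kernel window.
    return np.divide(numer, denom, out=np.full_like(denom, np.nan), where=denom > 0)

# Hypothetical simulated data with a bounded response and responses missing at random.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + rng.uniform(-0.5, 0.5, n)
p = 0.6 + 0.3 * X                                  # hypothetical selection probability
Delta = (rng.uniform(size=n) < p).astype(float)    # 1 means Y_i is observed

# Hypothetical bandwidth exponents: h_n = n^(-delta), lambda_n = n^(-beta), beta < delta.
delta_exp, beta_exp = 0.30, 0.20
h_n, lam_n = n ** (-delta_exp), n ** (-beta_exp)

grid = np.linspace(0.0, 1.0, 101)
r_n = kernel_average(grid, X, np.abs(Delta * Y), h_n)   # r_n(x) from Step 1
p_hat = kernel_average(grid, X, Delta, lam_n)           # assumed form of p_hat_n(x)

print("max r_n on grid:", r_n.max(), " max |Delta*Y|:", np.abs(Delta * Y).max())
print("range of p_hat on grid:", p_hat.min(), p_hat.max())
```

Because each ratio is a weighted average with nonnegative weights, \(r_n(x)\) can never exceed the largest \(|\Delta _i Y_i|\), which is the numerical counterpart of the bound \(r_n(x)\le \max (|B_1|, |B_2|)\) used in the proof.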
STEP 2. Define the quantity
where \(\widetilde{m}_n(x)\) is as in (11), and put
Also, let \(\widehat{\sigma }^2_{\widehat{p}_n}(x)\) be as in (15). Then, we have
where \(\sigma ^2(x)\) is as in (21). To establish (22) and (23), first observe that
But, by the second part of assumption (A), the second part of assumption (F), and the boundedness of the support of the distribution of \(\epsilon \), we immediately find \(\sup _{x\in [0,1]} |\widetilde{m}_n(x)|= O_p(1)\). Thus, by (18), \(\sup _{x\in [0,1]} |\widehat{m}_n(x)| \le \sup _{x\in [0,1]} |\widehat{m}_n(x)-\widetilde{m}_n(x)| + \sup _{x\in [0,1]} |\widetilde{m}_n(x)|= o_p\left( (n h_n \log n)^{-1/2}\right) +O_p(1)=O_p(1)\). Therefore, by (18),
Next, observe that
which follows from the last part of assumption (F), the fact that \(p_0^2 \le \lim _{n\rightarrow \infty } \inf _{x} \,\widehat{p}_n^2(x) \le \sup _x \,p^2(x) =1\), together with Theorem B of Mack and Silverman (1982), where the infimum is taken over the set \([-Ah_n \,, \, 1+A h_n]\). Therefore,
which follows because, in view of assumption (A), the first supremum term on the right side of the above inequality is bounded by \(\max (B_1^2 , B_2^2)\). Similarly, we have \(\sup _{x\in [0,1]}|V_{n,2}(x)| = O_p(\sqrt{\log n /(n\lambda _n)}\,)\), from which (22) follows. The proof of (23) is rather straightforward and goes as follows:
STEP 3. Let \(\widehat{f}_n(x)\), \(\widehat{m}_n(x)\), \(\widetilde{m}_n(x)\), \(\widehat{\sigma }^2_{\widehat{p}_n}(x)\), and \(\widetilde{\sigma }^2_n(x)\) be as in (3), (12), (11), (15), and (20), respectively, and write
where
To deal with the supremum on the right-hand side of (24), we note that \(\widetilde{m}_n(x)\) and \(\widetilde{\sigma }_n^2(x)\) that appear in this supremum term are, respectively, the kernel regression estimator of \(E(Y^*|X=x)\) and the kernel estimator of the conditional variance of \(Y^*\), as given by (21), based on the iid “data” \((X_i, Y^*_i)\), \(i=1,\dots ,n\), where \(Y^*\) is given by (14). It is straightforward to see that when assumptions (A) and (F) hold, \(P\{B^*_1\le Y^* \le B^*_2\}=1\), where \(B_1^*=\min (0,B_1)+a_0\) and \(B_2^*=\frac{B_2}{p_0}+b_0\), with the constants \(B_1\) and \(B_2\) as in assumption (A), and where \(a_0\) and \(b_0\) are as in assumption (G). Also, in view of assumptions (A) and (G), the random vector \((X, Y^*)\) has a pdf. Therefore, when assumption (A) holds for the distribution of (X, Y), because of assumption (F), it also holds for the distribution of \((X,Y^*)\). Similarly, if \(\sigma _0^2(x)\) satisfies assumption (C), then so does \(\sigma ^2(x)\) (in view of assumption (F)); to show this, simply observe that in view of (2) we have \(\sigma ^2(x){\mathop {=}\limits ^{ \text{ via } (21)}}[(p(x))^{-1}-1] E(Y^2|X=x)+E(\epsilon ^2)+\sigma _0^2(x)\). Therefore, as a consequence of Theorem 1, under assumptions (A), (B), (C), \((\hbox {D}^{'})\), (E), (F), and (G),
where \(P(Y\le y)=\exp \left\{ -2 \exp (-y)\right\} \), \(y\in \mathbb {R}\), \(c_K=\int K^2(t)\,\mathrm{d}t\), and \(d_n\) is as in (5). Therefore, to prove Theorem 2, it is sufficient to show that \(\sqrt{n h_n \log n}\,\big (R_n(1)+R_n(2)\big )\rightarrow ^p 0\), as \(n\rightarrow \infty \). First, we show that \(\sqrt{n h_n \log n}\,|R_n(2)|\rightarrow ^p 0\). To show this, observe that by (26)
Furthermore, \( \big |\sup _x\sqrt{\widetilde{\sigma }^2_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)}-1\big | \le \sup _{x}\big |\sqrt{\widetilde{\sigma }^2_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)}- 1\big | \le \frac{\sup _{x}\big |\widetilde{\sigma }^2_n(x)- \widehat{\sigma }^2_{\widehat{p}_n}(x)\big |}{\inf _{x} \widehat{\sigma }^2_{\widehat{p}_n}(x)}.\) But by (22), \(\sup _{x\in [0,1]}\left| \widehat{\sigma }^2_{\widehat{p}_n}(x)-\widetilde{\sigma }^2_n(x)\right| = O_p\left( \sqrt{\log n/(n \lambda _n)}\right) + o_p\left( (n h_n \log n)^{-1/2}\right) \). We also note that \(\,\inf _x\, \widehat{\sigma }^2_{\widehat{p}_n}(x) \ge \inf _x\, \left\{ \widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\right\} +\inf _x\, \sigma ^2(x) \ge - \sup _x \big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big | +\inf _x\,\sigma ^2(x) \), where \(\sigma ^2(x)\) is as in (21). Similarly, observe that \(\inf _x\,\widehat{\sigma }^2_{\widehat{p}_n}(x) \le \sup _x \big |\widehat{\sigma }^2_{\widehat{p}_n}(x)-\sigma ^2(x)\big | +\sup _x\,\sigma ^2(x)\). Thus, we have
Now, in view of (22) and (23), and upon taking the limit in the above chain of inequalities, as \(n\rightarrow \infty \), we find \(0< \lim _{n\rightarrow \infty } \inf _x \, \widehat{\sigma }^2_{\widehat{p}_n}(x) <\infty \), which yields
This in conjunction with (27) implies that \(\sqrt{n h_n \log n}\,|R_n(2)|\rightarrow ^p 0\), where \(R_n(2)\) is as in (25). Next, observe that \(\sup _x \big |\widehat{f}_n(x)/\widehat{\sigma }^2_{\widehat{p}_n}(x)\big | \le \big [\sup _x \big |\widehat{f}_n(x)-f(x)\big |+\sup _x f(x)\big ]/\inf _x \widehat{\sigma }^2_{\widehat{p}_n}(x) = O_p(1)\), which follows because \(\sup _x|\widehat{f}_n(x)-f(x)|=o_p(1)\) and by the fact that \(0< \lim _{n\rightarrow \infty } \inf _x \, \widehat{\sigma }^2_{\widehat{p}_n}(x) <\infty \) (as shown above). Combining these results, we have
Putting the above results together, we have \(\sqrt{n h_n \log n}\,|R_n|=o_p(1)\). Theorem 2 now follows from this together with (24), (25), and (26). \(\square \)
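The limiting law in Step 3, \(P(Y\le y)=\exp \{-2\exp (-y)\}\), is what supplies the critical value for a band of nominal level \(1-\alpha \): solving \(\exp \{-2\exp (-y)\}=1-\alpha \) gives \(y_\alpha =-\log \{-\tfrac{1}{2}\log (1-\alpha )\}\). The following minimal sketch computes only this quantile. The function names are hypothetical, and the remaining ingredients of the band (the centering sequence \(d_n\) in (5), \(c_K\), \(\widehat{f}_n\), and \(\widehat{\sigma }^2_{\widehat{p}_n}\)) are not reproduced in this appendix and are therefore left out.

```python
import math

def limit_cdf(y):
    """Limiting distribution function exp(-2 exp(-y)) from Step 3."""
    return math.exp(-2.0 * math.exp(-y))

def limit_quantile(alpha):
    """Solve exp(-2 exp(-y)) = 1 - alpha for y, where 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -math.log(-0.5 * math.log(1.0 - alpha))

# Quantiles for a few common nominal levels, with a round-trip check of the CDF.
for alpha in (0.10, 0.05, 0.01):
    y_alpha = limit_quantile(alpha)
    print(f"alpha={alpha:.2f}  y_alpha={y_alpha:.4f}  cdf(y_alpha)={limit_cdf(y_alpha):.4f}")
```

A smaller \(\alpha \) gives a larger \(y_\alpha \) and hence a wider band, as one would expect from the monotonicity of the limiting distribution function.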
Al-Sharadqah, A., Mojirsheibani, M. A simple approach to construct confidence bands for a regression function with incomplete data. AStA Adv Stat Anal 104, 81–99 (2020). https://doi.org/10.1007/s10182-019-00351-7
DOI: https://doi.org/10.1007/s10182-019-00351-7