Abstract
In the present paper, we consider the problem of estimating a parameter \(\varvec{\theta }\), taking values in a Banach space, that maximizes some criterion function depending on an unknown, possibly infinite-dimensional nuisance parameter h. The classical estimation methods are mainly based on maximizing the corresponding empirical criterion after substituting a nonparametric estimator for the nuisance parameter. We show that the M-estimators converge weakly to maximizers of Gaussian processes under rather general conditions. The conventional bootstrap method fails in general to consistently estimate the limit law. We show that the m out of n bootstrap, in this extended setting, is weakly consistent under conditions similar to those required for weak convergence of the M-estimators. The aim of this paper is therefore to extend the existing theory on the bootstrap of M-estimators. Examples of applications from the literature are given to illustrate the generality and usefulness of our results. Finally, we investigate the small-sample performance of the methodology through a short simulation study.
References
Alin A, Martin MA, Beyaztas U, Pathak PK (2017) Sufficient m-out-of-n(m/n) bootstrap. J Stat Comput Simul 87(9):1742–1753
Allaire G (2005) Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Editions Ecole (Polytechnique)
Alvarez-Andrade S, Bouzebda S (2013) Strong approximations for weighted bootstrap of empirical and quantile processes with applications. Stat Methodol 11:36–52
Alvarez-Andrade S, Bouzebda S (2015) On the local time of the weighted bootstrap and compound empirical processes. Stoch Anal Appl 33(4):609–629
Alvarez-Andrade S, Bouzebda S (2019) Some selected topics for the bootstrap of the empirical and quantile processes. Theory Stoch Process 24(1):19–48
Arcones MA, Giné E (1992) On the bootstrap of M-estimators and other statistical functionals. In: Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser Probab Math Statist, pages 13–47. Wiley, New York
Bickel PJ, Sakov A (2008) On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statist Sinica 18(3):967–985
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD
Bickel PJ, Götze F, van Zwet WR (1997) Resampling fewer than n observations: gains, losses, and remedies for losses. Statist Sinica 7(1):1–31. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995)
Bose A, Chatterjee S (2001) Generalised bootstrap in non-regular M-estimation problems. Statist Probab Lett 55(3):319–328
Bouzebda S (2010) Bootstrap de l’estimateur de Hill: théorèmes limites. Ann ISUP 54(1–2):61–72
Bouzebda S, Limnios N (2013) On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J Multivariate Anal 116:52–62
Bouzebda S, Papamichail C, Limnios N (2018) On a multidimensional general bootstrap for empirical estimator of continuous-time semi-Markov kernels with applications. J Nonparametr Stat 30(1):49–86
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608
Cheng G, Huang JZ (2010) Bootstrap consistency for general semiparametric M-estimation. Ann Statist 38(5):2884–2915
Chernick MR (2008) Bootstrap methods: a guide for practitioners and researchers. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition
Datta S, McCormick WP (1995) Bootstrap inference for a first-order autoregression with positive innovations. J Amer Statist Assoc 90(432):1289–1300
Davison AC, Hinkley DV (1997) Bootstrap methods and their application, volume 1 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. With 1 IBM-PC floppy disk (3.5 inch; HD)
Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Statist Math 72(2):577–605
Dudley RM (1999) Uniform central limit theorems, volume 63 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Statist 7(1):1–26
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York
El Bantli F (2004) M-estimation in linear models under nonstandard conditions. J Statist Plann Inference 121(2):231–248
Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Statist 17(2):684–691
Götze F, Račkauskas A (2001) Adaptive choice of bootstrap sample sizes. In State of the art in probability and statistics (Leiden, 1999), volume 36 of IMS Lecture Notes Monogr. Ser., pages 286–309. Inst. Math. Statist., Beachwood, OH
Hall P (1992) The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York
Hall P, Horowitz JL, Jing B-Y (1995) On blocking rules for the bootstrap with dependent data. Biometrika 82(3):561–574
Hoffmann-Jørgensen J (1991) Stochastic processes on Polish spaces. Various publications series. Aarhus Universitet, Matematisk Institut
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econometrics 58(1–2):71–120
Kim J, Pollard D (1990) Cube root asymptotics. Ann Statist 18(1):191–219
Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer Series in Statistics, Springer, New York
Koul HL, Müller UU, Schick A et al (2012) The transfer principle: a tool for complete case analysis. Ann Stat 40(6):3031–3049
Kristensen D, Salanié B (2017) Higher-order properties of approximate estimators. J Econometrics 198(2):189–208
Lahiri SN (1992) On bootstrapping M-estimators. Sankhyā Ser A 54(2):157–170
Lee SMS (2012) General M-estimation and its bootstrap. J Korean Statist Soc 41(4):471–490
Lee SMS, Pun MC (2006) On m out of n bootstrapping for nonstandard M-estimation with nuisance parameters. J Amer Statist Assoc 101(475):1185–1197
Lee SMS, Yang P (2020) Bootstrap confidence regions based on M-estimators under nonstandard conditions. Ann Statist 48(1):274–299
Ma S, Kosorok MR (2005) Robust semiparametric m-estimation and the weighted bootstrap. J Multivar Anal 96(1):190–217
Müller UU et al (2009) Estimating linear functionals in nonlinear regression with responses missing at random. Ann Stat 37(5A):2245–2277
Pakes A, Olley S (1995) A limit theorem for a smooth class of semiparametric estimators. J. Econometrics 65(1):295–332
Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57(5):1027–1057
Pérez-González A, Vilar-Fernández JM, González-Manteiga W (2009) Asymptotic properties of local polynomial regression with missing data and correlated errors. Ann Inst Stat Math 61(1):85–109
Pfanzagl J (1990) Estimation in semiparametric models: some recent developments, volume 63 of Lecture Notes in Statistics. Springer-Verlag, New York
Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer Series in Statistics. Springer-Verlag, New York
Pollard D (1985) New ways to prove central limit theorems. Economet Theor 1(3):295–313
Shao J, Tu DS (1995) The jackknife and bootstrap. Springer Series in Statistics. Springer-Verlag, New York
Swanepoel JWH (1986) A note on proving that the (modified) bootstrap works. Comm Statist A-Theory Methods 15(11):3193–3203
van de Geer SA (2000) Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics
Wei B, Lee SMS, Wu X (2016) Stochastically optimal bootstrap sample size for shrinkage-type statistics. Stat Comput 26(1–2):249–262
Wellner JA, Zhan Y (1996) Bootstrapping Z-estimators. Preprint
Zhan Y (2002) Central limit theorems for functional Z-estimators. Statist. Sinica 12(2):609–634
Acknowledgements
The authors are indebted to the Editor-in-Chief, the Associate Editor, and the referee for their very valuable comments, suggestions, and careful reading of the article, which led to a considerable improvement of the manuscript. The third author gratefully acknowledges the funding received towards his PhD from the Algerian government PhD fellowship.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Applications
We present in this section some examples that cannot be handled with the classical theory of semiparametric estimators, and for which the corresponding m out of n bootstrap cannot be applied, whereas the theory of the present paper can. This illustrates the usefulness of our results. Delsol and Van Keilegom (2020) provided some examples of situations in which the existing theory on semiparametric estimators cannot be applied, whereas their result can. It is worth noticing that the aim of this section is to verify the bootstrap conditions, which are different from the conditions for the non-bootstrapped estimators checked in the last-mentioned reference. Although only three examples are given here, they stand as archetypes for a variety of models that can be investigated by the methodology of the present paper.
1.1 Single Index Model with Monotone Link Function
The single-index regression models are typical examples, which are given by
where \(\mathbb {E}(\varepsilon | \mathbf{X})=0\), \({\text {Var}}(\varepsilon | \mathbf{X})<\infty\), and the unknown link function \(g(\cdot )\) is assumed to be monotone; we refer to Ichimura (1993) for more details. On the basis of the sample \(\left( \mathbf{X}_{1}, Y_{1}\right) , \ldots ,\left( \mathbf{X}_{n}, Y_{n}\right)\) coming from the model (22), we make use of the pool-adjacent-violators algorithm to construct an estimator of the function \(g(\cdot )\). This gives a non-smooth estimator \(\widehat{g}_{\varvec{\beta }}(\cdot )\) of \(g_{\varvec{\beta }}(\mathbf {z})=\mathbb E\left[ Y |\mathbf{X}^{\top } \varvec{\beta }=\mathbf {z}\right] .\) Next, we estimate \(\varvec{\beta }\) by the least-squares estimation method
The non-smooth nature of \(\widehat{g}_{\varvec{\beta }}(\cdot )\) implies that the criterion function is not smooth in \(\varvec{\beta }\). This is a situation where the theory of the present paper can be applied.
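The scheme just described can be sketched in a few lines: for each candidate \(\varvec{\beta }\), the pool-adjacent-violators algorithm fits the monotone link along the sorted index values, and \(\varvec{\beta }\) is chosen to minimize the residual sum of squares. The following minimal sketch (the grid search, the data, and the function names are our own illustration, not the paper's implementation) also shows why the profiled criterion is non-smooth in \(\varvec{\beta }\): the PAVA fit changes non-differentiably whenever the ordering of the indices \(\mathbf{X}_i^{\top}\varvec{\beta }\) changes.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum, count]; the fitted value is sum / count
    for v in y:
        blocks.append([float(v), 1])
        # merge adjacent blocks while monotonicity of block means is violated
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def profile_beta(xs, ys, betas):
    """Least-squares estimate of the index vector over a candidate grid."""
    best_beta, best_rss = None, None
    for beta in betas:
        z = [sum(b * xi for b, xi in zip(beta, x)) for x in xs]
        order = sorted(range(len(ys)), key=lambda i: z[i])
        ghat = pava([ys[i] for i in order])  # non-smooth in beta via the sort
        rss = sum((ys[i] - g) ** 2 for i, g in zip(order, ghat))
        if best_rss is None or rss < best_rss:
            best_beta, best_rss = beta, rss
    return best_beta
```

With noiseless monotone data, the residual sum of squares at the true index vector is exactly zero, so the grid search recovers it.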
1.2 Classification with Missing Data
Let \(\mathbf {X}_{1}=(\mathbf {X}_{11},\mathbf {X}_{12}),\ldots ,\mathbf {X}_{n}=(\mathbf {X}_{n1},\mathbf {X}_{n2})\) be independent and identically distributed random copies of the random vector \(\mathbf {X}=(\mathbf {X}_{1},\mathbf {X}_{2})\), coming from two underlying populations. For \(j=0,1\), let \(\mathbf {Y}_{i}=j\) when \(\mathbf{X}_i\) comes from population j. Let us denote by \(\mathbf {Y}\) the population indicator associated with the vector \(\mathbf {X}\). Using the information in the available data, we seek a classification rule for novel observations whose true population is unknown.
The classification is performed by regressing \(\mathbf {X}_{2}\) on \(\mathbf {X}_{1}\), making use of the parametric criterion function \(f_{\varvec{\theta }}(\cdot )\), and choosing the \(\varvec{\theta }\) that maximizes the following
Let \(\varvec{\theta }_{0}\) denote the maximizer of (23) with respect to all \(\varvec{\theta } \in \varvec{\Theta }\), where \(\varvec{\Theta }\) is assumed to be a compact subset of \(\mathbb {R}^{k}\) containing \(\varvec{\theta }_{0}\) as an interior point. Now assume that the \(\mathbf {Y}_{i}\)’s are subject to some missing mechanism. Let \(\Delta _{i}\) (respectively \(\Delta\)) be a random variable equal to 1 when the random variable \(\mathbf {Y}_{i}\) (respectively \(\mathbf {Y}\)) is observed, and 0 otherwise. Let \(\mathbf {Z}_{1}=(\mathbf {X}_{1},\mathbf {Y}_{1}\Delta _{1},\Delta _{1}),\ldots , \mathbf {Z}_{n}=(\mathbf {X}_{n},\mathbf {Y}_{n}\Delta _{n},\Delta _{n})\) be the observations at hand. The missing at random mechanism is considered in the following sense
Note that the relation (23) can be written
We define
where the infinite-dimensional nuisance parameter \(p(\cdot )\) belongs to some functional space \(\mathcal {P}\) to be specified later. Consequently, the estimator \(\varvec{\theta }_{n}\) of \(\varvec{\theta }_{0}\) is given by
where, for any x and a bandwidth sequence \(h=h_{n}\),
where the kernel function \(K(\cdot )\) is assumed to be a density function with support \([-1,1]\) and \(K_{h}(u)=\frac{K\left( \frac{u}{h}\right) }{h}\). Nonparametric regression with missing data has long attracted a great deal of attention; for good sources of references to the research literature in this area, along with statistical applications, consult Müller (2009), Pérez-González et al. (2009) and Koul et al. (2012), among many others.
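The kernel estimator of the observation probability \(p(\cdot )\) is straightforward to write down. As a concrete density supported on \([-1,1]\) we take the Epanechnikov kernel; this choice, and the function names, are ours and are for illustration only:

```python
def epanechnikov(u):
    """A density supported on [-1, 1], as required of K."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def p_hat(x, x1s, deltas, h):
    """Kernel estimate of p(x) = P(Delta = 1 | X1 = x), Nadaraya-Watson form."""
    weights = [epanechnikov((x - xi) / h) / h for xi in x1s]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no observations within bandwidth h of x
    return sum(w * d for w, d in zip(weights, deltas)) / total
```

Only the pairs \((\mathbf {X}_{1i},\Delta _{i})\) enter the estimator, so the same formula applies verbatim to the bootstrap version \(\widehat{p}_{m}\) with the resampled pairs \((\mathbf {X}^{*}_{1i},\Delta ^{*}_{i})\) plugged in.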
1.3 Binary Choice Model with Missing Data
Let us define the binary choice model, in the linear regression function framework, by
where we assume that \(\varepsilon\) has zero median conditionally on \(\mathbf{X}.\) The random variable Y is missing at random, with the probability of observing Y depending on \(\mathbf{X}\) via the following relation
where \(\Delta =1\) when Y is observed and 0 otherwise. The observed data for the preceding model are given by the i.i.d. triplets \(\left( \mathbf {X}_{1}, Y_{1} \Delta _{1}, \Delta _{1}\right) ,\ldots ,\left( \mathbf {X}_{n}, Y_{n} \Delta _{n}, \Delta _{n}\right)\). To estimate \(p_{\varvec{\gamma }}(z)=\mathbb {P}\left( \Delta =1 \mid \mathbf {X}^{\top } \varvec{\gamma }=z\right) ,\) we use the following
The parameter estimate is given by
where
The existing theory cannot be applied here because the function \(\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\) is smooth in \(\varvec{\gamma }\) but non-smooth in \(\varvec{\beta }\).
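To make this non-smoothness concrete, here is a minimal inverse-probability-weighted, maximum-score-type criterion in the spirit of this model (the criterion, the grid search, and all names are our own hedged illustration, not the paper's exact \(\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\)): the weight \(\Delta _i/p(\cdot )\) is smooth in the nuisance part, while the indicator \(\mathrm{1\!I}_{\{\mathbf {X}_i^{\top }\varvec{\beta }\ge 0\}}\) makes the criterion a step function of \(\varvec{\beta }\).

```python
def score(beta, xs, ys, deltas, p_of_x):
    """IPW maximum-score-type criterion: a step function (non-smooth) in beta."""
    total = 0.0
    for x, y, d in zip(xs, ys, deltas):
        if d == 0:
            continue  # missing response: contributes nothing
        index = sum(b * xi for b, xi in zip(beta, x))
        indicator = 1.0 if index >= 0.0 else 0.0
        total += (2 * y - 1) * indicator / p_of_x(x)
    return total / len(xs)

def estimate_beta(xs, ys, deltas, p_of_x, betas):
    """Maximize the criterion over a candidate grid of beta values."""
    return max(betas, key=lambda b: score(b, xs, ys, deltas, p_of_x))
```

Because the criterion only changes when the sign of \(\mathbf {X}_i^{\top }\varvec{\beta }\) flips for some i, no gradient-based argument is available in \(\varvec{\beta }\), which is precisely the obstruction described above.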
We now study in full detail the example of Sect. 7.2 and work out the verification of the conditions of Theorems 3.2, 3.5, 3.8 and 3.9. Most of these conditions were verified in Sect. 7 of Delsol and Van Keilegom (2020), noting that \(\nu =2\) and \(\ell \equiv 1\); our focus is therefore on verifying the conditions needed for the m out of n bootstrapped version. We begin with some information about the nuisance function, its space, and some notation. Let \(\mathcal {P}\) be the space of functions \(p:\mathbf {R}_{\mathbf {X}_{1}}\rightarrow \mathbb {R}\) that are continuously differentiable, for which
where
and \(\mathbf {R}_{\mathbf {X}_{1}}\) is the support of \(\mathbf {X}_{1}\), which we suppose to be a compact subset of \(\mathbb {R}\). We equip the space \(\mathcal {P}\) with the supremum norm:
The consistency conditions are then verified as follows. (A1) holds true provided the functions \(p_{0}(\cdot )\) and \(K(\cdot )\) are continuously differentiable. For assumption (A2), we can show that the bracketing number \(N_{[~]}\left( \epsilon , \mathcal {F}, \mathbb {L}_{\mathbb {P}}\right)\) of the class \(\mathcal {F}=\{\mathbf {m}_{\varvec{\theta },p}, \varvec{\theta }\in \Theta , p\in \mathcal {P}\}\) is finite for all \(\epsilon >0\); by using Corollary 2.7.2 of van der Vaart and Wellner (1996), we get
and
by the properties of the set \(\mathcal {P}\) and the fact that \(\mathbf {x}\mapsto f_{\varvec{\theta }}(\mathbf {x})\) is continuously differentiable in \(\varvec{\theta }\) with bounded derivative; as a consequence, it is easy to show that
for the class \(\mathcal {T}=\left\{ \left( \mathbf {x}_{1}, \mathbf {x}_{2}\right) \rightarrow \mathrm{1\!I}_{\{x_{2} \ge f_{\varvec{\theta }}\left( \mathbf {x}_{1}\right) \}}: \varvec{\theta } \in \varvec{\Theta }\right\}\). From (24) and (25) we get
Then assumption (A3) is straightforward. Assumption (A4) is an identifiability condition ensuring the uniqueness of \(\varvec{\theta }_{0}\), and (A5) is verified by construction of the estimator \(\varvec{\theta }_{n}\). The consistency of \(\varvec{\theta }_{n}\) then follows. The conditions of the bootstrap version are verified as follows. The first part of assumption (AB1) is satisfied by definition of the m out of n bootstrap, while the second part follows directly in this situation by noting that if \(r_{n}=n^{\kappa }\) then \(r_{m}=m^{\kappa }\) for some \(\kappa >0\), and consequently \(r^{2}_{m}=o(r^{2}_{n})\). For (AB2), as mentioned in Remark 3.1(v), we take \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\) with the variables \(\mathbf {X}_{1i}\) and \(\Delta _{i}\) replaced by \(\mathbf {X}^{*}_{1i}\) and \(\Delta ^{*}_{i}\), respectively, in \(\widehat{p}(\cdot )\); i.e.,
we remark that
which implies \(d_{\mathcal {H}}\left( \widehat{p}_{m},\widehat{p}\right) =o_{\mathbb {P}^{*}_{W}}(1)\) i.p. By the triangle inequality we get
(AB3) is verified by construction of the estimator \(\varvec{\theta }_{m}\), which implies the consistency of \(\varvec{\theta }_{m}\). Next, for the rate of convergence, we show only conditions (B2) and (B3). For (B2), by Remark 3.3(ii) it suffices to show (4) and (5). To this end, using the relation between covering and bracketing numbers and Corollary 2.7.2 of van der Vaart and Wellner (1996), we get that
for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{4}\), which implies the relation in (4); (5) is verified by the choice \(\varphi (\delta )=\sqrt{\delta }\), and as a consequence we get (B2). (B3) follows directly as in Sect. 7 of the same reference, which describes this example, with the choice of the two functions \(\varvec{\psi }_{1}(\cdot )\) and \(\varvec{\psi }_{2}(\cdot )\) given in Remark 3.3(iii). From their discussion of the rates \(r_{n}\), \(v_{n}\) and the bandwidth h of the kernel, it follows that
We verify assumption (BB1) as in the verification of condition (AB2): by choosing \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\), we get \(v^{-1}_{m}=\sqrt{\frac{\log m}{mh}}+h\), where \(h=h_{m}.\) Assumption (BB2) holds by the same argument given for (B2). For assumption (BB3), we check conditions (b)–(d) of Remark 3.3(iii). We obtain
and
provided the derivatives in \(\Lambda \left( \varvec{\theta }_{0}, p\right)\) all exist. Since \(\varvec{\theta }_{0}\) is a maximizer, it follows that \(\Gamma \left( \varvec{\theta }_{0}, p^{0}\right) =0\) and \(\Lambda \left( \varvec{\theta }_{0}, p^{0}\right)\) is negative. Noting that
if \(r_{m}\) satisfies
noting that the expectation in (26) is taken with respect to \(\mathbf {Z}\) and \(\mathbf {W}\) when we work with \(\widehat{p}_{m}\); since our functions are measurable, we obtain this result by applying Fubini’s theorem. This condition on \(r_{m}\) and the one given in (BB2), which is satisfied for \(r_{m}=O(m^{1/3})\), are reconcilable provided
Note that if we assume that \(p^{0}(\cdot )\) is twice continuously differentiable, we can weaken the first condition to \(mh_{m}^{6}=O(1)\); as a consequence, the rate \(v^{-1}_{m}\) of \(\widehat{p}_{m}\) would be \(O\left( \sqrt{\frac{\log m}{mh_{m}}}+h^{2}_{m}\right)\), which is faster than the rate \(r^{-1}_{m}=m^{-1/3}\) of \(\varvec{\theta }_{m}\) provided \(mh_{m}^{3} \longrightarrow \infty\). This latter case is less complex than the case where \(p^{0}\) is only once differentiable, and we therefore do not discuss it further. We conclude that
Finally, for the weak convergence of \(\varvec{\theta }_{n}\), we note that assumption (C4) is satisfied for \(j_{n}=\sqrt{n}\) as in Remark 3.5(iii), and (C9) holds similarly to (B2). Consequently, \(n^{1/3}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) converges weakly. Assumption (CB1) follows from part (ii) of Theorem 3.5 and condition (BB1), and (CB2) follows by a proof similar to that of condition (BB2). From Remark 3.3(iii), (vi) and Remark 3.5(viii), assumption (CB3) holds provided that
Clearly, we have \(m^{-1/3} < C\) for some positive constant C. For assumption (CB4), we have
provided \(mh^{3}_{m}=o(1)\) and \(\frac{\log ^{3/2}m}{mh^{3/2}_{m}}=o(m^{-1/2})\), by the discussion already given for (BB3). Next, by the result for the process in (16), i.e., that the process \(\gamma \mapsto \mathbb {G}_{n}\frac{r^{2}_{m}}{\sqrt{m}}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\) converges weakly to the process \(\mathbb {G}(\gamma )\), and condition (AB1), we get
with \(\Gamma (\varvec{\theta }_{0},p^{0})=0\) and
The process \(\gamma \mapsto r^{2}_{m}\left( \widehat{\mathbb {P}}_{m}-\mathbb {P}_{n}\right) \widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\) is the same as the one considered in Lee (2012), where no nuisance parameter is present. Hence, we can follow the same steps as in Lemma 1 of Lee (2012) and obtain the convergence of the marginals using Lindeberg’s condition and some regularity assumptions on \(f_{\mathbf {X}_{1}/\mathbf {X}_{2}}\) and \(\varvec{\theta }\mapsto f_{\varvec{\theta }}\). By construction of the estimator \(\varvec{\theta }_{m}\), condition (CB5) follows. We then get the asymptotic distribution of \(r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{n})\) from part (ii) of Theorem 3.9.
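The m out of n scheme used throughout can be summarized procedurally: resample m of the n observations with replacement, recompute the estimator (and any plug-in nuisance estimate), and rescale by \(r_{m}=m^{1/3}\) rather than by \(r_{n}\). A generic sketch under our own naming (the estimator and rate arguments are placeholders; in the example above the estimator would be \(\varvec{\theta }_{m}\) and the rate \(m^{1/3}\)):

```python
import random

def m_out_of_n_bootstrap(data, estimator, m, n_boot, rate, seed=0):
    """Return the rescaled bootstrap replicates rate(m) * (theta*_m - theta_n)."""
    rng = random.Random(seed)
    theta_n = estimator(data)  # estimate from the full sample of size n
    reps = []
    for _ in range(n_boot):
        # resample m out of len(data) observations with replacement
        star = [data[rng.randrange(len(data))] for _ in range(m)]
        reps.append(rate(m) * (estimator(star) - theta_n))
    return reps
```

The empirical quantiles of the returned replicates then approximate the law of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) when \(m \rightarrow \infty\) and \(m/n \rightarrow 0\), which is the weak consistency established in the paper.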
Cite this article
Bouzebda, S., Elhattab, I. & Ferfache, A.A. General M-Estimator Processes and their m out of n Bootstrap with Functional Nuisance Parameters. Methodol Comput Appl Probab 24, 2961–3005 (2022). https://doi.org/10.1007/s11009-022-09965-y
Keywords
- Gaussian process
- M-estimation
- Empirical process
- m out of n bootstrap
- Asymptotic distribution
- Nuisance parameter
- Semiparametric estimation
- Nonstandard distribution
- Missing data