
General M-Estimator Processes and their m out of n Bootstrap with Functional Nuisance Parameters

Published in Methodology and Computing in Applied Probability.

Abstract

In the present paper, we consider the problem of estimating a parameter \(\varvec{\theta }\), in Banach spaces, maximizing some criterion function which depends on an unknown nuisance parameter h, possibly infinite-dimensional. The classical estimation methods are mainly based on maximizing the corresponding empirical criterion, with the nuisance parameter replaced by a nonparametric estimator. We show that the M-estimators converge weakly to maximizers of Gaussian processes under rather general conditions. The conventional bootstrap method fails in general to consistently estimate the limit law. We show that the m out of n bootstrap, in this extended setting, is weakly consistent under conditions similar to those required for weak convergence of the M-estimators. The aim of this paper is therefore to extend the existing theory on the bootstrap of M-estimators. Examples of applications from the literature are given to illustrate the generality and usefulness of our results. Finally, we investigate the performance of the methodology for small samples through a short simulation study.


References

  • Alin A, Martin MA, Beyaztas U, Pathak PK (2017) Sufficient m-out-of-n (m/n) bootstrap. J Stat Comput Simul 87(9):1742–1753

  • Allaire G (2005) Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Éditions de l'École Polytechnique

  • Alvarez-Andrade S, Bouzebda S (2013) Strong approximations for weighted bootstrap of empirical and quantile processes with applications. Stat Methodol 11:36–52

  • Alvarez-Andrade S, Bouzebda S (2015) On the local time of the weighted bootstrap and compound empirical processes. Stoch Anal Appl 33(4):609–629

  • Alvarez-Andrade S, Bouzebda S (2019) Some selected topics for the bootstrap of the empirical and quantile processes. Theory Stoch Process 24(1):19–48

  • Arcones MA, Giné E (1992) On the bootstrap of M-estimators and other statistical functionals. In: Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser Probab Math Statist, pp 13–47. Wiley, New York

  • Bickel PJ, Sakov A (2008) On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statist Sinica 18(3):967–985

  • Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD

  • Bickel PJ, Götze F, van Zwet WR (1997) Resampling fewer than n observations: gains, losses, and remedies for losses. Statist Sinica 7(1):1–31

  • Bose A, Chatterjee S (2001) Generalised bootstrap in non-regular M-estimation problems. Statist Probab Lett 55(3):319–328

  • Bouzebda S (2010) Bootstrap de l'estimateur de Hill: théorèmes limites. Ann ISUP 54(1–2):61–72

  • Bouzebda S, Limnios N (2013) On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J Multivariate Anal 116:52–62

  • Bouzebda S, Papamichail C, Limnios N (2018) On a multidimensional general bootstrap for empirical estimator of continuous-time semi-Markov kernels with applications. J Nonparametr Stat 30(1):49–86

  • Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608

  • Cheng G, Huang JZ (2010) Bootstrap consistency for general semiparametric M-estimation. Ann Statist 38(5):2884–2915

  • Chernick MR (2008) Bootstrap methods: a guide for practitioners and researchers, 2nd edn. Wiley Series in Probability and Statistics. Wiley-Interscience, Hoboken, NJ

  • Datta S, McCormick WP (1995) Bootstrap inference for a first-order autoregression with positive innovations. J Amer Statist Assoc 90(432):1289–1300

  • Davison AC, Hinkley DV (1997) Bootstrap methods and their application. Cambridge Series in Statistical and Probabilistic Mathematics, vol 1. Cambridge University Press, Cambridge

  • Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Statist Math 72(2):577–605

  • Dudley RM (1999) Uniform central limit theorems. Cambridge Studies in Advanced Mathematics, vol 63. Cambridge University Press, Cambridge

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Statist 7(1):1–26

  • Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Monographs on Statistics and Applied Probability, vol 57. Chapman and Hall, New York

  • El Bantli F (2004) M-estimation in linear models under nonstandard conditions. J Statist Plann Inference 121(2):231–248

  • Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Statist 17(2):684–691

  • Götze F, Račkauskas A (2001) Adaptive choice of bootstrap sample sizes. In: State of the art in probability and statistics (Leiden, 1999), IMS Lecture Notes Monogr Ser, vol 36, pp 286–309. Inst Math Statist, Beachwood, OH

  • Hall P (1992) The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York

  • Hall P, Horowitz JL, Jing B-Y (1995) On blocking rules for the bootstrap with dependent data. Biometrika 82(3):561–574

  • Hoffmann-Jørgensen J (1991) Stochastic processes on Polish spaces. Various Publications Series. Aarhus Universitet, Matematisk Institut

  • Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econometrics 58(1–2):71–120

  • Kim J, Pollard D (1990) Cube root asymptotics. Ann Statist 18(1):191–219

  • Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer Series in Statistics. Springer, New York

  • Koul HL, Müller UU, Schick A (2012) The transfer principle: a tool for complete case analysis. Ann Statist 40(6):3031–3049

  • Kristensen D, Salanié B (2017) Higher-order properties of approximate estimators. J Econometrics 198(2):189–208

  • Lahiri SN (1992) On bootstrapping M-estimators. Sankhyā Ser A 54(2):157–170

  • Lee SMS (2012) General M-estimation and its bootstrap. J Korean Statist Soc 41(4):471–490

  • Lee SMS, Pun MC (2006) On m out of n bootstrapping for nonstandard M-estimation with nuisance parameters. J Amer Statist Assoc 101(475):1185–1197

  • Lee SMS, Yang P (2020) Bootstrap confidence regions based on M-estimators under nonstandard conditions. Ann Statist 48(1):274–299

  • Ma S, Kosorok MR (2005) Robust semiparametric M-estimation and the weighted bootstrap. J Multivariate Anal 96(1):190–217

  • Müller UU (2009) Estimating linear functionals in nonlinear regression with responses missing at random. Ann Statist 37(5A):2245–2277

  • Pakes A, Olley S (1995) A limit theorem for a smooth class of semiparametric estimators. J Econometrics 65(1):295–332

  • Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57(5):1027–1057

  • Pérez-González A, Vilar-Fernández JM, González-Manteiga W (2009) Asymptotic properties of local polynomial regression with missing data and correlated errors. Ann Inst Statist Math 61(1):85–109

  • Pfanzagl J (1990) Estimation in semiparametric models: some recent developments. Lecture Notes in Statistics, vol 63. Springer-Verlag, New York

  • Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer Series in Statistics. Springer-Verlag, New York

  • Pollard D (1985) New ways to prove central limit theorems. Economet Theor 1(3):295–313

  • Shao J, Tu DS (1995) The jackknife and bootstrap. Springer Series in Statistics. Springer-Verlag, New York

  • Swanepoel JWH (1986) A note on proving that the (modified) bootstrap works. Comm Statist A-Theory Methods 15(11):3193–3203

  • van de Geer SA (2000) Applications of empirical process theory. Cambridge Series in Statistical and Probabilistic Mathematics, vol 6. Cambridge University Press, Cambridge

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer Series in Statistics. Springer-Verlag, New York

  • Wei B, Lee SMS, Wu X (2016) Stochastically optimal bootstrap sample size for shrinkage-type statistics. Stat Comput 26(1–2):249–262

  • Wellner JA, Zhan Y (1996) Bootstrapping Z-estimators. Preprint

  • Zhan Y (2002) Central limit theorems for functional Z-estimators. Statist Sinica 12(2):609–634


Acknowledgements

The authors are indebted to the Editor-in-Chief, Associate Editor and the referee for their very valuable comments, suggestions, and careful reading of the article, which led to a considerable improvement of the manuscript. The third author gratefully acknowledges the funding received towards his PhD through an Algerian government PhD fellowship.

Author information


Corresponding author

Correspondence to Salim Bouzebda.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.


Appendix: Applications

We present in this section some examples which cannot be handled with the classical theory of semiparametric estimators, and for which the m out of n bootstrap of the classical theory cannot be applied, while the theory of the present paper can. This illustrates the usefulness of our results. Delsol and Van Keilegom (2020) provided some examples of situations in which the existing theory on semiparametric estimators cannot be applied, whereas their result could be applied. It is worth noticing that the aim of this section is to verify the bootstrap conditions, which differ from those used for the non-bootstrapped estimators checked in the last mentioned reference. Although only three examples are given here, they stand as archetypes for a variety of models that can be investigated by the methodology of the present paper.

1.1 Single Index Model with Monotone Link Function

Single index regression models are typical examples; they are given by

$$\begin{aligned} Y=g\left( \mathbf {X}^{\top } \varvec{\beta }\right) +\varepsilon \end{aligned}$$
(22)

where \(\mathbb {P}(\varepsilon | \mathbf{X})=0\), \({\text {Var}}(\varepsilon | \mathbf{X})<\infty\), and the unknown function \(g(\cdot )\) is assumed to be monotone; we refer to Ichimura (1993) for more details. On the basis of the sample \(\left( \mathbf{X}_{1}, Y_{1}\right) , \ldots ,\left( \mathbf{X}_{n}, Y_{n}\right)\) coming from the model (22), we make use of the pool-adjacent-violators algorithm to construct an estimator of the function \(g(\cdot )\). This gives a non-smooth estimator \(\widehat{g}_{\varvec{\beta }}(\cdot )\) of \(g_{\varvec{\beta }}(\mathbf {z})=\mathbb E\left[ Y |\mathbf{X}^{\top } \varvec{\beta }=\mathbf {z}\right] .\) Next, using the least-squares estimation method, we estimate \(\varvec{\beta }\) by

$$\widehat{\varvec{\beta }}=\arg \max _{\varvec{\beta }}\left[ -n^{-1} \sum _{i=1}^{n}\left( Y_{i}-\widehat{g}_{\varvec{\beta }}\left( \mathbf{X}_{i}^{T} \varvec{\beta }\right) \right) ^{2}\right] .$$

Since \(\widehat{g}_{\varvec{\beta }}(\cdot )\) is non-smooth, the criterion function is not smooth in \(\varvec{\beta }\). This is a situation where the theory of the present paper can be applied.
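As a concrete illustration, the pool-adjacent-violators step and the resulting non-smooth profile criterion can be sketched as follows (a minimal sketch in Python/NumPy; the helper names `pava` and `profile_criterion` are ours, not from the paper):

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []  # each block holds [mean, weight] of pooled observations
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1.0])
        # pool adjacent blocks as long as monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(int(w), m) for m, w in blocks])

def profile_criterion(beta, X, Y):
    """-n^{-1} sum_i (Y_i - ghat_beta(X_i' beta))^2, where ghat_beta is
    the isotonic (PAVA) estimate of the link along the index X' beta."""
    order = np.argsort(X @ beta)
    ghat = np.empty(len(Y))
    ghat[order] = pava(Y[order])  # monotone fit in the index ordering
    return -np.mean((Y - ghat) ** 2)
```

The estimator \(\widehat{\varvec{\beta }}\) is then obtained by maximizing `profile_criterion` over unit vectors, e.g. by a grid search; the criterion is piecewise constant in \(\varvec{\beta }\), which is the source of the non-smoothness.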

1.2 Classification with Missing Data

Let \(\mathbf {X}_{1}=(\mathbf {X}_{11},\mathbf {X}_{12}),\ldots ,\mathbf {X}_{n}=(\mathbf {X}_{n1},\mathbf {X}_{n2})\) be independent and identically distributed copies of the random vector \(\mathbf {X}=(\mathbf {X}_{1},\mathbf {X}_{2})\), coming from two underlying populations. For \(j=0,1\), let \(\mathbf {Y}_{i}=j\) when \(\mathbf{X}_i\) comes from population j, and denote by \(\mathbf {Y}\) the population indicator associated with the vector \(\mathbf {X}\). Using the available data, we seek a classification rule for novel observations whose true population is unknown.

The classification is performed by regressing \(\mathbf {X}_{2}\) on \(\mathbf {X}_{1}\), making use of the parametric criterion function \(f_{\varvec{\theta }}(\cdot )\), and choosing the \(\varvec{\theta }\) that maximizes the following

$$\begin{aligned} \mathbb {P}\mathrm{1\!I}_{ \{\mathbf {Y}=1, \mathbf {X}_{2} \ge f_{\varvec{\theta }}(\mathbf {X}_{1})\}}+\mathbb {P}\mathrm{1\!I}_{ \{\mathbf {Y}=0, \mathbf {X}_{2} < f_{\varvec{\theta }}(\mathbf {X}_{1})\}}. \end{aligned}$$
(23)

Let \(\varvec{\theta }_{0}\) denote the maximizer of (23) over all \(\varvec{\theta } \in \varvec{\Theta }\), where \(\varvec{\Theta }\) is assumed to be a compact subset of \(\mathbb {R}^{k}\) containing \(\varvec{\theta }_{0}\) as an interior point. Now assume that the \(\mathbf {Y}_{i}\)'s are subject to some missing mechanism. Let \(\Delta _{i}\) (respectively \(\Delta\)) be a random variable equal to 1 when the random variable \(\mathbf {Y}_{i}\) (respectively \(\mathbf {Y}\)) is observed, and 0 otherwise. Let \(\mathbf {Z}_{1}=(\mathbf {X}_{1},\mathbf {Y}_{1}\Delta _{1},\Delta _{1}),\ldots , \mathbf {Z}_{n}=(\mathbf {X}_{n},\mathbf {Y}_{n}\Delta _{n},\Delta _{n})\) be the observations at hand. The missing at random mechanism is considered in the following sense

$$\begin{aligned} \mathbb {P}\left( \mathrm{1\!I}_{ \{\Delta =1\}}|\mathbf {X}_{1},\mathbf {X}_{2},\mathbf {Y}\right) =\mathbb {P}\left( \mathrm{1\!I}_{\{ \Delta =1\}}|\mathbf {X}_{1}\right) :=p^{0}\left( \mathbf {X}_{1}\right) . \end{aligned}$$

Note that the relation (23) can be written as

$$\mathbb {E}\left[ \frac{\mathrm{1\!I}_{ \{\Delta =1\}}}{p^{0}(\mathbf {X}_{1})}\left\{ \mathrm{1\!I}_{ \{\mathbf {Y}=1, \mathbf {X}_{2} \ge f_{\varvec{\theta }}(\mathbf {X}_{1})\}}+\mathrm{1\!I}_{ \{\mathbf {Y}=0, \mathbf {X}_{2} < f_{\varvec{\theta }}(\mathbf {X}_{1})\}}\right\} \right] .$$

We define

$$\mathbf {m}_{{\varvec{\theta }},p}(\mathbf {Z})=\frac{\mathrm{1\!I}_{ \{\Delta =1\}}}{p(\mathbf {X}_{1})}\left\{ \mathrm{1\!I}_{ \{\mathbf {Y}=1, \mathbf {X}_{2} \ge f_{\varvec{\theta }}(\mathbf {X}_{1})\}}+\mathrm{1\!I}_{ \{\mathbf {Y}=0, \mathbf {X}_{2} < f_{\varvec{\theta }}(\mathbf {X}_{1})\}}\right\} ,$$

where the infinite-dimensional nuisance parameter \(p(\cdot )\) belongs to some functional space \(\mathcal {P}\) to be specified later. Consequently, the estimator \(\varvec{\theta }_{n}\) of \(\varvec{\theta }_{0}\) is given by

$$\varvec{\theta }_{n}=\arg \max \limits _{\varvec{\theta } \in \varvec{\Theta }}\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{p}},$$

where, for any x and a bandwidth sequence \(h=h_{n}\),

$$\widehat{p}(x)=\sum \limits _{i=1}^{n}\frac{K_{h}(x-\mathbf {X}_{i1})}{\sum \limits _{j=1}^{n}K_{h}(x-\mathbf {X}_{j1})}\mathrm{1\!I}_{ \{\Delta _{i}=1\}},$$

where the kernel function \(K(\cdot )\) is assumed to be a density with support \([-1,1]\) and \(K_{h}(u)=\frac{1}{h}K\left( \frac{u}{h}\right)\). Nonparametric regression with missing data has long attracted a great deal of attention; for good sources of references to the research literature in this area, along with statistical applications, consult Müller (2009), Pérez-González et al. (2009) and Koul et al. (2012), among many others.
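A minimal sketch of this Nadaraya-Watson type estimator of the missingness probability \(p^{0}(\cdot )\) (Python/NumPy; the Epanechnikov kernel is our own choice of a density supported on \([-1,1]\)):

```python
import numpy as np

def epanechnikov(u):
    """Density kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1)

def p_hat(x, X1, Delta, h):
    """Kernel estimate of p0(x) = P(Delta = 1 | X1 = x):
    a locally weighted fraction of observed responses near x."""
    w = epanechnikov((x - X1) / h)
    return np.sum(w * Delta) / np.sum(w)
```

When every response is observed (\(\Delta _{i}\equiv 1\)) the estimate is identically 1, and it always takes values in \([0,1]\) since it is a weighted average of indicators.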

1.3 Binary Choice Model with Missing Data

Let us define the binary choice model, in the linear regression function framework, by

$$\left\{ \begin{array}{ll} U &{}=\mathbf{X}^{\top } \varvec{\beta }-\varepsilon , \\ Y &{}=\mathrm{1\!I}(U \ge 0), \end{array}\right.$$

where we assume that \(\varepsilon\) has zero median conditionally on \(\mathbf{X}\). The random variable Y is missing at random, with the probability of observing Y depending on \(\mathbf{X}\) via the following relation

$$\mathbb {P}(\mathrm{1\!I}_{ \{\Delta =1\}} | \mathbf{X}, Y)=\mathbb {P}\left( \mathrm{1\!I}_{ \{\Delta =1\}} | \mathbf{X}^{\top } \gamma \right) :=p\left( \mathbf{X}^{\top } \gamma \right) ,$$

where \(\Delta =1\) when Y is observed and 0 otherwise. The observed data for the preceding model are given by the i.i.d. triplets \(\left( \mathbf {X}_{1}, Y_{1} \Delta _{1}, \Delta _{1}\right) ,\ldots ,\left( \mathbf {X}_{n}, Y_{n} \Delta _{n}, \Delta _{n}\right)\). To estimate \(p_{\varvec{\gamma }}(z)=\mathbb {P}\left( \mathrm{1\!I}_{ \{\Delta =1\}} | \mathbf {X}^{\top } \varvec{\gamma }=z\right) ,\) we use the following

$$\widehat{p}_{\varvec{\gamma }}(z)=\sum _{i=1}^{n} \frac{\displaystyle K_{h}\left( \mathbf {X}_{i}^{\top } \varvec{\gamma }-z\right) }{\displaystyle \sum _{j=1}^{n} K_{h}\left( \mathbf{X}_{j}^{\top } \varvec{\gamma }-z\right) }\mathrm{1\!I}_{ \{\Delta _i=1\}}.$$

The parameter estimate is given by

$$(\widehat{\varvec{\beta }}, \widehat{\varvec{\gamma }})=\arg \max \limits _{\varvec{\beta }, \varvec{\gamma }} \mathbb {P}_{n}\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, \widehat{p}_{\varvec{\gamma }}},$$

where

$$\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}= \frac{\mathrm{1\!I}_{ \{\Delta _{i}=1\}}}{p\left( \mathbf{X}_{i}^{\top } \varvec{\gamma }\right) }\left[ 2 \mathrm{1\!I}_{ \{Y_{i}=1\}}-1\right] \mathrm{1\!I}_{ \{\mathbf {X}_{i}^{\top } \varvec{\beta } \ge 0\}}.$$

The existing theory cannot be applied here because the function \(\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\) is smooth in \(\varvec{\gamma }\) but non-smooth in \(\varvec{\beta }\).
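For illustration, the empirical criterion \(\mathbb {P}_{n}\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\) can be coded directly; a minimal sketch (Python/NumPy, with `crit` and `p_fn` as our own illustrative names):

```python
import numpy as np

def crit(beta, gamma, X, Y, Delta, p_fn):
    """Empirical criterion P_n m_{beta, gamma, p}: inverse-probability-weighted
    maximum-score objective for the binary choice model with missing Y."""
    ipw = Delta / p_fn(X @ gamma)          # 1{Delta=1} / p(X' gamma)
    score = (2 * Y - 1) * (X @ beta >= 0)  # +1 if Y=1, -1 if Y=0, on {X' beta >= 0}
    return np.mean(ipw * score)
```

The indicator \(\mathrm{1\!I}_{\{\mathbf {X}^{\top }\varvec{\beta }\ge 0\}}\) makes the objective piecewise constant in \(\varvec{\beta }\), which is exactly the non-smoothness discussed above.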

We now study in full detail the example of Sect. 7.2 and work out the verification of the conditions of Theorems 3.2, 3.5, 3.8 and 3.9. Most of these conditions were verified in Sect. 7 of Delsol and Van Keilegom (2020), by noting that \(\nu =2\) and \(\ell \equiv 1\), so our focus is on verifying the conditions needed for the m out of n bootstrapped version. We begin with some information about the nuisance function, its space and some notation. Let \(\mathcal {P}\) be the space of functions \(p:\mathbf {R}_{\mathbf {X}_{1}}\rightarrow \mathbb {R}\) that are continuously differentiable and for which

$$\sup \limits _{\mathbf {x}_{1}\in \mathbf {R}_{\mathbf {X}_{1}}}p(\mathbf {x}_{1}) \le M < \infty , \sup \limits _{\mathbf {x}_{1}\in \mathbf {R}_{\mathbf {X}_{1}}}|p^{'}(\mathbf {x}_{1})| \le M ~ \text{ and } ~\inf \limits _{\mathbf {x}_{1}\in \mathbf {R}_{\mathbf {X}_{1}}}p(\mathbf {x}_{1})>\eta /2,$$

where

$$\eta =\inf \limits _{\mathbf {x}_{1}\in \mathbf {R}_{X_{1}}}p^{0}(\mathbf {x}_{1})>0$$

and \(\mathbf {R}_{\mathbf {X}_{1}}\) is the support of \(\mathbf {X}_{1}\), which we suppose to be a compact subset of \(\mathbb {R}\). We equip the space \(\mathcal {P}\) with the supremum norm:

$$d_{\mathcal {P}}(p_{1},p_{2})=\sup \limits _{\mathbf {x}_{1}\in \mathbf {R}_{\mathbf {X}_{1}}}|p_{1}(\mathbf {x}_{1})-p_{2}(\mathbf {x}_{1})|~ \text{ for } \text{ any } ~p_{1},p_{2} \in \mathcal {P}.$$

The consistency conditions are then verified as follows. (A1) holds true provided the functions \(p_{0}(\cdot )\) and \(K(\cdot )\) are continuously differentiable. For assumption (A2), we can show that the bracketing number \(N_{[~]}\left( \epsilon , \mathcal {F}, \mathbb {L}_{\mathbb {P}}\right)\) of the class \(\mathcal {F}=\{\mathbf {m}_{\varvec{\theta },p}, \varvec{\theta }\in \Theta , p\in \mathcal {P}\}\) is finite for all \(\epsilon >0\). By using Corollary 2.7.2 of van der Vaart and Wellner (1996), we get

$$\begin{aligned} N_{[~]}\left( \epsilon , \mathcal {P}, \mathbb {L}_{\mathbb {P}}\right) \le \exp \{\mathfrak K\epsilon ^{-1}\}, \end{aligned}$$
(24)

and

$$N_{[~]}\left( \epsilon , \{f_{\varvec{\theta }}, \varvec{\theta }\in \varvec{\Theta }\}, \mathbb {L}_{\mathbb {P}}\right) \le \exp \{\mathfrak K\epsilon ^{-1}\},$$

by the properties of the set \(\mathcal {P}\) and the fact that \(f_{\varvec{\theta }}(\mathbf {x})\) is continuously differentiable in \(\varvec{\theta }\) with bounded derivative; as a consequence, it is easy to show that

$$\begin{aligned} N_{[~]}\left( \epsilon , \mathcal {T}, \mathbb {L}_{\mathbb {P}}\right) \le \exp \{\mathfrak K\epsilon ^{-1}\}, \end{aligned}$$
(25)

for the class \(\mathcal {T}=\left\{ \left( \mathbf {x}_{1}, \mathbf {x}_{2}\right) \rightarrow \mathrm{1\!I}_{\{x_{2} \ge f_{\varvec{\theta }}\left( \mathbf {x}_{1}\right) \}}: \varvec{\theta } \in \varvec{\Theta }\right\}\). From (24) and (25) we get

$$\begin{aligned} N_{[~]}\left( \epsilon , \mathcal {F}, \mathbb {L}_{\mathbb {P}}\right) \le \exp \{\mathfrak K\epsilon ^{-1}\}. \end{aligned}$$

Assumption (A3) is then straightforward. Assumption (A4) is an identifiability condition ensuring the uniqueness of \(\varvec{\theta }_{0}\), and (A5) is verified by construction of the estimator \(\varvec{\theta }_{n}\). The consistency of \(\varvec{\theta }_{n}\) then follows. The conditions of the bootstrap version are verified as follows. The first part of assumption (AB1) is satisfied by definition of the m out of n bootstrap; the second part, in this situation, follows directly by noting that if \(r_{n}=n^{\kappa }\) then \(r_{m}=m^{\kappa }\) for some \(\kappa >0\), and consequently \(r^{2}_{m}=o(r^{2}_{n})\). For (AB2), as mentioned in Remark 3.1(v), we take \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\), where we replace the variables \(\mathbf {X}_{1i}\) and \(\Delta _{i}\) by \(\mathbf {X}^{*}_{1i}\) and \(\Delta ^{*}_{i}\), respectively, in \(\widehat{p}(\cdot )\); i.e.,

$$\widehat{p}_{m}(x_{1})=\sum \limits _{i=1}^{m}\frac{K_{h}(x_{1}-\mathbf {X}^{*}_{i1})}{\sum \limits _{j=1}^{m}K_{h}(x_{1}-\mathbf {X}^{*}_{j1})}\mathrm{1\!I}_{ \{\Delta ^{*}_{i}=1\}},$$

we remark that

$$\mathbb {P}_{W}\left( \frac{1}{m}\sum \limits _{i=1}^{m}K_{h}(x_{1}-\mathbf {X}^{*}_{i1})\mathrm{1\!I}_{ \{\Delta ^{*}_{i}=1\}}\right) =\frac{1}{n}\sum \limits _{i=1}^{n}K_{h}(x_{1}-\mathbf {X}_{i1})\mathrm{1\!I}_{ \{\Delta _{i}=1\}},$$

which implies \(d_{\mathcal {H}}\left( \widehat{p}_{m},\widehat{p}\right) =o_{\mathbb {P}^{*}_{W}}(1)\) i.p. By the triangle inequality, we get

$$d_{\mathcal {H}}\left( \widehat{p}_{m},p^{0}\right) \le d_{\mathcal {H}}\left( \widehat{p}_{m},\widehat{p}\right) +d_{\mathcal {H}}\left( \widehat{p},p^{0}\right) =o_{\mathbb {P}^{*}_{W}}(1),~~ \text{ i.p. }$$

(AB3) is verified by construction of the estimator \(\varvec{\theta }_{m}\), which implies the consistency of \(\varvec{\theta }_{m}\). Next, for the rate of convergence, we check only conditions (B2) and (B3). For (B2), it suffices, by Remark 3.3(ii), to show (4) and (5). Using the relation between covering and bracketing numbers together with Corollary 2.7.2 of van der Vaart and Wellner (1996), we get

$$\log N\left( \epsilon \left\| M_{\delta , \delta _{1}^{\prime }}\right\| _{\mathbb {L}_{2}\left( \mathbb {Q}\right) }, \mathcal {M}_{\delta , \delta _{1}^{\prime }}, \mathbb {L}_{2}(\mathbb {Q})\right) \le \mathfrak K\epsilon ^{-1},$$

for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{4}\), which implies relation (4); relation (5) is verified by the choice \(\varphi (\delta )=\sqrt{\delta }\), and as a consequence we get (B2). Condition (B3) follows directly as in Sect. 7 of the same reference, which described this example, by the choice of the two functions \(\varvec{\psi }_{1}(\cdot )\) and \(\varvec{\psi }_{2}(\cdot )\) given in Remark 3.3(iii). By their discussion of the rates \(r_{n}\), \(v_{n}\) and the bandwidth h of the kernel, it follows that

$$\varvec{\theta }_{n}-\varvec{\theta }_{0}=O_{\mathbb {P}^{*}}\left( n^{-1/3}\right) .$$

We verify assumption (BB1) as in the verification of condition (AB2): choosing \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\), we get \(v^{-1}_{m}=\sqrt{\frac{\log m}{mh}}+h\), where \(h=h_{m}.\) Assumption (BB2) holds by the same argument as for (B2). For assumption (BB3), we check conditions (b)-(d) of Remark 3.3(iii). We obtain

$$\begin{aligned} \Gamma \left( \varvec{\theta }_{0}, p\right) =\mathbb {P}\left[ \frac{p_{0}\left( \mathbf {X}_{1}\right) }{p\left( \mathbf {X}_{1}\right) }\left\{ 1-2 \mathbb {P}\left( \mathrm{1\!I}_{ \{\mathbf {Y}=1\}} | \mathbf {X}_{1}, \mathbf {X}_{2}\right) \right\} f_{\mathbf {X}_{2} | \mathbf {X}_{1}}\left( f_{\varvec{\theta }_{0}}\left( \mathbf {X}_{1}\right) \right) \frac{\partial }{\partial \theta } f_{\theta _{0}}\left( \mathbf {X}_{1}\right) \right] , \end{aligned}$$
(26)

and

$$\begin{aligned} \begin{aligned} \Lambda \left( \varvec{\theta }_{0}, p\right) =&\mathbb {P}\left[ \frac{p_{0}\left( \mathbf {X}_{1}\right) }{p\left( \mathbf {X}_{1}\right) }\left\{ 1-2 \mathbb {P}\left( \mathrm{1\!I}_{ \{\mathbf {Y}=1\}} | \mathbf {X}_{1}, \mathbf {X}_{2}\right) \right\} \left\{ f_{\mathbf {X}_{2} | \mathbf {X}_{1}}^{\prime }\left( f_{\varvec{\theta }_{0}}\left( \mathbf {X}_{1}\right) \right) \left( \frac{\partial }{\partial \varvec{\theta }} f_{\varvec{\theta }_{0}}\left( \mathbf {X}_{1}\right) \right) ^{2}\right. \right. \\&\left. \left. +f_{\mathbf {X}_{2} | \mathbf {X}_{1}}\left( f_{\varvec{\theta }_{0}}\left( \mathbf {X}_{1}\right) \right) \frac{\partial ^{2}}{\partial \varvec{\theta }^{2}} f_{\varvec{\theta }_{0}}\left( \mathbf {X}_{1}\right) \right\} \right] , \end{aligned} \end{aligned}$$

provided the derivatives in \(\Lambda \left( \varvec{\theta }_{0}, p\right)\) all exist. By the definition of the maximizer, it follows that \(\Gamma \left( \varvec{\theta }_{0}, p^{0}\right) =0\) and that \(\Lambda \left( \varvec{\theta }_{0}, p^{0}\right)\) is negative. Note that

$$\Vert \Gamma \left( \varvec{\theta }_{0}, \widehat{p}_{m}\right) \Vert =O_{\mathbb {P}_{W}}(r^{-1}_{m})~\text {i.p.}$$

if \(r_{m}\) satisfies

$$r_{m}\left( m^{-1/2}+h_{m}+\frac{\log m}{mh_{m}}\right) =O(1),$$

noting that the expectation in (26) is taken with respect to \(\mathbf {Z}\) and \(\mathbf {W}\) when we are working with \(\widehat{p}_{m}\); since our functions are measurable, we obtain this result by applying Fubini's theorem. This condition on \(r_{m}\) and the one given in (BB2), which is satisfied for \(r_{m}=O(m^{1/3})\), are compatible provided

$$mh_{m}^{3}=O(1)~~ \text{ and } ~~\frac{(\log m)^{3/2}}{mh_{m}^{3/2}}=O(1).$$

Note that if we assume that \(p^{0}(\cdot )\) is twice continuously differentiable, the first condition can be weakened to \(mh_{m}^{6}=O(1)\); as a consequence, the rate \(v^{-1}_{m}\) of \(\widehat{p}_{m}\) would be \(O\left( \sqrt{\frac{\log m}{mh_{m}}}+h^{2}_{m}\right)\), which is faster than the rate \(r^{-1}_{m}=m^{-1/3}\) of \(\varvec{\theta }_{m}\) provided \(mh_{m}^{3} \longrightarrow \infty\). This case is less involved than the case where \(p^{0}\) is only once differentiable, and we therefore do not discuss it any further. We conclude that

$$\varvec{\theta }_{m}-\varvec{\theta }_{0}=O_{\mathbb {P}^{*}_{W}}(m^{-1/3})~\text {i.p.}$$
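The m out of n resampling scheme underlying this verification is generic; the following sketch (Python/NumPy, with our own helper name `m_out_of_n_bootstrap`) shows the resampling step, illustrated on the sample maximum, a textbook case where the naive n out of n bootstrap is inconsistent:

```python
import numpy as np

def m_out_of_n_bootstrap(data, stat, m, B=1000, seed=0):
    """Draw B samples of size m (with replacement) from data and
    return the B bootstrap replicates of stat."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(data), size=(B, m))
    return np.array([stat(data[i]) for i in idx])

# Example: bootstrap distribution of the sample maximum with m = n^{2/3}.
rng = np.random.default_rng(42)
data = rng.uniform(size=200)
m = int(len(data) ** (2 / 3))
reps = m_out_of_n_bootstrap(data, np.max, m=m, B=500)
```

In the setting of this appendix, `stat` would be the bootstrapped M-estimator \(\varvec{\theta }_{m}\) computed on each resample, and the replicates of \(r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{n})\) approximate the limit law.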

Finally, for the weak convergence of \(\varvec{\theta }_{n}\), we note that assumption (C4) is satisfied for \(j_{n}=\sqrt{n}\), as in Remark 3.5(iii), and (C9) holds similarly to (B2). Consequently, \(n^{1/3}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) converges weakly. Assumption (CB1) follows from part (ii) of Theorem 3.5 and condition (BB1), and by a proof similar to that of condition (BB2) we get (CB2). We get from Remark 3.3(iii), (vi) and Remark 3.5(viii) that assumption (CB3) holds, provided that

$$|\Lambda (\varvec{\theta }_{0},p^{0})|<\infty .$$

Clearly, we have \(m^{-1/3} < C\) for some positive constant \(C>0\). For assumption (CB4), we have

$$r_{m}W_{m}(\gamma )=r_{m}\Gamma (\varvec{\theta }_{0},\widehat{p}_{m})\gamma =o_{\mathbb {P}_{W}}(1)~\text {i.p.},$$

provided \(mh^{3}_{m}=o(1)\) and \(\frac{\log ^{3/2}m}{mh^{3/2}_{m}}=o(m^{-1/2})\), using what we have already discussed for (BB3). Next, by the result for the process in (16), i.e., the process \(\gamma \mapsto \mathbb {G}_{n}\frac{r^{2}_{m}}{\sqrt{m}}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\) converges weakly to the process \(\mathbb {G}(\gamma )\), and condition (AB1), we get

$$\begin{aligned} r^{2}_{m}\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}&=r^{2}_{m}\left[ \left( \widehat{\mathbb {P}}_{m}-\mathbb {P}_{n}\right) \widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}+\sqrt{\frac{m}{n}}\mathbb {G}_{n}\frac{\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}}{\sqrt{m}}+\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\right] \\&=r^{2}_{m}\left( \widehat{\mathbb {P}}_{m}-\mathbb {P}_{n}\right) \widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}+\frac{1}{2}\Lambda (\varvec{\theta }_{0},p^{0})\gamma ^{2}+o_{\mathbb {P}}(1), \end{aligned}$$

with \(\Gamma (\varvec{\theta }_{0},p^{0})=0\) and

$$\Lambda (\gamma )=\frac{1}{2}\Lambda (\varvec{\theta }_{0},p^{0})\gamma ^{2}.$$

The process \(\gamma \rightarrow r^{2}_{m}\left( \widehat{\mathbb {P}}_{m}-\mathbb {P}_{n}\right) \widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\) is the same as the one given in Lee (2012), where no nuisance parameter is present. Hence, we can follow the same steps as in Lemma 1 of Lee (2012) and obtain the convergence of the marginals using Lindeberg's condition and some regularity assumptions on \(f_{\mathbf {X}_{2}|\mathbf {X}_{1}}\) and \(\varvec{\theta }\mapsto f_{\varvec{\theta }}\). By construction of the estimator \(\varvec{\theta }_{m}\), condition (CB5) follows. We then obtain the asymptotic distribution of \(r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{n})\) from part (ii) of Theorem 3.9.


Cite this article

Bouzebda, S., Elhattab, I. & Ferfache, A.A. General M-Estimator Processes and their m out of n Bootstrap with Functional Nuisance Parameters. Methodol Comput Appl Probab 24, 2961–3005 (2022). https://doi.org/10.1007/s11009-022-09965-y

