Abstract
In the present paper, we consider the problem of estimating a parameter \(\varvec{\theta }\), taking values in a Banach space, that maximizes some criterion function depending on an unknown, possibly infinite-dimensional nuisance parameter h. The classical estimation methods are mainly based on maximizing the corresponding empirical criterion after substituting a nonparametric estimator for the nuisance parameter. We show that the M-estimators converge weakly to maximizers of Gaussian processes under rather general conditions. The conventional bootstrap method fails in general to consistently estimate the limit law. We show that the m out of n bootstrap, in this extended setting, is weakly consistent under conditions similar to those required for weak convergence of the M-estimators. The aim of this paper is therefore to extend the existing theory on the bootstrap of M-estimators. Examples of applications from the literature are given to illustrate the generality and usefulness of our results. Finally, we investigate the small-sample performance of the methodology through a short simulation study.
References
Alin A, Martin MA, Beyaztas U, Pathak PK (2017) Sufficient m-out-of-n(m/n) bootstrap. J Stat Comput Simul 87(9):1742–1753
Allaire G (2005) Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Editions Ecole (Polytechnique)
Alvarez-Andrade S, Bouzebda S (2013) Strong approximations for weighted bootstrap of empirical and quantile processes with applications. Stat Methodol 11:36–52
Alvarez-Andrade S, Bouzebda S (2015) On the local time of the weighted bootstrap and compound empirical processes. Stoch Anal Appl 33(4):609–629
Alvarez-Andrade S, Bouzebda S (2019) Some selected topics for the bootstrap of the empirical and quantile processes. Theory Stoch Process 24(1):19–48
Arcones MA, Giné E (1992) On the bootstrap of M-estimators and other statistical functionals. In: Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser Probab Math Statist, pages 13–47. Wiley, New York
Bickel PJ, Sakov A (2008) On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statist Sinica 18(3):967–985
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD
Bickel PJ, Götze F, van Zwet WR (1997) Resampling fewer than n observations: gains, losses, and remedies for losses. Statist Sinica 7(1):1–31. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995)
Bose A, Chatterjee S (2001) Generalised bootstrap in non-regular M-estimation problems. Statist Probab Lett 55(3):319–328
Bouzebda S (2010) Bootstrap de l’estimateur de Hill: théorèmes limites. Ann ISUP 54(1–2):61–72
Bouzebda S, Limnios N (2013) On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J Multivariate Anal 116:52–62
Bouzebda S, Papamichail C, Limnios N (2018) On a multidimensional general bootstrap for empirical estimator of continuous-time semi-Markov kernels with applications. J Nonparametr Stat 30(1):49–86
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608
Cheng G, Huang JZ (2010) Bootstrap consistency for general semiparametric M-estimation. Ann Statist 38(5):2884–2915
Chernick MR (2008) Bootstrap methods: a guide for practitioners and researchers. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition
Datta S, McCormick WP (1995) Bootstrap inference for a first-order autoregression with positive innovations. J Amer Statist Assoc 90(432):1289–1300
Davison AC, Hinkley DV (1997) Bootstrap methods and their application, volume 1 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. With 1 IBM-PC floppy disk (3.5 inch; HD)
Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Statist Math 72(2):577–605
Dudley RM (1999) Uniform central limit theorems, volume 63 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Statist 7(1):1–26
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York
El Bantli F (2004) M-estimation in linear models under nonstandard conditions. J Statist Plann Inference 121(2):231–248
Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Statist 17(2):684–691
Götze F, Račkauskas A (2001) Adaptive choice of bootstrap sample sizes. In State of the art in probability and statistics (Leiden, 1999), volume 36 of IMS Lecture Notes Monogr. Ser., pages 286–309. Inst. Math. Statist., Beachwood, OH
Hall P (1992) The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York
Hall P, Horowitz JL, Jing B-Y (1995) On blocking rules for the bootstrap with dependent data. Biometrika 82(3):561–574
Hoffmann-Jørgensen J (1991) Stochastic processes on Polish spaces. Various publications series. Aarhus Universitet, Matematisk Institut
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econometrics 58(1–2):71–120
Kim J, Pollard D (1990) Cube root asymptotics. Ann Statist 18(1):191–219
Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer Series in Statistics, Springer, New York
Koul HL, Müller UU, Schick A et al (2012) The transfer principle: a tool for complete case analysis. Ann Stat 40(6):3031–3049
Kristensen D, Salanié B (2017) Higher-order properties of approximate estimators. J Econometrics 198(2):189–208
Lahiri SN (1992) On bootstrapping M-estimators. Sankhyā Ser A 54(2):157–170
Lee SMS (2012) General M-estimation and its bootstrap. J Korean Statist Soc 41(4):471–490
Lee SMS, Pun MC (2006) On m out of n bootstrapping for nonstandard M-estimation with nuisance parameters. J Amer Statist Assoc 101(475):1185–1197
Lee SMS, Yang P (2020) Bootstrap confidence regions based on M-estimators under nonstandard conditions. Ann Statist 48(1):274–299
Ma S, Kosorok MR (2005) Robust semiparametric m-estimation and the weighted bootstrap. J Multivar Anal 96(1):190–217
Müller UU et al (2009) Estimating linear functionals in nonlinear regression with responses missing at random. Ann Stat 37(5A):2245–2277
Pakes A, Olley S (1995) A limit theorem for a smooth class of semiparametric estimators. J. Econometrics 65(1):295–332
Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57(5):1027–1057
Pérez-González A, Vilar-Fernández JM, González-Manteiga W (2009) Asymptotic properties of local polynomial regression with missing data and correlated errors. Ann Inst Stat Math 61(1):85–109
Pfanzagl J (1990) Estimation in semiparametric models: some recent developments, volume 63 of Lecture Notes in Statistics. Springer-Verlag, New York
Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer Series in Statistics. Springer-Verlag, New York
Pollard D (1985) New ways to prove central limit theorems. Economet Theor 1(3):295–313
Shao J, Tu DS (1995) The jackknife and bootstrap. Springer Series in Statistics. Springer-Verlag, New York
Swanepoel JWH (1986) A note on proving that the (modified) bootstrap works. Comm Statist A-Theory Methods 15(11):3193–3203
van de Geer SA (2000) Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics
Wei B, Lee SMS, Wu X (2016) Stochastically optimal bootstrap sample size for shrinkage-type statistics. Stat Comput 26(1–2):249–262
Wellner JA, Zhan Y (1996) Bootstrapping Z-estimators. Preprint
Zhan Y (2002) Central limit theorems for functional Z-estimators. Statist. Sinica 12(2):609–634
Acknowledgements
The authors are indebted to the Editor-in-Chief, the Associate Editor, and the referee for their very valuable comments, suggestions, and careful reading of the article, which led to a considerable improvement of the manuscript. The third author gratefully acknowledges the funding received towards his PhD from the Algerian government PhD fellowship.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Applications
We present in this section some examples that cannot be handled with the classical theory of semiparametric estimators, and for which the corresponding m out of n bootstrap cannot be applied, whereas the theory of the present paper can. This illustrates the usefulness of our results. Delsol and Van Keilegom (2020) provided some examples of situations in which the existing theory on semiparametric estimators cannot be applied, whereas their result can. It is worth noticing that the aim of this section is to verify the bootstrap conditions, which are different from the conditions for the non-bootstrapped estimators checked in the last-mentioned reference. Although only three examples are given here, they stand as archetypes for a variety of models that can be investigated by the methodology of the present paper.
1.1 Single Index Model with Monotone Link Function
The single-index regression models are typical examples, which are given by
where \(\mathbb {E}(\varepsilon | \mathbf{X})=0\), \({\text {Var}}(\varepsilon | \mathbf{X})<\infty\), and the unknown link function \(g(\cdot )\) is assumed to be monotone; we refer to Ichimura (1993) for more details. On the basis of the sample \(\left( \mathbf{X}_{1}, Y_{1}\right) , \ldots ,\left( \mathbf{X}_{n}, Y_{n}\right)\) coming from the model (22), we make use of the pool-adjacent-violators algorithm to construct an estimator of the function \(g(\cdot )\). This gives a non-smooth estimator \(\widehat{g}_{\varvec{\beta }}(\cdot )\) of \(g_{\varvec{\beta }}(\mathbf {z})=\mathbb E\left[ Y |\mathbf{X}^{\top } \varvec{\beta }=\mathbf {z}\right] .\) Next, we estimate \(\varvec{\beta }\) by the least-squares estimation method
The non-smooth nature of \(\widehat{g}_{\varvec{\beta }}(\cdot )\) implies that the criterion function is not smooth in \(\varvec{\beta }\). This is a situation where the theory of the present paper can be applied.
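The scheme just described can be sketched in a few lines: for each candidate \(\varvec{\beta }\), the pool-adjacent-violators algorithm fits the monotone link along the sorted index values, and \(\varvec{\beta }\) is chosen to minimize the residual sum of squares. The following minimal sketch (the grid search, the data, and the function names are our own illustration, not the paper's implementation) also shows why the profiled criterion is non-smooth in \(\varvec{\beta }\): the PAVA fit changes non-differentiably whenever the ordering of the indices \(\mathbf{X}_i^{\top}\varvec{\beta }\) changes.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum, count]; the fitted value is sum / count
    for v in y:
        blocks.append([float(v), 1])
        # merge adjacent blocks while monotonicity of block means is violated
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def profile_beta(xs, ys, betas):
    """Least-squares estimate of the index vector over a candidate grid."""
    best_beta, best_rss = None, None
    for beta in betas:
        z = [sum(b * xi for b, xi in zip(beta, x)) for x in xs]
        order = sorted(range(len(ys)), key=lambda i: z[i])
        ghat = pava([ys[i] for i in order])  # non-smooth in beta via the sort
        rss = sum((ys[i] - g) ** 2 for i, g in zip(order, ghat))
        if best_rss is None or rss < best_rss:
            best_beta, best_rss = beta, rss
    return best_beta
```

With noiseless monotone data, the residual sum of squares at the true index vector is exactly zero, so the grid search recovers it.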
1.2 Classification with Missing Data
Let \(\mathbf {X}_{1}=(\mathbf {X}_{11},\mathbf {X}_{12}),\ldots ,\mathbf {X}_{n}=(\mathbf {X}_{n1},\mathbf {X}_{n2})\) be independent and identically distributed random copies of the random vector \(\mathbf {X}=(\mathbf {X}_{1},\mathbf {X}_{2})\), coming from two underlying populations. For \(j=0,1\), let \(\mathbf {Y}_{i}=j\) when \(\mathbf{X}_i\) comes from population j. Let us denote by \(\mathbf {Y}\) the population indicator associated with the vector \(\mathbf {X}\). Using the information in the available data, we seek a classification rule for novel observations whose true population is unknown.
The classification is performed by regressing \(\mathbf {X}_{2}\) on \(\mathbf {X}_{1}\), making use of the parametric criterion function \(f_{\varvec{\theta }}(\cdot )\), and choosing the \(\varvec{\theta }\) that maximizes the following
Let \(\varvec{\theta }_{0}\) denote the maximizer of (23) with respect to all \(\varvec{\theta } \in \varvec{\Theta }\), where \(\varvec{\Theta }\) is assumed to be a compact subset of \(\mathbb {R}^{k}\) containing \(\varvec{\theta }_{0}\) as an interior point. Now assume that the \(\mathbf {Y}_{i}\)’s are subject to some missing mechanism. Let \(\Delta _{i}\) (respectively \(\Delta\)) be a random variable equal to 1 when the random variable \(\mathbf {Y}_{i}\) (respectively \(\mathbf {Y}\)) is observed, and 0 otherwise. Let \(\mathbf {Z}_{1}=(\mathbf {X}_{1},\mathbf {Y}_{1}\Delta _{1},\Delta _{1}),\ldots , \mathbf {Z}_{n}=(\mathbf {X}_{n},\mathbf {Y}_{n}\Delta _{n},\Delta _{n})\) be the observations at hand. The missing at random mechanism is considered in the following sense
Note that the relation (23) can be written
We define
where the infinite-dimensional nuisance parameter \(p(\cdot )\) belongs to some functional space \(\mathcal {P}\) to be specified later. Consequently, the estimator \(\varvec{\theta }_{n}\) of \(\varvec{\theta }_{0}\) is given by
where, for any x and a bandwidth sequence \(h=h_{n}\),
where the kernel function \(K(\cdot )\) is assumed to be a density function with support \([-1,1]\) and \(K_{h}(u)=\frac{K\left( \frac{u}{h}\right) }{h}\). Nonparametric regression with missing data has long attracted a great deal of attention; for good sources of references to the research literature in this area, along with statistical applications, consult Müller (2009), Pérez-González et al. (2009) and Koul et al. (2012), among many others.
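The kernel estimator of the observation probability \(p(\cdot )\) is straightforward to write down. As a concrete density supported on \([-1,1]\) we take the Epanechnikov kernel; this choice, and the function names, are ours and are for illustration only:

```python
def epanechnikov(u):
    """A density supported on [-1, 1], as required of K."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def p_hat(x, x1s, deltas, h):
    """Kernel estimate of p(x) = P(Delta = 1 | X1 = x), Nadaraya-Watson form."""
    weights = [epanechnikov((x - xi) / h) / h for xi in x1s]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no observations within bandwidth h of x
    return sum(w * d for w, d in zip(weights, deltas)) / total
```

Only the pairs \((\mathbf {X}_{1i},\Delta _{i})\) enter the estimator, so the same formula applies verbatim to the bootstrap version \(\widehat{p}_{m}\) with the resampled pairs \((\mathbf {X}^{*}_{1i},\Delta ^{*}_{i})\) plugged in.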
1.3 Binary Choice Model with Missing Data
Let us define the binary choice model, in the linear regression function framework, by
where we assume that \(\varepsilon\) has zero median conditionally on \(\mathbf{X}.\) The random variable Y is missing at random, with the probability of observing Y depending on \(\mathbf{X}\) via the following relation
where \(\Delta =1\) when Y is observed and 0 otherwise. The observed data for the preceding model are given by the i.i.d. triplets \(\left( \mathbf {X}_{1}, Y_{1} \Delta _{1}, \Delta _{1}\right) ,\ldots ,\left( \mathbf {X}_{n}, Y_{n} \Delta _{n}, \Delta _{n}\right)\). To estimate \(p_{\varvec{\gamma }}(z)=\mathbb {P}\left( \Delta =1 \mid \mathbf {X}^{\top } \varvec{\gamma }=z\right) ,\) we use the following
The parameter estimate is given by
where
The existing theory cannot be applied here because the function \(\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\) is smooth in \(\varvec{\gamma }\) but non-smooth in \(\varvec{\beta }\).
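To make this non-smoothness concrete, here is a minimal inverse-probability-weighted, maximum-score-type criterion in the spirit of this model (the criterion, the grid search, and all names are our own hedged illustration, not the paper's exact \(\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\)): the weight \(\Delta _i/p(\cdot )\) is smooth in the nuisance part, while the indicator \(\mathrm{1\!I}_{\{\mathbf {X}_i^{\top }\varvec{\beta }\ge 0\}}\) makes the criterion a step function of \(\varvec{\beta }\).

```python
def score(beta, xs, ys, deltas, p_of_x):
    """IPW maximum-score-type criterion: a step function (non-smooth) in beta."""
    total = 0.0
    for x, y, d in zip(xs, ys, deltas):
        if d == 0:
            continue  # missing response: contributes nothing
        index = sum(b * xi for b, xi in zip(beta, x))
        indicator = 1.0 if index >= 0.0 else 0.0
        total += (2 * y - 1) * indicator / p_of_x(x)
    return total / len(xs)

def estimate_beta(xs, ys, deltas, p_of_x, betas):
    """Maximize the criterion over a candidate grid of beta values."""
    return max(betas, key=lambda b: score(b, xs, ys, deltas, p_of_x))
```

Because the criterion only changes when the sign of \(\mathbf {X}_i^{\top }\varvec{\beta }\) flips for some i, no gradient-based argument is available in \(\varvec{\beta }\), which is precisely the obstruction described above.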
We now study in full detail the example of Sect. 7.2 and work out the verification of the conditions of Theorems 3.2, 3.5, 3.8 and 3.9. Most of these conditions were verified in Sect. 7 of Delsol and Van Keilegom (2020), noting that \(\nu =2\) and \(\ell \equiv 1\); our focus is therefore on verifying the conditions needed for the m out of n bootstrapped version. We begin with some information about the nuisance function, its space, and some notation. Let \(\mathcal {P}\) be the space of functions \(p:\mathbf {R}_{\mathbf {X}_{1}}\rightarrow \mathbb {R}\) that are continuously differentiable, for which
where
and \(\mathbf {R}_{\mathbf {X}_{1}}\) is the support of \(\mathbf {X}_{1}\), which we suppose to be a compact subset of \(\mathbb {R}\). We equip the space \(\mathcal {P}\) with the supremum norm:
The consistency conditions are then verified as follows. (A1) holds true provided the functions \(p_{0}(\cdot )\) and \(K(\cdot )\) are continuously differentiable. For assumption (A2), we can show that the bracketing number \(N_{[~]}\left( \epsilon , \mathcal {F}, \mathbb {L}_{\mathbb {P}}\right)\) of the class \(\mathcal {F}=\{\mathbf {m}_{\varvec{\theta },p}, \varvec{\theta }\in \Theta , p\in \mathcal {P}\}\) is finite for all \(\epsilon >0\); by using Corollary 2.7.2 of van der Vaart and Wellner (1996), we get
and
by the properties of the set \(\mathcal {P}\) and the fact that \(\mathbf {x}\mapsto f_{\varvec{\theta }}(\mathbf {x})\) is continuously differentiable in \(\varvec{\theta }\) with bounded derivative; as a consequence, it is easy to show that
for the class \(\mathcal {T}=\left\{ \left( \mathbf {x}_{1}, \mathbf {x}_{2}\right) \rightarrow \mathrm{1\!I}_{\{x_{2} \ge f_{\varvec{\theta }}\left( \mathbf {x}_{1}\right) \}}: \varvec{\theta } \in \varvec{\Theta }\right\}\). From (24) and (25) we get
Then assumption (A3) is straightforward. Assumption (A4) is an identifiability condition ensuring the uniqueness of \(\varvec{\theta }_{0}\), and (A5) is verified by construction of the estimator \(\varvec{\theta }_{n}\). The consistency of \(\varvec{\theta }_{n}\) then follows. The conditions of the bootstrap version are verified as follows. The first part of assumption (AB1) is satisfied by definition of the m out of n bootstrap, while the second part follows directly in this situation by noting that if \(r_{n}=n^{\kappa }\) then \(r_{m}=m^{\kappa }\) for some \(\kappa >0\), and consequently \(r^{2}_{m}=o(r^{2}_{n})\). For (AB2), as mentioned in Remark 3.1(v), we take \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\) with the variables \(\mathbf {X}_{1i}\) and \(\Delta _{i}\) replaced by \(\mathbf {X}^{*}_{1i}\) and \(\Delta ^{*}_{i}\), respectively, in \(\widehat{p}(\cdot )\); i.e.,
we remark that
which implies \(d_{\mathcal {H}}\left( \widehat{p}_{m},\widehat{p}\right) =o_{\mathbb {P}^{*}_{W}}(1)\) i.p. By the triangle inequality we get
(AB3) is verified by construction of the estimator \(\varvec{\theta }_{m}\), which implies the consistency of \(\varvec{\theta }_{m}\). Next, for the rate of convergence, we show only conditions (B2) and (B3). For (B2), by Remark 3.3(ii) it suffices to show (4) and (5). To this end, using the relation between covering and bracketing numbers and Corollary 2.7.2 of van der Vaart and Wellner (1996), we get that
for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{4}\), which implies the relation in (4); (5) is verified by the choice \(\varphi (\delta )=\sqrt{\delta }\), and as a consequence we get (B2). (B3) follows directly as in Sect. 7 of the same reference, which describes this example, with the choice of the two functions \(\varvec{\psi }_{1}(\cdot )\) and \(\varvec{\psi }_{2}(\cdot )\) given in Remark 3.3(iii). From their discussion of the rates \(r_{n}\), \(v_{n}\) and the bandwidth h of the kernel, it follows that
We verify assumption (BB1) as in the verification of condition (AB2): by choosing \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\), we get \(v^{-1}_{m}=\sqrt{\frac{\log m}{mh}}+h\), where \(h=h_{m}.\) Assumption (BB2) holds by the same argument given for (B2). For assumption (BB3), we check conditions (b)–(d) of Remark 3.3(iii). We obtain
and
provided the derivatives in \(\Lambda \left( \varvec{\theta }_{0}, p\right)\) all exist. Since \(\varvec{\theta }_{0}\) is a maximizer, it follows that \(\Gamma \left( \varvec{\theta }_{0}, p^{0}\right) =0\) and \(\Lambda \left( \varvec{\theta }_{0}, p^{0}\right)\) is negative. Noting that
if \(r_{m}\) satisfies
noting that the expectation in (26) is taken with respect to \(\mathbf {Z}\) and \(\mathbf {W}\) when we work with \(\widehat{p}_{m}\); since our functions are measurable, we obtain this result by applying Fubini’s theorem. This condition on \(r_{m}\) and the one given in (BB2), which is satisfied for \(r_{m}=O(m^{1/3})\), are reconcilable provided
Note that if we assume that \(p^{0}(\cdot )\) is twice continuously differentiable, we can weaken the first condition to \(mh_{m}^{6}=O(1)\); as a consequence, the rate \(v^{-1}_{m}\) of \(\widehat{p}_{m}\) would be \(O\left( \sqrt{\frac{\log m}{mh_{m}}}+h^{2}_{m}\right)\), which is faster than the rate \(r^{-1}_{m}=m^{-1/3}\) of \(\varvec{\theta }_{m}\) provided \(mh_{m}^{3} \longrightarrow \infty\). This latter case is less complex than the case where \(p^{0}\) is only once differentiable, and we therefore do not discuss it further. We conclude that
Finally, for the weak convergence of \(\varvec{\theta }_{n}\), we note that assumption (C4) is satisfied for \(j_{n}=\sqrt{n}\) as in Remark 3.5(iii), and (C9) holds similarly to (B2). Consequently, \(n^{1/3}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) converges weakly. Assumption (CB1) follows from part (ii) of Theorem 3.5 and condition (BB1), and (CB2) follows by a proof similar to that of condition (BB2). From Remark 3.3(iii), (vi) and Remark 3.5(viii), assumption (CB3) holds provided that
Clearly, we have \(m^{-1/3} < C\) for some positive constant C. For assumption (CB4), we have
provided \(mh^{3}_{m}=o(1)\) and \(\frac{\log ^{3/2}m}{mh^{3/2}_{m}}=o(m^{-1/2})\), by the discussion already given for (BB3). Next, by the result for the process in (16), i.e., that the process \(\gamma \mapsto \mathbb {G}_{n}\frac{r^{2}_{m}}{\sqrt{m}}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\) converges weakly to the process \(\mathbb {G}(\gamma )\), and condition (AB1), we get
with \(\Gamma (\varvec{\theta }_{0},p^{0})=0\) and
The process \(\gamma \mapsto r^{2}_{m}\left( \widehat{\mathbb {P}}_{m}-\mathbb {P}_{n}\right) \widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\) is the same as the one considered in Lee (2012), where no nuisance parameter is present. Hence, we can follow the same steps as in Lemma 1 of Lee (2012) and obtain the convergence of the marginals using Lindeberg’s condition and some regularity assumptions on \(f_{\mathbf {X}_{1}/\mathbf {X}_{2}}\) and \(\varvec{\theta }\mapsto f_{\varvec{\theta }}\). By construction of the estimator \(\varvec{\theta }_{m}\), condition (CB5) follows. We then get the asymptotic distribution of \(r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{n})\) from part (ii) of Theorem 3.9.
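The m out of n scheme used throughout can be summarized procedurally: resample m of the n observations with replacement, recompute the estimator (and any plug-in nuisance estimate), and rescale by \(r_{m}=m^{1/3}\) rather than by \(r_{n}\). A generic sketch under our own naming (the estimator and rate arguments are placeholders; in the example above the estimator would be \(\varvec{\theta }_{m}\) and the rate \(m^{1/3}\)):

```python
import random

def m_out_of_n_bootstrap(data, estimator, m, n_boot, rate, seed=0):
    """Return the rescaled bootstrap replicates rate(m) * (theta*_m - theta_n)."""
    rng = random.Random(seed)
    theta_n = estimator(data)  # estimate from the full sample of size n
    reps = []
    for _ in range(n_boot):
        # resample m out of len(data) observations with replacement
        star = [data[rng.randrange(len(data))] for _ in range(m)]
        reps.append(rate(m) * (estimator(star) - theta_n))
    return reps
```

The empirical quantiles of the returned replicates then approximate the law of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) when \(m \rightarrow \infty\) and \(m/n \rightarrow 0\), which is the weak consistency established in the paper.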
Cite this article
Bouzebda, S., Elhattab, I. & Ferfache, A.A. General M-Estimator Processes and their m out of n Bootstrap with Functional Nuisance Parameters. Methodol Comput Appl Probab 24, 2961–3005 (2022). https://doi.org/10.1007/s11009-022-09965-y
Keywords
- Gaussian process
- M-estimation
- Empirical process
- m out of n bootstrap
- Asymptotic distribution
- Nuisance parameter
- Semiparametric estimation
- Nonstandard distribution
- Missing data