Adaptive nonparametric estimation of a component density in a two-class mixture model
Introduction
The following mixture model with two components: where the mixing proportion and the probability density function on are unknown, is considered in this article. It is assumed that independent and identically distributed (i.i.d. in the sequel) random variables drawn from density are observed. The main goal is to construct an adaptive estimator of the nonparametric component and to provide non-asymptotic upper bounds of the pointwise risk: the resulting estimator should automatically adapt to the unknown smoothness of the target function. The challenge arises from the fact that there is no direct observation coming from . As an intermediate step, the estimation of the parametric component is addressed as well.
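The displayed equation of model (1) did not survive in this copy. A form consistent with the description above and with the multiple testing interpretation given in the next paragraph (a uniform component on the unit interval mixed with an unknown density) would be the following. The symbols $\theta$, $f$, $g$ and $X_1,\dots,X_n$ are notational assumptions made here for illustration, not confirmed notation.

```latex
g(x) = \theta\,\mathbf{1}_{[0,1]}(x) + (1-\theta)\,f(x),
\qquad X_1,\dots,X_n \overset{\text{i.i.d.}}{\sim} g .
```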
Model (1) appears in several statistical settings, robust estimation and multiple testing among others. The one chosen in the present article, as described above, comes from the multiple testing framework, where a large number of independent hypothesis tests are performed simultaneously. The p-values generated by these tests can be modelled by (1). Indeed, these are uniformly distributed on [0,1] under null hypotheses, while their distribution under alternative hypotheses, corresponding to , is unknown. The unknown parameter is the asymptotic proportion of true null hypotheses. It may be necessary to estimate , especially to evaluate and control different types of expected errors of the testing procedure, which is a major issue in this context. See for instance Genovese and Wasserman (2002), Storey (2002), Langaas et al. (2005), Robin et al. (2007), Strimmer (2008), Nguyen and Matias (2014a), and more fundamentally, Benjamini and Hochberg (1995) and Efron et al. (2001).
In the setting of robust estimation, different from the multiple testing one, model (1) can be thought of as a contamination model, where the unknown distribution of interest is contaminated by the uniform distribution on , with the proportion . This is a very specific case of the Huber contamination model (Huber, 1965). The statistical task considered consists in robustly estimating from contaminated observations . But unlike our setting, the contamination distribution is not necessarily known while the contamination proportion is assumed to be known, and the theoretical investigations aim at providing minimax rates as functions of both and . See for instance the preprint of Liu and Gao (2019), which addresses pointwise estimation in this framework.
Back to the setting of multiple testing, the estimation of in model (1) has been addressed in several works. Langaas et al. (2005) proposed a Grenander density estimator for , based on a nonparametric maximum likelihood approach, under the assumption that belongs to the set of decreasing densities on . Following a similar approach, Strimmer (2008) also proposed a modified Grenander strategy to estimate . However, the two aforementioned papers do not investigate theoretical features of the proposed estimators. Robin et al. (2007) and Nguyen and Matias (2014a) proposed a randomly weighted kernel estimator of , where the weights are estimators of the posterior probabilities of the mixture model, that is, the probabilities of each individual being in the nonparametric component given the observation . Robin et al. (2007) propose an EM-like algorithm, and prove the convergence to a unique solution of the iterative procedure, but they do not provide any asymptotic property of the estimator. Note that their model , where is a known density, is slightly more general, but our procedure is also suitable for this model under some assumptions on . Besides, Nguyen and Matias (2014a) achieve a nonparametric rate of convergence for their estimator, where is the smoothness of the unknown density . However, their estimation procedure is not adaptive since the choice of their optimal bandwidth still depends on .
In the present work, a complete inference strategy for both and is proposed. For the nonparametric component , a new randomly weighted kernel estimator is provided with a data-driven bandwidth selection rule. Theoretical results on the whole estimation procedure, especially adaptivity of the selection rule to unknown smoothness of , are proved under a given identifiability class of the model, which is an original contribution in this framework. Major results derived in this paper are the oracle-type inequality in Theorem 1, and the rates of convergence over Hölder classes, which are adapted to the control of pointwise risk of kernel estimators, in Corollary 1.
Unlike the usual approach in mixture models, the weights of the proposed estimator are not estimates of the posterior probabilities. The proposed alternative principle is simple and consists in using weights based on a density change, from the target distribution , which is not directly reachable, to the distribution of observed variables . A function is thus derived such that , for all . This type of link between one of the conditional distributions given hidden variables, , and the distribution of observed variables , is quite remarkable in the framework of mixture models. It is a key idea of our approach, since it implies a crucial equation for controlling the bias term of the risk, see Section 2.1 for more details. This is necessary to investigate adaptivity using the Goldenshluger and Lepski (GL) approach (Goldenshluger and Lepski, 2011), which is known in various other contexts, see for instance, Comte et al. (2013), Comte and Lacour (2013), Doumic et al. (2012), Reynaud-Bouret et al. (2014) who apply the GL method in kernel density estimation, and Bertin et al. (2016), Chagny (2013), Chichignoud et al. (2017) or Comte and Rebafka (2016).
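The formulas of this paragraph are missing from this copy. As a sketch only, assuming the model takes the form $g = \theta + (1-\theta) f$ on $[0,1]$ with $\theta < 1$ and $g > 0$ (the symbols $\theta$, $f$, $g$, $w$, $K_h$ are assumed notation), the density change and the bias identity it yields can be written as follows. The last line says that the weighted kernel term has the same expectation as a kernel estimator built from direct observations of $f$.

```latex
f(x) = \frac{g(x)-\theta}{1-\theta}
     = \underbrace{\frac{1}{1-\theta}\left(1-\frac{\theta}{g(x)}\right)}_{=:\,w(\theta,\,g(x))} g(x),
\qquad x \in [0,1],
\\[6pt]
\mathbb{E}\big[\,w(\theta,g(X_1))\,K_h(x-X_1)\big]
  = \int K_h(x-u)\,w(\theta,g(u))\,g(u)\,du
  = \int K_h(x-u)\,f(u)\,du
  = (K_h * f)(x).
```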
Thus oracle weights are defined by , , but and are unknown. These oracle weights are estimated by plug-in, using preliminary estimators of and , based on an additional sample . Some assumptions on these estimators are needed to prove the results on the estimator of ; this paper also provides estimators of and which satisfy these assumptions. Note that procedures of Nguyen and Matias (2014a) and Robin et al. (2007) actually require preliminary estimates of and as well, but they do not deal with additional uncertainty caused by the multiple use of the same observations in the estimates of , and .
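To make the oracle construction concrete, here is a minimal Python sketch of a kernel estimator with oracle weights. It assumes the model has the form g = theta + (1 - theta) f on [0,1] and that the weight function is w(theta, y) = (1 - theta / y) / (1 - theta). These formulas, the Gaussian kernel, the Beta(1,4) alternative density and all names are illustrative assumptions, not the authors' implementation. In practice theta and g are unknown and would be replaced by preliminary estimates.

```python
import math
import random

def weight(theta, g_value):
    """Assumed density-change weight w(theta, g(x)) = (1 - theta / g(x)) / (1 - theta)."""
    return (1.0 - theta / g_value) / (1.0 - theta)

def gaussian_kernel(u):
    """Standard Gaussian kernel (an illustrative choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def oracle_weighted_kde(x, sample, h, theta, g):
    """Average of w(theta, g(X_i)) * K((x - X_i) / h) / h over the sample."""
    total = 0.0
    for xi in sample:
        total += weight(theta, g(xi)) * gaussian_kernel((x - xi) / h) / h
    return total / len(sample)

# Hypothetical setting: theta = 0.4 and f the Beta(1, 4) density.
theta = 0.4
f = lambda x: 4.0 * (1.0 - x) ** 3
g = lambda x: theta + (1.0 - theta) * f(x)

# The density change identity f(x) = w(theta, g(x)) * g(x), checked at one point.
x0 = 0.3
identity_gap = abs(weight(theta, g(x0)) * g(x0) - f(x0))

# Draw from the mixture: uniform with probability theta, Beta(1, 4) otherwise
# (inverse-CDF sampling, since the Beta(1, 4) CDF is 1 - (1 - x)^4).
rng = random.Random(0)

def draw():
    if rng.random() < theta:
        return rng.random()
    return 1.0 - (1.0 - rng.random()) ** 0.25

sample = [draw() for _ in range(5000)]
estimate = oracle_weighted_kde(x0, sample, h=0.1, theta=theta, g=g)
print(f"f(x0) = {f(x0):.3f}, oracle estimate = {estimate:.3f}")
```

The point of the sketch is that only observations drawn from the mixture are used, yet the weighted average targets f rather than g.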
Identifiability issues are reviewed in Section 1.1 in Nguyen and Matias (2014b). In the present work, is assumed to be vanishing at a neighbourhood of to ensure identifiability. Under this assumption, can be recovered as the infimum of . Moreover, as shown above by the equation linking to and , is actually uniquely determined by giving and , even though the latter is not the infimum of . Note that the theoretical results on the estimator of the nonparametric component do not depend on the chosen identifiability class, and can be transposed to other cases. For that reason, the discussion on identifiability is postponed to Section 4.2, after results on the estimator of .
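The recovery of the mixing proportion as an infimum of the mixture density can be illustrated with a short sketch. It assumes, for illustration only, that the nonparametric component vanishes on an interval [1 - delta, 1] with delta known, so that the mixture density is constant and equal to the mixing proportion there. The estimator below (a single histogram cell on that interval), the alternative density and all parameter values are hypothetical choices, and the estimator built in Section 4 of the paper may differ.

```python
import random

def estimate_theta(sample, delta):
    """Histogram-type estimate: the share of points in (1 - delta, 1], divided by delta.

    Under the stated assumption, P(X > 1 - delta) = theta * delta.
    """
    count = sum(1 for x in sample if x > 1.0 - delta)
    return count / (len(sample) * delta)

# Hypothetical parameters.
theta, delta, s = 0.5, 0.3, 2.0
rng = random.Random(1)

def draw():
    # Uniform component with probability theta.
    if rng.random() < theta:
        return rng.random()
    # Alternative component supported on [0, 1 - delta], with CDF
    # F(x) = 1 - (1 - x / (1 - delta))^s, sampled by inversion.
    return (1.0 - delta) * (1.0 - (1.0 - rng.random()) ** (1.0 / s))

sample = [draw() for _ in range(20000)]
theta_hat = estimate_theta(sample, delta)
print(f"theta = {theta}, estimate = {theta_hat:.3f}")
```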
The paper is organized as follows. Our randomly weighted estimator of is constructed in Section 2.1. Assumptions on and on preliminary estimators of and required for proving the theoretical results are in this section too. In Section 2, a bias–variance decomposition for the pointwise risk of the estimator of is given as well as the convergence rate of the kernel estimator with a fixed bandwidth. In Section 3, an oracle inequality is given, which justifies our adaptive estimation procedure. Construction of the preliminary estimators of and is to be found in Section 4. Numerical results illustrate the theoretical results in Section 5. Proofs of theorems, propositions and technical lemmas are postponed to Section 6.
Section snippets
Collection of kernel estimators for the target density
In this section, a family of kernel estimators for the density function based on a sample of i.i.d. variables with distribution is defined. It is assumed that preliminary estimators of both the mixing proportion and the mixture density are available, and respectively denoted by and . They are defined from an additional sample of independent variables also drawn from but independent of the first sample . Definitions, results and results on
Adaptive pointwise estimation
Let be a finite family of possible bandwidths , whose cardinality is bounded by the sample size . The best estimator in the collection defined in (3) at the point is the one that has the smallest risk, or similarly, the smallest bias–variance decomposition. But since is unknown, in practice it is impossible to minimize over the r.h.s. of inequality (7) in order to select the best estimate. Thus, we propose a data-driven selection, with a rule in the spirit of
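As a rough illustration of this kind of data-driven rule, the following Python sketch implements a generic Goldenshluger-Lepski type selection for an ordinary (unweighted) pointwise kernel density estimator. It is not the rule of the paper: the variance proxy V(h) = kappa * ||K||_2^2 * log(n) / (n h), the constant kappa, the Gaussian kernel and the bandwidth grid are all assumptions chosen for simplicity, and the paper's rule applies to randomly weighted estimators with its own constants.

```python
import math
import random

def gauss(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Plain kernel density estimate at x with bandwidth h."""
    return sum(gauss((x - xi) / h) for xi in sample) / (len(sample) * h)

def gl_select(x, sample, bandwidths, kappa=1.0):
    """Return (selected bandwidth, criterion value) at the point x."""
    n = len(sample)
    k2 = 1.0 / (2.0 * math.sqrt(math.pi))  # squared L2 norm of the Gaussian kernel
    variance = {h: kappa * k2 * math.log(n) / (n * h) for h in bandwidths}
    estimate = {h: kde(x, sample, h) for h in bandwidths}
    best_h, best_criterion = None, float("inf")
    for h in bandwidths:
        # Bias proxy A(h): largest excess of the squared gap between the doubly
        # smoothed estimate and the estimate with bandwidth hp over V(hp).
        bias_proxy = 0.0  # starting at 0 implements the positive part
        for hp in bandwidths:
            # For Gaussian kernels, K_h * K_hp is Gaussian with bandwidth sqrt(h^2 + hp^2).
            doubly_smoothed = kde(x, sample, math.sqrt(h * h + hp * hp))
            gap = (doubly_smoothed - estimate[hp]) ** 2 - variance[hp]
            bias_proxy = max(bias_proxy, gap)
        criterion = bias_proxy + variance[h]
        if criterion < best_criterion:
            best_h, best_criterion = h, criterion
    return best_h, best_criterion

# Hypothetical data: 500 draws from the Beta(1, 4) density, by inverse-CDF sampling.
rng = random.Random(0)
sample = [1.0 - (1.0 - rng.random()) ** 0.25 for _ in range(500)]
bandwidths = [0.02, 0.05, 0.1, 0.2, 0.4]
h_hat, criterion = gl_select(0.3, sample, bandwidths)
print("selected bandwidth:", h_hat)
```

The selected bandwidth minimises the sum of an estimated bias term and a variance term, which mimics the bias–variance trade-off without requiring knowledge of the smoothness of the target.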
Estimation of the mixture density and the mixing proportion
This section is devoted to the construction of the preliminary estimators and , required to build (3). To define them, we assume that we observe an additional sample distributed with density function , but independent of the sample . We explain how estimators and can be defined to satisfy the assumptions described at the beginning of Section 2.2, and also how we compute them in practice. The reader should bear in mind that other constructions are
Simulated data
We briefly illustrate the performance of the estimation method on simulated data, according to the following framework. We simulate observations with density defined by model (1) for sample size . Three different cases of are considered:
- , .
- with , .
- the density of truncated exponential distribution on with , .
The density is borrowed
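A simulation of this kind can be sketched in a few lines of Python. The parameter values below (mixing proportion 0.35, rate 10, truncation point 0.9, sample size 2000) are hypothetical placeholders, since the actual values are missing from this copy. The sketch assumes that the mixture draws from the uniform distribution on [0,1] with the mixing proportion as probability, and from the alternative density otherwise.

```python
import math
import random

def sample_truncated_exponential(rng, rate, upper):
    """Inverse-CDF draw from an exponential(rate) distribution truncated to [0, upper]."""
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-rate * upper))) / rate

def simulate_mixture(n, theta, sample_alternative, rng):
    """Draw n points: uniform on [0, 1] with probability theta, alternative otherwise."""
    data = []
    for _ in range(n):
        if rng.random() < theta:
            data.append(rng.random())
        else:
            data.append(sample_alternative(rng))
    return data

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(42)
theta, rate, upper = 0.35, 10.0, 0.9
data = simulate_mixture(
    2000, theta, lambda r: sample_truncated_exponential(r, rate, upper), rng
)

# Points above the truncation point can only come from the uniform component,
# so their share should be close to theta * (1 - upper).
share_above = sum(1 for x in data if x > upper) / len(data)
print(f"share above {upper}: {share_above:.3f}")
```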
Proofs
In the sequel, the notations , and respectively denote the probability, the expectation and the variance associated with , conditionally on the additional random sample .
Acknowledgements
We are very grateful to Catherine Matias for interesting discussions on mixture models. The research of the authors is partly supported by the French Agence Nationale de la Recherche (ANR-18-CE40-0014 projet SMILES) and by the French Région Normandie (projet RIN AStERiCs 17B01101GR). Finally, we gratefully acknowledge the referees for carefully reading the manuscript and for numerous suggestions that improved the paper.
References (30)
- et al. A cross-validation based estimation of the proportion of true null hypotheses. J. Statist. Plann. Inference (2010)
- et al. Nonparametric estimation for stochastic differential equations with random effects. Stochastic Process. Appl. (2013)
- et al. Nonparametric weighted estimators for biased data. J. Statist. Plann. Inference (2016)
- et al. A semi-parametric approach for mixture models: Application to local false discovery rate estimation. Comput. Stat. Data Anal. (2007)
- et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B (1995)
- et al. Adaptive pointwise estimation of conditional density function (2013)
- et al. Adaptive pointwise estimation of conditional density function. Ann. Inst. H. Poincaré Probab. Stat. (2016)
- Two adaptive rates of convergence in pointwise density estimation. Math. Methods Stat. (2000)
- Penalization versus Goldenshluger–Lepski strategies in warped bases regression. ESAIM Probab. Stat. (2013)
- et al. Adaptive wavelet multivariate regression with errors in variables. Electron. J. Stat. (2017)