Abstract
Empirical and kernel estimators are considered for the distribution of positive length-biased data. Their asymptotic bias, variance and limiting distribution are obtained. For the kernel estimator, the asymptotically optimal bandwidth is calculated and rule-of-thumb bandwidths are proposed. At any point below the median, the asymptotic mean squared error of the kernel estimator is smaller than that of the empirical estimator. A suitably truncated kernel estimator is positive, and we prove the strong uniform and \(L_2\) consistency of this estimator. Simulations reveal the improved performance of the truncated kernel estimator in estimating tail probabilities based on length-biased data.
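The two estimators described in the abstract can be sketched in code. Under length-biased sampling the observed density is \(g(y)=y f(y)/E(X)\), so weighting each observation by \(1/Y_i\) undoes the bias; the kernel estimator replaces the indicator \(I(Y_i\le y)\) by an integrated kernel. This is a minimal illustrative sketch, not the paper's exact construction: the choice of a Gaussian integrated kernel and the bandwidth value are assumptions.

```python
import numpy as np
from scipy.stats import norm

def empirical_lb_cdf(sample, y):
    """Empirical df estimator for length-biased data: each observation
    is weighted by 1/Y_i to undo the length bias (Cox-type weighting)."""
    w = 1.0 / sample
    return np.sum(w * (sample <= y)) / np.sum(w)

def kernel_lb_cdf(sample, y, h):
    """Kernel-smoothed version: the indicator I(Y_i <= y) is replaced by
    the integrated kernel W((y - Y_i)/h); here W is the Gaussian cdf
    (an assumed choice for illustration)."""
    w = 1.0 / sample
    return np.sum(w * norm.cdf((y - sample) / h)) / np.sum(w)

# Length-biased draws from a log-normal(mu, sigma^2) parent: the biased
# density y f(y)/E(X) is again log-normal, with mu shifted by sigma^2.
rng = np.random.default_rng(0)
mu, sigma = 0.0, 0.5
biased = rng.lognormal(mu + sigma**2, sigma, size=5000)

y0 = np.exp(mu)  # median of the parent distribution, so F(y0) = 0.5
print(empirical_lb_cdf(biased, y0), kernel_lb_cdf(biased, y0, h=0.1))
```

Both estimates should be close to the true value \(F(y_0)=0.5\) of the parent distribution, even though the raw biased sample over-represents large observations.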
Notes
A non-negative valued r.v. is said to follow the log-normal\((\mu ,\ \sigma ^2)\) distribution iff the natural logarithm of the random variable follows the normal distribution with mean \(\mu \) and variance \(\sigma ^2\).
Acknowledgements
We are deeply thankful to the referee for pointing out several mistakes and for offering constructive suggestions for improvement.
Author information
Authors and Affiliations
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A. Bose: Research supported by the J.C. Bose National Fellowship, Govt. of India. S. Dutta: Research supported by the MATRICS scheme No. MTR/2019/000502 of the Science and Engineering Research Board (SERB), Govt. of India.
Appendix
Proof
We can write
Under the stated assumptions \(P(D=E(D))=0\), and therefore \(P(r\ne 1)=1\). We know that for \(r\ne 1\),
It is easy to verify that
Multiplying both sides of (6.1) by \(\frac{N}{E(D)}\), taking expectation and using (6.2) we get the stated inequality. \(\square \)
Proof of Lemma 2
To prove Lemma 2 it is enough to see that
Using (6.3) and (6.4), it is straightforward to verify the expressions of \(Var\left[ \frac{1}{Y_1}I(Y_1\le y)\right] ,\ Cov\left[ \frac{1}{Y_1}I(Y_1\le y),\frac{1}{Y_1}\right] \) in Lemma 2. \(\square \)
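The moment identities behind Lemma 2 follow from the length-biased density \(g(y)=y f(y)/E(X)\): in particular \(E[1/Y_1]=1/E(X)\) and \(E[(1/Y_1)I(Y_1\le y)]=F(y)/E(X)\). A Monte Carlo sketch can verify them numerically, here using a log-normal parent as in the Notes (the parent parameters are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 0.0, 0.5, 200_000

# Length-biased draws from a log-normal(mu, sigma^2) parent:
# the biased density y f(y)/E(X) is log-normal(mu + sigma^2, sigma^2).
Y = rng.lognormal(mu + sigma**2, sigma, size=n)

mean_X = np.exp(mu + sigma**2 / 2)  # E(X) of the parent
y0 = np.exp(mu)                     # parent median, so F(y0) = 1/2

# Identities used in the Lemma 2 moment computations:
#   E[1/Y]             = 1/E(X)
#   E[(1/Y) I(Y <= y0)] = F(y0)/E(X)
est1 = np.mean(1.0 / Y)
est2 = np.mean((1.0 / Y) * (Y <= y0))
print(est1, 1 / mean_X)
print(est2, 0.5 / mean_X)
```

With \(n=200{,}000\) draws both sample averages match their theoretical targets to a few decimal places, which is the consistency that the variance and covariance expressions in Lemma 2 rest on.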
Proof of Lemma 3
Using integration by parts we get
Therefore, as \(n\rightarrow \infty \)
Further under Assumptions (A1) to (A3) we get
Equations (6.7) and (6.8) imply that
\(\square \)
Using similar arguments we get
Equations (6.8) and (6.9) imply that
\(\square \)
Proof of Lemma 4
As \(n\rightarrow \infty \), under Assumption (A2),
Therefore under Assumption (A2), from (6.10) and (6.11) we see that as \(n\rightarrow \infty \),
Hence, \(\frac{\hat{\sigma }^2_E}{{\sigma }^2_E}\rightarrow 1\), almost surely, as \(n\rightarrow \infty \).
Let \(\hat{f}_n(y)\) be a strongly consistent estimator of f(y). Using (6.10) and (6.11), we see that under the stated conditions (A1) to (A3) as \(n\rightarrow \infty \)
From Theorem 3 we recall that as \(n\rightarrow \infty \)
Therefore under the stated assumptions \(\frac{\hat{\sigma }^2_K}{\sigma ^2_K}\rightarrow 1,\) almost surely, as \(n\rightarrow \infty \). \(\square \)
Proof of Lemma 5
The asymptotic mean squared error AMSE(h) of \(\hat{F}_h(y)\) is the sum of the asymptotic variance and the square of the asymptotic bias. The expressions for the asymptotic bias and variance are obtained from the leading terms in Theorem 1 (ii) and \(\sigma ^2_K\) in Theorem 3. \(\square \)
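For kernel distribution-function estimators the AMSE typically takes the generic shape \(\text{AMSE}(h)=A/n - Bh/n + Ch^4\): smoothing reduces the variance linearly in \(h\) while contributing a squared bias of order \(h^4\), which yields an optimal bandwidth of order \(n^{-1/3}\). The constants \(A, B, C\) below are hypothetical placeholders; in the paper they come from Theorem 1 (ii) and \(\sigma ^2_K\) in Theorem 3. A sketch of the minimization:

```python
from scipy.optimize import minimize_scalar

# Generic AMSE shape for a kernel df estimator (A, B, C are
# hypothetical constants standing in for the paper's leading terms):
#   AMSE(h) = A/n - B*h/n + C*h^4
A, B, C, n = 1.0, 0.6, 0.8, 1000

def amse(h):
    return A / n - B * h / n + C * h**4

# Setting d/dh AMSE = -B/n + 4*C*h^3 = 0 gives the closed-form minimizer,
# which is of order n^(-1/3).
h_opt = (B / (4 * C * n)) ** (1 / 3)

# Numerical minimization agrees with the closed form.
res = minimize_scalar(amse, bounds=(1e-6, 1.0), method="bounded")
print(h_opt, res.x)
```

The agreement between the closed-form and numerical minimizers confirms the \(n^{-1/3}\) rate for the asymptotically optimal bandwidth under this assumed AMSE shape.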
Proof of Lemma 7
Since \(||\hat{F}_{C,\ \hat{h}}-F||\le ||\hat{F}_{C,\ \hat{h}}-\hat{F}_{\hat{h}}||+||\hat{F}_{\hat{h}}-F||\), and since \(||\hat{F}_{\hat{h}}-F||\rightarrow 0\) almost surely as \(n\rightarrow \infty \) under the stated conditions (see Lemma 6), it is enough to prove that \(||\hat{F}_{C,\ \hat{h}}-\hat{F}_{\hat{h}}||\rightarrow 0\) almost surely as \(n\rightarrow \infty \).
Therefore
But, \(\hat{F}_{\hat{h}}(0)\rightarrow F(0)=0\) almost surely, as \(n\rightarrow \infty \) (see Lemma 6). Consequently, \(||\hat{F}_{C,\ \hat{h}}-\hat{F}_{\hat{h}}||\rightarrow 0\), almost surely, as \(n\rightarrow \infty \). This completes the proof of the first part.
Since \(\hat{F}_{C,\ \hat{h}}\) and F are both dfs, \(||\hat{F}_{C,\ \hat{h}}-F||\le 1\) almost surely. Therefore, using the almost sure convergence of \(||\hat{F}_{C,\ \hat{h}}-F||\) and the dominated convergence theorem, we see that \(E||\hat{F}_{C,\ \hat{h}}-F||^2\rightarrow 0\) as \(n\rightarrow \infty \). This completes the proof. \(\square \)
Cite this article
Bose, A., Dutta, S. Kernel based estimation of the distribution function for length biased data. Metrika 85, 269–287 (2022). https://doi.org/10.1007/s00184-021-00824-3