Abstract
Empirical and kernel estimators are considered for the distribution of positive length-biased data. Their asymptotic bias, variance and limiting distribution are obtained. For the kernel estimator, the asymptotically optimal bandwidth is calculated and rule-of-thumb bandwidths are proposed. At any point below the median, the asymptotic mean squared error of the kernel estimator is smaller than that of the empirical estimator. A suitably truncated kernel estimator is positive, and we prove the strong uniform and \(L_2\) consistency of this estimator. Simulations reveal the improved performance of the truncated kernel estimator in estimating tail probabilities based on length-biased data.
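The two estimators described in the abstract can be sketched in code. Under length-biased sampling the observed density is \(g(y)=y f(y)/E(X)\), so weighting each observation by \(1/Y_i\) undoes the bias; the kernel estimator replaces the indicator \(I(Y_i\le y)\) by an integrated kernel. This is a minimal illustrative sketch, not the paper's exact construction: the choice of a Gaussian integrated kernel and the bandwidth value are assumptions.

```python
import numpy as np
from scipy.stats import norm

def empirical_lb_cdf(sample, y):
    """Empirical df estimator for length-biased data: each observation
    is weighted by 1/Y_i to undo the length bias (Cox-type weighting)."""
    w = 1.0 / sample
    return np.sum(w * (sample <= y)) / np.sum(w)

def kernel_lb_cdf(sample, y, h):
    """Kernel-smoothed version: the indicator I(Y_i <= y) is replaced by
    the integrated kernel W((y - Y_i)/h); here W is the Gaussian cdf
    (an assumed choice for illustration)."""
    w = 1.0 / sample
    return np.sum(w * norm.cdf((y - sample) / h)) / np.sum(w)

# Length-biased draws from a log-normal(mu, sigma^2) parent: the biased
# density y f(y)/E(X) is again log-normal, with mu shifted by sigma^2.
rng = np.random.default_rng(0)
mu, sigma = 0.0, 0.5
biased = rng.lognormal(mu + sigma**2, sigma, size=5000)

y0 = np.exp(mu)  # median of the parent distribution, so F(y0) = 0.5
print(empirical_lb_cdf(biased, y0), kernel_lb_cdf(biased, y0, h=0.1))
```

Both estimates should be close to the true value \(F(y_0)=0.5\) of the parent distribution, even though the raw biased sample over-represents large observations.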
Notes
A non-negative valued r.v. is said to follow the log-normal\((\mu ,\ \sigma ^2)\) distribution iff the natural logarithm of the random variable follows the normal distribution with mean \(\mu \) and variance \(\sigma ^2\).
Acknowledgements
We are deeply thankful to the referee for pointing out several mistakes and for offering constructive suggestions for improvement.
Author information
Authors and Affiliations
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A. Bose: Research supported by the J.C. Bose National Fellowship, Govt. of India. S. Dutta: Research supported by the MATRICS scheme No. MTR/2019/000502 of the Science and Engineering Research Board (SERB), Govt. of India.
Appendix
Proof
We can write
Under the stated assumptions \(P(D=E(D))=0\), and therefore \(P(r\ne 1)=1\). We know that for \(r\ne 1\),
It is easy to verify that
Multiplying both sides of (6.1) by \(\frac{N}{E(D)}\), taking expectation and using (6.2) we get the stated inequality. \(\square \)
Proof of Lemma 2
To prove Lemma 2 it is enough to see that
Using (6.3) and (6.4), it is straightforward to verify the expressions of \(Var\left[ \frac{1}{Y_1}I(Y_1\le y)\right] ,\ Cov\left[ \frac{1}{Y_1}I(Y_1\le y),\frac{1}{Y_1}\right] \) in Lemma 2. \(\square \)
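The moment identities behind Lemma 2 follow from the length-biased density \(g(y)=y f(y)/E(X)\): in particular \(E[1/Y_1]=1/E(X)\) and \(E[(1/Y_1)I(Y_1\le y)]=F(y)/E(X)\). A Monte Carlo sketch can verify them numerically, here using a log-normal parent as in the Notes (the parent parameters are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 0.0, 0.5, 200_000

# Length-biased draws from a log-normal(mu, sigma^2) parent:
# the biased density y f(y)/E(X) is log-normal(mu + sigma^2, sigma^2).
Y = rng.lognormal(mu + sigma**2, sigma, size=n)

mean_X = np.exp(mu + sigma**2 / 2)  # E(X) of the parent
y0 = np.exp(mu)                     # parent median, so F(y0) = 1/2

# Identities used in the Lemma 2 moment computations:
#   E[1/Y]             = 1/E(X)
#   E[(1/Y) I(Y <= y0)] = F(y0)/E(X)
est1 = np.mean(1.0 / Y)
est2 = np.mean((1.0 / Y) * (Y <= y0))
print(est1, 1 / mean_X)
print(est2, 0.5 / mean_X)
```

With \(n=200{,}000\) draws both sample averages match their theoretical targets to a few decimal places, which is the consistency that the variance and covariance expressions in Lemma 2 rest on.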
Proof of Lemma 3
Using integration by parts we get
Therefore, as \(n\rightarrow \infty \)
Further under Assumptions (A1) to (A3) we get
Equations (6.7) and (6.8) imply that
\(\square \)
Using similar arguments we get
Equations (6.8) and (6.9) imply that
\(\square \)
Proof of Lemma 4
As \(n\rightarrow \infty \), under Assumption (A2),
Therefore under Assumption (A2), from (6.10) and (6.11) we see that as \(n\rightarrow \infty \),
Hence, \(\frac{\hat{\sigma }^2_E}{{\sigma }^2_E}\rightarrow 1\), almost surely, as \(n\rightarrow \infty \).
Let \(\hat{f}_n(y)\) be a strongly consistent estimator of f(y). Using (6.10) and (6.11), we see that under the stated conditions (A1) to (A3) as \(n\rightarrow \infty \)
From Theorem 3 we recall that as \(n\rightarrow \infty \)
Therefore under the stated assumptions \(\frac{\hat{\sigma }^2_K}{\sigma ^2_K}\rightarrow 1,\) almost surely, as \(n\rightarrow \infty \). \(\square \)
Proof of Lemma 5
The asymptotic mean squared error AMSE(h) of \(\hat{F}_h(y)\) is the sum of the asymptotic variance and the square of the asymptotic bias. The expressions for the asymptotic bias and variance are obtained from the leading terms in Theorem 1 (ii) and \(\sigma ^2_K\) in Theorem 3. \(\square \)
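For kernel distribution-function estimators the AMSE typically takes the generic shape \(\text{AMSE}(h)=A/n - Bh/n + Ch^4\): smoothing reduces the variance linearly in \(h\) while contributing a squared bias of order \(h^4\), which yields an optimal bandwidth of order \(n^{-1/3}\). The constants \(A, B, C\) below are hypothetical placeholders; in the paper they come from Theorem 1 (ii) and \(\sigma ^2_K\) in Theorem 3. A sketch of the minimization:

```python
from scipy.optimize import minimize_scalar

# Generic AMSE shape for a kernel df estimator (A, B, C are
# hypothetical constants standing in for the paper's leading terms):
#   AMSE(h) = A/n - B*h/n + C*h^4
A, B, C, n = 1.0, 0.6, 0.8, 1000

def amse(h):
    return A / n - B * h / n + C * h**4

# Setting d/dh AMSE = -B/n + 4*C*h^3 = 0 gives the closed-form minimizer,
# which is of order n^(-1/3).
h_opt = (B / (4 * C * n)) ** (1 / 3)

# Numerical minimization agrees with the closed form.
res = minimize_scalar(amse, bounds=(1e-6, 1.0), method="bounded")
print(h_opt, res.x)
```

The agreement between the closed-form and numerical minimizers confirms the \(n^{-1/3}\) rate for the asymptotically optimal bandwidth under this assumed AMSE shape.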
Proof of Lemma 7
Since \(||\hat{F}_{C,\ \hat{h}}-F||\le ||\hat{F}_{C,\ \hat{h}}-\hat{F}_{\hat{h}}||+||\hat{F}_{\hat{h}}-F||\), and since \(||\hat{F}_{\hat{h}}-F||\rightarrow 0\) almost surely as \(n\rightarrow \infty \) under the stated conditions (see Lemma 6), it is enough to prove that \(||\hat{F}_{C,\ \hat{h}}-\hat{F}_{\hat{h}}||\rightarrow 0\) almost surely as \(n\rightarrow \infty \).
Therefore
But, \(\hat{F}_{\hat{h}}(0)\rightarrow F(0)=0\) almost surely, as \(n\rightarrow \infty \) (see Lemma 6). Consequently, \(||\hat{F}_{C,\ \hat{h}}-\hat{F}_{\hat{h}}||\rightarrow 0\), almost surely, as \(n\rightarrow \infty \). This completes the proof of the first part.
Since \(\hat{F}_{C,\ \hat{h}}\) and F are both dfs, \(||\hat{F}_{C,\ \hat{h}}-F||\le 1\) almost surely. Therefore, using the almost sure convergence of \(||\hat{F}_{C,\ \hat{h}}-F||\) and the dominated convergence theorem, we see that \(E||\hat{F}_{C,\ \hat{h}}-F||^2\rightarrow 0\) as \(n\rightarrow \infty \). This completes the proof. \(\square \)
Cite this article
Bose, A., Dutta, S. Kernel based estimation of the distribution function for length biased data. Metrika 85, 269–287 (2022). https://doi.org/10.1007/s00184-021-00824-3