One-sided cross-validation for nonsmooth density functions

  • Original paper
  • Computational Statistics

Abstract

One-sided cross-validation (OSCV) is a bandwidth selection method initially introduced by Hart and Yi (J Am Stat Assoc 93(442):620–631, 1998) in the context of smooth regression functions. Martínez-Miranda et al. (in Gregoriou (ed) Operational risk towards Basel III: best practices and issues in modeling, management and regulation, Wiley, Hoboken, 2009) developed a version of OSCV for smooth density functions. This article extends the method to nonsmooth densities. It also introduces a fully robust OSCV modification that produces consistent OSCV bandwidths in both the smooth and nonsmooth cases. Practical implementations of the OSCV method for smooth and nonsmooth densities are discussed. One of the considered cross-validation kernels has potential for improving the OSCV method’s performance in the regression context.

References

  • Bowman AW (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2):353–360

  • Chiu S-T (1991) The effect of discretization error on bandwidth selection for kernel density estimation. Biometrika 78(2):436–441

  • Cline DBH, Hart JD (1991) Kernel estimation of densities with discontinuities or discontinuous derivatives. Statistics 22(1):69–84

  • Gámiz Pérez ML, Janys L, Martínez Miranda MD, Nielsen JP (2013a) Bandwidth selection in marker dependent kernel hazard estimation. Comput Stat Data Anal 68:155–169

  • Gámiz Pérez ML, Martínez Miranda MD, Nielsen JP (2013b) Smoothing survival densities in practice. Comput Stat Data Anal 58:368–382

  • Gámiz ML, Mammen E, Martínez Miranda MD, Nielsen JP (2016) Double one-sided cross-validation of local linear hazards. J R Stat Soc Ser B Stat Methodol 78(4):755–779

  • Härdle W (1991) Smoothing techniques. Springer series in statistics. Springer, New York (with implementation in S)

  • Hart JD, Yi S (1998) One-sided cross-validation. J Am Stat Assoc 93(442):620–631

  • Jones MC, Marron JS, Sheather SJ (1996) A brief survey of bandwidth selection for density estimation. J Am Stat Assoc 91(433):401–407

  • Köhler M, Schindler A, Sperlich S (2014) A review and comparison of bandwidth selection methods for kernel regression. Int Stat Rev 82(2):243–274. https://doi.org/10.1111/insr.12039

  • Loader CR (1999) Bandwidth selection: classical or plug-in? Ann Stat 27(2):415–438

  • Mammen E, Martínez Miranda MD, Nielsen JP, Sperlich S (2011) Do-validation for kernel density estimation. J Am Stat Assoc 106(494):651–660. https://doi.org/10.1198/jasa.2011.tm08687

  • Mammen E, Martínez Miranda MD, Nielsen JP, Sperlich S (2014) Further theoretical and practical insight to the do-validated bandwidth selector. J Korean Stat Soc 43(3):355–365

  • Martínez-Miranda MD, Nielsen JP, Sperlich S (2009) One sided cross validation for density estimation. In: Gregoriou GN (ed) Operational risk towards Basel III: best practices and issues in modeling, management and regulation. Wiley, Hoboken, pp 177–196

  • Rudemo M (1982) Empirical choice of histograms and kernel density estimators. Scand J Stat 9(2):65–78

  • Savchuk O (2017a) ICV: Indirect Cross-Validation (ICV) for Kernel Density Estimation. R package version 1.0

  • Savchuk O (2017b) OSCV: One-Sided Cross-Validation. R package version 1.0

  • Savchuk OY, Hart JD (2017) Fully robust one-sided cross-validation for regression functions. Comput Stat. https://doi.org/10.1007/s00180-017-0713-7

  • Savchuk OY, Hart JD, Sheather SJ (2010) Indirect cross-validation for density estimation. J Am Stat Assoc 105(489):415–423

  • Savchuk O, Hart J, Sheather S (2011) An empirical study of indirect cross-validation. In: Hunter D, Rosenberger J, Richards D (eds) Nonparametric statistics and mixture models. World Scientific Publishing, Hackensack, NJ, pp 288–308. https://doi.org/10.1142/9789814340564_0017

  • Savchuk OY, Hart JD, Sheather SJ (2013) One-sided cross-validation for nonsmooth regression functions. J Nonparametr Stat 25(4):889–904

  • Savchuk OY, Hart JD, Sheather SJ (2016) Corrigendum to “One-sided cross-validation for nonsmooth regression functions”. [J. Nonparametr. Stat., 25(4): 889–904, 2013]. J Nonparametr Stat 28(4):875–877

  • Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53(3):683–690

  • Silverman BW (1986) Density estimation for statistics and data analysis. Monographs on statistics and applied probability. Chapman & Hall, London

  • Tenreiro C (2017) A weighted least-squares cross-validation bandwidth selector for kernel density estimation. Commun Stat Theory Methods 46(7):3438–3458. https://doi.org/10.1080/03610926.2015.1062108

  • van Eeden C (1985) Mean integrated squared error of kernel estimators when the density and its derivative are not necessarily continuous. Ann Inst Stat Math 37(3):461–472

  • van Es B (1992) Asymptotics for least squares cross-validation bandwidths in nonsmooth cases. Ann Stat 20(3):1647–1657

  • Wand MP, Jones MC (1995) Kernel smoothing. Volume 60 of monographs on statistics and applied probability. Chapman and Hall Ltd., London

  • Yi S (1996) On one-sided cross-validation in nonparametric regression. Ph.D. dissertation, Texas A&M University

Acknowledgements

The author appreciates the comments of the Associate Editor and the referees, especially the idea of extending the OSCV method to the nonsmooth case in which a density has finitely many simple discontinuities.

Author information

Corresponding author

Correspondence to Olga Y. Savchuk.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Notation

For an arbitrary function g, define the following functionals:

$$\begin{aligned} \begin{array}{l} \displaystyle {D_g(z)=\int _{-\infty }^z g(u)\,du},\\ \displaystyle {G_g(z)=\int _{-\infty }^z ug(u)\,du,\qquad z\in {\mathbb {R}}}. \end{array} \end{aligned}$$

Based on \(D_g\) and \(G_g\) we define

$$\begin{aligned} B(g)=\int _0^\infty \left\{ z\bigl (1-D_g(z)\bigr ) +G_g(z)\right\} ^2\,dz+\int _0^\infty \left\{ zD_g(-z)+G_g(-z)\right\} ^2\,dz. \end{aligned} \tag{12}$$

The theoretical results in this article involve B(K) and B(L) for the two-sided and one-sided kernels K and L, respectively. In the case when K is symmetrical, one may derive from (12) that

$$\begin{aligned} B(K)=2\int _0^\infty \left\{ z\bigl (1-D_K(z)\bigr )+G_K(z)\right\} ^2\,dz. \end{aligned}$$
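A brief sketch of why the two integrals in (12) coincide for a symmetrical K may be helpful here. It assumes, as is standard for kernels but not restated in this appendix, that K integrates to one and has a finite first moment, so that \(\int _{-\infty }^\infty uK(u)\,du=0\) by symmetry. For \(z\ge 0\),

$$\begin{aligned} D_K(-z)&=\int _{z}^{\infty }K(u)\,du=1-D_K(z),\\ G_K(-z)&=-\int _{z}^{\infty }uK(u)\,du=G_K(z)-\int _{-\infty }^{\infty }uK(u)\,du=G_K(z). \end{aligned}$$

Hence \(zD_K(-z)+G_K(-z)=z\bigl (1-D_K(z)\bigr )+G_K(z)\), so the second integrand in (12) equals the first and B(K) is twice the first integral.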

Since L is supported on \([0,\infty )\), we have \(D_L(-z)=G_L(-z)=0\) for \(z>0\), so the second integral in (12) vanishes and

$$\begin{aligned} B(L)=\int _0^\infty \left\{ z\bigl (1-D_L(z)\bigr )+G_L(z)\right\} ^2\,dz. \end{aligned}$$
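As a rough numerical illustration of these formulas, the following Python sketch evaluates the two integrals in (12) for the Epanechnikov kernel and for an illustrative one-sided kernel. Both kernel choices, the function names `tabulate` and `b_terms`, and the grid size are assumptions made for illustration and are not taken from the article. Truncating the integrals at 1 relies on each kernel being supported in \([-1,1]\), integrating to one, and having zero first moment.

```python
def epanechnikov(u):
    """Symmetric kernel K supported on [-1, 1] (illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def one_sided(u):
    """Illustrative one-sided kernel L on [0, 1]: unit integral, zero first moment."""
    return 12.0 / 19.0 * (8.0 - 15.0 * u) * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def tabulate(g, a, n):
    """Midpoint-rule tables of D_g and G_g at the nodes -a + i*h, i = 0..n."""
    h = 2.0 * a / n
    D, G = [0.0], [0.0]
    for i in range(n):
        m = -a + (i + 0.5) * h          # midpoint of the i-th cell
        D.append(D[-1] + h * g(m))      # running integral of g
        G.append(G[-1] + h * m * g(m))  # running integral of u * g(u)
    return D, G


def b_terms(g, a=1.0, n=2000):
    """Return the two integrals in (12), assuming g vanishes outside [-a, a]."""
    D, G = tabulate(g, a, n)            # n must be even so that 0 is a node
    h, mid = 2.0 * a / n, n // 2
    first = second = 0.0
    for i in range(mid + 1):
        z = i * h
        w = 0.5 * h if i in (0, mid) else h   # trapezoid weights on [0, a]
        first += w * (z * (1.0 - D[mid + i]) + G[mid + i]) ** 2
        second += w * (z * D[mid - i] + G[mid - i]) ** 2
    return first, second


k_first, k_second = b_terms(epanechnikov)
l_first, l_second = b_terms(one_sided)
print("B(K):", k_first + k_second, " symmetric formula:", 2.0 * k_first)
print("B(L):", l_first + l_second, " second integral:", l_second)
```

For the symmetric kernel the two integrals agree up to discretization error, and for the one-sided kernel the second integral is zero, in line with the two simplified expressions above.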

1.2 Identity of the left-sided and right-sided OSCV functions in the case of a symmetrical generating kernel H

As we noted above, \(K_R(u)=K_L(-u)\) in the case when H is symmetrical.

Theorem

In the case when H is symmetrical, \(\text{ OSCV }_{K_L}(b)=\text{ OSCV }_{K_R}(b)\) for any \(b>0\). This implies \({{\hat{h}}}_{OSCV,K_L}={{\hat{h}}}_{OSCV,K_R}\).

Proof

For any \(b>0\), consider

$$\begin{aligned} R({{\hat{f}}}_{b,K_L})=\frac{1}{n^2b^2}\sum _{i=1}^n\sum _{j=1}^n\int _{-\infty }^\infty K_L\left( \frac{x-X_i}{b}\right) K_L\left( \frac{x-X_j}{b}\right) \,dx. \end{aligned}$$

After the change of variables \(u=\displaystyle {\frac{x-\frac{X_i+X_j}{2}}{b}}\), we get

$$\begin{aligned} R({{\hat{f}}}_{b,K_L})= & {} \frac{1}{n^2b}\sum _{i=1}^n\sum _{j=1}^n\int _{-\infty }^\infty K_L\left( u-\frac{X_i-X_j}{2b}\right) K_L\left( u+\frac{X_i-X_j}{2b}\right) \,du\\= & {} \frac{1}{n^2b}\sum _{i=1}^n\sum _{j=1}^n\int _{-\infty }^\infty K_L\left( -z-\frac{X_i-X_j}{2b}\right) K_L\left( -z+\frac{X_i-X_j}{2b}\right) \,dz\\= & {} \frac{1}{n^2b}\sum _{i=1}^n\sum _{j=1}^n\int _{-\infty }^\infty K_R\left( z+\frac{X_i-X_j}{2b}\right) K_R \left( z-\frac{X_i-X_j}{2b}\right) \,dz\\= & {} R({{\hat{f}}}_{b,K_R}). \end{aligned}$$

Next, consider

$$\begin{aligned} \sum _{i=1}^n{{\hat{f}}}_{b,K_L}^{-i}(X_i)= & {} \frac{1}{(n-1)b}\sum _{i=1}^n\sum _{j=1}^n K_L\left( \frac{X_i-X_j}{b}\right) -\frac{n}{(n-1)b}K_L(0)\\= & {} \frac{1}{2(n-1)b}\sum _{i=1}^n\sum _{j=1}^n K_L\left( -\frac{|X_i-X_j|}{b}\right) -\frac{n}{2(n-1)b}K_L(0)\\= & {} \frac{1}{2(n-1)b}\sum _{i=1}^n\sum _{j=1}^n K_R\left( \frac{|X_i-X_j|}{b}\right) -\frac{n}{2(n-1)b} K_R(0)\\= & {} \sum _{i=1}^n{{\hat{f}}}_{b,K_R}^{-i}(X_i). \end{aligned}$$

Both terms of the OSCV criterion therefore coincide for \(K_L\) and \(K_R\), which proves that \(\text{ OSCV }_{K_L}(b)=\text{ OSCV }_{K_R}(b)\) for any \(b>0\). Consequently, \({{\hat{h}}}_{OSCV,K_L}={{\hat{h}}}_{OSCV,K_R}\). \(\square \)
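The identity can also be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: the criterion is taken to be the usual least-squares cross-validation form \(R({{\hat{f}}}_b)-\frac{2}{n}\sum _{i}{{\hat{f}}}_b^{-i}(X_i)\) evaluated with a one-sided kernel, and the right-sided kernel `poly_right` is an illustrative choice on \([0,1]\) rather than the kernel used in the article. The helper names `autocorr` and `oscv`, the simulated sample, and the bandwidth grid are likewise hypothetical.

```python
import random


def poly_right(u):
    """Polynomial part of an illustrative right-sided kernel K_R on [0, 1]."""
    return 12.0 / 19.0 * (8.0 - 15.0 * u) * (1.0 - u * u)


def poly_left(u):
    """Mirror image K_L(u) = K_R(-u), supported on [-1, 0]."""
    return poly_right(-u)


def autocorr(poly, lo, hi, t, m=100):
    """Simpson's rule for the integral of K(u) K(u + t) over the overlap of supports."""
    left, right = max(lo, lo - t), min(hi, hi - t)
    if left >= right:
        return 0.0
    h = (right - left) / m
    total = 0.0
    for i in range(m + 1):
        u = left + i * h
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * poly(u) * poly(u + t)
    return total * h / 3.0


def oscv(xs, b, poly, lo, hi):
    """Assumed criterion: R(f_hat) - (2/n) * sum_i f_hat^{-i}(X_i)."""
    n = len(xs)
    # R(f_hat) written as a double sum of kernel autocorrelations.
    roughness = sum(autocorr(poly, lo, hi, (xi - xj) / b)
                    for xi in xs for xj in xs) / (n * n * b)

    def kernel(u):
        return poly(u) if lo <= u <= hi else 0.0

    # Sum of leave-one-out estimates at the data points (pairs with i != j only).
    loo = sum(kernel((xi - xj) / b)
              for i, xi in enumerate(xs)
              for j, xj in enumerate(xs) if i != j) / ((n - 1) * b)
    return roughness - 2.0 * loo / n


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
gaps = []
for b in (0.3, 0.6, 1.0):
    left_value = oscv(sample, b, poly_left, -1.0, 0.0)
    right_value = oscv(sample, b, poly_right, 0.0, 1.0)
    gaps.append(abs(left_value - right_value))
    print(f"b={b}: OSCV_L={left_value:.10f}, OSCV_R={right_value:.10f}")
```

The left-sided and right-sided values should agree up to floating-point rounding at every bandwidth, mirroring the two steps of the proof: the roughness terms match pair by pair, and the leave-one-out sums match after swapping the roles of i and j.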

The above proof can easily be adjusted to the slightly differently defined OSCV functions used in Mammen et al. (2011, 2014).

About this article

Cite this article

Savchuk, O.Y. One-sided cross-validation for nonsmooth density functions. Comput Stat 35, 1253–1272 (2020). https://doi.org/10.1007/s00180-019-00938-3

  • DOI: https://doi.org/10.1007/s00180-019-00938-3
