An optimal test for the additive model with discrete or categorical predictors

Annals of the Institute of Statistical Mathematics

Abstract

In multivariate nonparametric regression, additive models are very useful when a suitable parametric model is difficult to find. The backfitting algorithm is a powerful tool for estimating the additive components. However, due to the complexity of the estimators, the asymptotic p value of the associated test is difficult to calculate without Monte Carlo simulation. Moreover, the conventional tests assume that the predictor variables are strictly continuous. In this paper, a new test is introduced for the additive components with discrete or categorical predictors, where the model may also contain continuous covariates. The method is also applied to semiparametric regression to test the goodness of fit of the model. These tests are asymptotically optimal in terms of the rate of convergence, as they can detect a specific class of contiguous alternatives at a rate of \(n^{-1/2}\). An extensive simulation study and a real data example are presented to support the theoretical results.
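To make the backfitting idea concrete, the following is a minimal sketch for a model with one categorical predictor and one continuous covariate. It is not the estimator or the test proposed in the paper: the function names (`nw_smooth`, `group_means`, `backfit`), the Epanechnikov kernel, the bandwidth `h=0.2`, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def nw_smooth(z, r, h):
    """Nadaraya-Watson smooth of r on z using an Epanechnikov kernel (assumed choice)."""
    u = (z[:, None] - z[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    # Each row contains the point itself (weight 0.75), so the row sums are positive.
    return (w @ r) / w.sum(axis=1)

def group_means(x, r):
    """Smoother for a categorical predictor: average r within each level of x."""
    fit = np.empty(len(r), dtype=float)
    for level in np.unique(x):
        mask = x == level
        fit[mask] = r[mask].mean()
    return fit

def backfit(y, x, z, h, n_iter=50, tol=1e-8):
    """Backfitting for y = alpha + m1(x) + m2(z) + noise, with centered components."""
    alpha = y.mean()
    m1 = np.zeros(len(y))
    m2 = np.zeros(len(y))
    for _ in range(n_iter):
        # Update each component on the partial residuals of the other, then center it.
        m1_new = group_means(x, y - alpha - m2)
        m1_new -= m1_new.mean()
        m2_new = nw_smooth(z, y - alpha - m1_new, h)
        m2_new -= m2_new.mean()
        change = max(np.abs(m1_new - m1).max(), np.abs(m2_new - m2).max())
        m1, m2 = m1_new, m2_new
        if change < tol:
            break
    return alpha, m1, m2

# Hypothetical simulated data: a three-level categorical predictor and one covariate.
rng = np.random.default_rng(0)
n = 400
x = rng.integers(0, 3, size=n)
z = rng.uniform(-1.0, 1.0, size=n)
effects = np.array([-1.0, 0.0, 1.0])
y = 2.0 + effects[x] + np.sin(np.pi * z) + rng.normal(scale=0.3, size=n)

alpha_hat, m1_hat, m2_hat = backfit(y, x, z, h=0.2)
```

Here the categorical component is fitted by within-level averages, so no bandwidth is needed for it; only the continuous covariate is smoothed with a kernel.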

Acknowledgements

The author thanks Kuchibhotla Arun Kumar for carefully reading the paper, including all proofs, and for providing helpful comments and suggestions. The author also thanks two anonymous referees, whose comments significantly improved the presentation of the paper.

Author information

Corresponding author

Correspondence to Abhijit Mandal.

Appendix A: Regularity conditions

To derive the asymptotic distribution of the generalized likelihood ratio (GLR) test statistic, we need the following assumptions:

  1. (C1) Let \(c_{pj} = P(X_p=x_{pj})\). Then \(c_{pj} \in (0,1)\) for all \(j=1,2, \ldots , k_p\) and \(p=1,2, \ldots , P\), where \(\sum _{j=1}^{k_p} c_{pj} =1\).

  2. (C2) The kernel function \(K(z)\) is bounded and Lipschitz continuous, with bounded support.

  3. (C3) If \(Z_q\) is continuous, then the density \(f_q\) of \(Z_q\) is Lipschitz continuous, is bounded away from 0, and has bounded support \(\Omega _q\), for \(q \in \{1,2, \ldots , Q\}\).

  4. (C4) If both \(Z_q\) and \(Z_{q'}\) are continuous, then the joint density \(f_{qq'}\) of \(Z_q\) and \(Z_{q'}\) is Lipschitz continuous on its support \(\Omega _q \times \Omega _{q'}\), for \(q \ne q' \in \{1,2, \ldots , Q\}\).

  5. (C5) \(nh_q / \log (n) \rightarrow \infty \) and \(h_q \rightarrow 0\) as \(n \rightarrow \infty \), for \(q = 1,2, \ldots , Q\).

  6. (C6) If \(Z_q\) is continuous and \(d_q\) is the degree of the polynomial used to smooth \(Z_q\), then the \(( d_q + 1)\)th derivative of \(m_{P+q}\) exists and is bounded and continuous, for \(q \in \{1,2, \ldots , Q\}\).

  7. (C7) \(\sigma ^2 = \text{ Var }[\epsilon ] = E[\epsilon ^2 ] < \infty \).

  8. (C8) \(E[ m_q (Z_q)| X_p = x_{pj} ] = 0\) for all \(j=1,2,\ldots , k_p\), \( p = 1,2, \ldots , P\) and \(q = 1,2, \ldots , Q\).
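Some of these conditions can be explored numerically. The sketch below looks at (C1), (C2) and (C5) under assumptions that are not taken from the paper: the Epanechnikov kernel as an example kernel, the bandwidth rate \(h = c\,n^{-1/5}\) as a commonly used choice, and the helper names `epanechnikov`, `c5_ratio` and `level_frequencies`, which are hypothetical. A finite-sample check of this kind only illustrates the conditions; it does not verify them.

```python
import numpy as np

def epanechnikov(u):
    """Example kernel for (C2): bounded, Lipschitz continuous, supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def c5_ratio(n, c=1.0, rate=0.2):
    """Value of n * h / log(n) for the assumed bandwidth h = c * n**(-rate)."""
    h = c * n ** (-rate)
    return n * h / np.log(n)

def level_frequencies(x):
    """Empirical analogue of c_pj in (C1): relative frequency of each level of x."""
    levels, counts = np.unique(x, return_counts=True)
    return dict(zip(levels.tolist(), (counts / counts.sum()).tolist()))

# (C2): slopes between neighbouring grid points stay below the Lipschitz constant 1.5.
grid = np.linspace(-1.5, 1.5, 3001)
max_slope = np.max(np.abs(np.diff(epanechnikov(grid)) / np.diff(grid)))

# (C5): the ratio n * h / log(n) grows with n while h itself shrinks to 0.
ratios = [c5_ratio(n) for n in (10**2, 10**4, 10**6)]

# (C1): every observed level should have a frequency strictly between 0 and 1.
rng = np.random.default_rng(1)
x = rng.choice(["a", "b", "c"], size=500, p=[0.2, 0.3, 0.5])
freqs = level_frequencies(x)
```

With \(h = c\,n^{-1/5}\), the ratio in (C5) equals \(c\,n^{4/5}/\log (n)\), which diverges, so this bandwidth rate is compatible with the condition.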

About this article

Cite this article

Mandal, A. An optimal test for the additive model with discrete or categorical predictors. Ann Inst Stat Math 72, 1397–1417 (2020). https://doi.org/10.1007/s10463-019-00729-z

  • DOI: https://doi.org/10.1007/s10463-019-00729-z
