Abstract
In multivariate nonparametric regression, additive models are very useful when a suitable parametric model is difficult to find. The backfitting algorithm is a powerful tool for estimating the additive components. However, because of the complexity of the estimators, the asymptotic p value of the associated test is difficult to calculate without a Monte Carlo simulation. Moreover, conventional tests assume that the predictor variables are strictly continuous. In this paper, a new test is introduced for the additive components with discrete or categorical predictors, where the model may also contain continuous covariates. The method is also applied to semiparametric regression to test the goodness of fit of the model. These tests are asymptotically optimal in terms of the rate of convergence, as they can detect a specific class of contiguous alternatives at the rate \(n^{-1/2}\). An extensive simulation study and a real data example are presented to support the theoretical results.
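To make the estimation step concrete, here is a minimal sketch of backfitting for a simple additive model \(Y = \alpha + m_1(X) + m_2(Z) + \epsilon\) with one categorical predictor \(X\) and one continuous covariate \(Z\). It is an illustration under stated assumptions, not the paper's estimator or test: the discrete component is updated by category means of the partial residuals, the continuous component by a Nadaraya-Watson smoother with an Epanechnikov kernel, each component is centered to have empirical mean zero, and the bandwidth, iteration limit, simulated data, and function names (`backfit`, `nw_smooth`) are hypothetical choices.

```python
import numpy as np

def epanechnikov(u):
    # Bounded, Lipschitz kernel supported on [-1, 1]
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def nw_smooth(z, r, h):
    # Nadaraya-Watson estimate of E[r | Z = z_i] at every sample point z_i
    w = epanechnikov((z[:, None] - z[None, :]) / h)  # w[i, j] = K((z_i - z_j) / h)
    return w @ r / w.sum(axis=1)

def backfit(y, x, z, h, n_iter=50, tol=1e-8):
    # Backfitting for y = alpha + m1(x) + m2(z) + eps, x categorical, z continuous
    alpha = y.mean()
    m1 = np.zeros_like(y, dtype=float)
    m2 = np.zeros_like(y, dtype=float)
    levels = np.unique(x)
    for _ in range(n_iter):
        m1_old, m2_old = m1.copy(), m2.copy()
        # Discrete step: category means of the partial residuals y - alpha - m2
        r = y - alpha - m2
        for lev in levels:
            m1[x == lev] = r[x == lev].mean()
        m1 -= m1.mean()  # center so the empirical mean of m1 is zero
        # Continuous step: kernel smooth of the partial residuals y - alpha - m1
        m2 = nw_smooth(z, y - alpha - m1, h)
        m2 -= m2.mean()  # center so the empirical mean of m2 is zero
        change = max(np.abs(m1 - m1_old).max(), np.abs(m2 - m2_old).max())
        if change < tol:
            break
    return alpha, m1, m2

# Simulated example (hypothetical): three categories with effects -1, 0, 1 and a sine effect of z
rng = np.random.default_rng(0)
n = 400
x = rng.integers(0, 3, size=n)
z = rng.uniform(0.0, 1.0, size=n)
y = 2.0 + np.array([-1.0, 0.0, 1.0])[x] + np.sin(2 * np.pi * z) + rng.normal(0.0, 0.3, size=n)
alpha, m1, m2 = backfit(y, x, z, h=n ** (-0.2))
print("estimated level effects:", [round(m1[x == k][0], 3) for k in range(3)])
```

Centering each component to have empirical mean zero is one common identifiability convention; because the components are identified only up to additive constants, a different convention would shift \(m_1\) and \(m_2\) without changing the fitted regression function.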
Acknowledgements
The author is very grateful to Kuchibhotla Arun Kumar for carefully reading the paper, including all proofs, and for providing helpful comments and suggestions. The author would also like to thank two anonymous referees, whose comments significantly improved the presentation of the paper.
Appendix A: Regularity conditions
To derive the asymptotic distribution of the GLR test statistic, we need the following assumptions:
- (C1) Let \(c_{pj} = P(X_p = x_{pj})\). Then \(c_{pj} \in (0,1)\) for all \(j = 1, 2, \ldots, k_p\) and \(p = 1, 2, \ldots, P\), where \(\sum_{j=1}^{k_p} c_{pj} = 1\).
- (C2) The kernel function \(K(z)\) is bounded and Lipschitz continuous with bounded support.
- (C3) If \(Z_q\) is continuous, then the density \(f_q\) of \(Z_q\) is Lipschitz continuous, is bounded away from 0, and has bounded support \(\Omega_q\), for \(q \in \{1, 2, \ldots, Q\}\).
- (C4) If both \(Z_q\) and \(Z_{q'}\) are continuous, then the joint density \(f_{qq'}\) of \(Z_q\) and \(Z_{q'}\) is Lipschitz continuous on its support \(\Omega_q \times \Omega_{q'}\), for \(q \ne q' \in \{1, 2, \ldots, Q\}\).
- (C5) As \(n \rightarrow \infty\), \(h_q \rightarrow 0\) and \(n h_q / \log(n) \rightarrow \infty\), for \(q = 1, 2, \ldots, Q\).
- (C6) If \(Z_q\) is continuous and \(d_q\) is the degree of the polynomial used for smoothing \(Z_q\), then the \((d_q + 1)\)th derivative of \(m_{P+q}\) exists and is bounded and continuous, for \(q \in \{1, 2, \ldots, Q\}\).
- (C7) \(\sigma^2 = \text{Var}[\epsilon] = E[\epsilon^2] < \infty\).
- (C8) \(E[m_q(Z_q) \mid X_p = x_{pj}] = 0\) for all \(j = 1, 2, \ldots, k_p\), \(p = 1, 2, \ldots, P\) and \(q = 1, 2, \ldots, Q\).
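Conditions (C1), (C2) and (C5) can be checked for a concrete choice of kernel and bandwidth. The sketch below assumes, purely for illustration, an Epanechnikov kernel (bounded by 0.75, Lipschitz with constant 1.5, and supported on \([-1, 1]\), so it meets (C2)) and a bandwidth \(h = c\,n^{-1/5}\); the appendix prescribes neither. For this bandwidth, \(n h / \log(n) = c\,n^{4/5} / \log(n) \rightarrow \infty\) while \(h \rightarrow 0\), as (C5) requires. The helper names (`bandwidth`, `c5_ratio`, `category_probs`) are hypothetical; `category_probs` returns the empirical analogues of the \(c_{pj}\) in (C1).

```python
import numpy as np

def epanechnikov(u):
    # Bounded by 0.75, zero outside [-1, 1], Lipschitz with constant 1.5: one kernel meeting (C2)
    return 0.75 * np.clip(1.0 - np.asarray(u) ** 2, 0.0, None)

def bandwidth(n, c=1.0):
    # Illustrative rate h = c * n^(-1/5); an assumption, not a choice made in the paper
    return c * n ** (-0.2)

def c5_ratio(n, c=1.0):
    # n * h / log(n), which must diverge under (C5)
    return n * bandwidth(n, c) / np.log(n)

def category_probs(x):
    # Empirical analogues of c_pj = P(X_p = x_pj) for one discrete predictor, as in (C1)
    levels, counts = np.unique(x, return_counts=True)
    return dict(zip(levels.tolist(), (counts / counts.sum()).tolist()))

sizes = [10**2, 10**3, 10**4, 10**5]
hs = [bandwidth(n) for n in sizes]
ratios = [c5_ratio(n) for n in sizes]
for n, h, r in zip(sizes, hs, ratios):
    print(f"n={n:>6}  h={h:.4f}  n*h/log(n)={r:.1f}")

# Numerical check of the Lipschitz bound |K(u) - K(v)| <= 1.5 |u - v| on random pairs
rng = np.random.default_rng(1)
u, v = rng.uniform(-2.0, 2.0, size=(2, 10_000))
lipschitz_ok = np.all(np.abs(epanechnikov(u) - epanechnikov(v)) <= 1.5 * np.abs(u - v) + 1e-12)

# Each empirical category probability lies in (0, 1) and they sum to 1
probs = category_probs(np.array(["a", "b", "a", "c"]))
print(probs)
```

The bandwidth shrinks with \(n\) while the ratio in (C5) grows, which is the trade-off the condition encodes: the smoother becomes local, but each window still contains many observations.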
About this article
Cite this article
Mandal, A. An optimal test for the additive model with discrete or categorical predictors. Ann Inst Stat Math 72, 1397–1417 (2020). https://doi.org/10.1007/s10463-019-00729-z