Abstract
This work addresses the problem of assessing a parametric regression model in the presence of spatial correlation. For that purpose, a goodness-of-fit test based on an \(L_2\)-distance comparing parametric and nonparametric regression estimators is proposed. Asymptotic properties of the test statistic, both under the null hypothesis and under local alternatives, are derived. Additionally, a bootstrap procedure is designed to calibrate the test in practice. The finite sample performance of the test is analyzed through a simulation study, and its applicability is illustrated using a real data example.
References
Alcalá J, Cristóbal J, González-Manteiga W (1999) Goodness-of-fit test for linear models based on local polynomials. Stat Probab Lett 42:39–46
Azzalini A, Bowman AW, Härdle W (1989) On the use of nonparametric regression for model checking. Biometrika 76:1–11
Biedermann S, Dette H (2000) Testing linearity of regression models with dependent errors by kernel based methods. Test 9:417–438
Bowman AW, Azzalini A (1997) Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations, vol 18. OUP Oxford, Oxford
Bowman AW, Crujeiras RM (2013) Inference for variograms. Comput Stat Data Anal 66:19–31
Cressie N (1985) Fitting variogram models by weighted least squares. J Int Assoc Math Geol 17:563–586
Cressie NA (1993) Statistics for spatial data. Wiley, New York
Crujeiras RM, Van Keilegom I (2010) Least squares estimation of nonlinear spatial trends. Comput Stat Data Anal 54:452–465
Diblasi A, Bowman A (2001) On the use of the variogram in checking for independence in spatial data. Biometrics 57:211–218
Diggle P, Ribeiro PJ (2007) Model-based geostatistics. Springer, New York
Eubank RL, Spiegelman CH (1990) Testing the goodness of fit of a linear model via nonparametric regression techniques. J Am Stat Assoc 85:387–392
Eubank RL, Li CS, Wang S (2005) Testing lack-of-fit of parametric regression models using nonparametric regression techniques. Stat Sin 15:135–152
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman and Hall, London
Fernández-Casal R (2016) npsp: nonparametric spatial (geo)statistics, R package version 0.5-3. http://cran.r-project.org/package=npsp. Accessed 1 Sept 2019
Fernández-Casal R, Castillo-Páez S, García-Soidán P (2017) Nonparametric estimation of the small-scale variability of heteroscedastic spatial processes. Spat Stat 22:358–370
Francisco-Fernández M, Opsomer JD (2005) Smoothing parameter selection methods for nonparametric regression with spatially correlated errors. Can J Stat Rev Can Stat 33:279–295
Francisco-Fernández M, Jurado-Expósito M, Opsomer J, López-Granados F (2006) A nonparametric analysis of the spatial distribution of Convolvulus arvensis in wheat-sunflower rotations. Environmetrics 17:849–860
Francisco-Fernández M, Quintela-del Río A, Fernández-Casal R (2012) Nonparametric methods for spatial regression. An application to seismic events. Environmetrics 23(1):85–93
González-Manteiga W, Crujeiras RM (2013) An updated review of Goodness-of-Fit tests for regression models. Test 22:361–411
González-Manteiga W, Vilar-Fernández J (1995) Testing linear regression models using non-parametric regression estimators when errors are non-independent. Comput Stat Data Anal 20:521–541
Hallin M, Lu Z, Tran LT (2004) Local linear spatial regression. Ann Stat 32:2469–2500
Härdle W, Mammen E (1993) Comparing nonparametric versus parametric regression fits. Ann Stat 21:1926–1947
Harper WV, Furr JM (1986) Geostatistical analysis of potentiometric data in Wolfcamp aquifer of the Palo Duro Basin, Texas. Technical report, Battelle Memorial Inst
Kim TY, Ha J, Hwang SY, Park C, Luo ZM (2013) Central limit theorems for reduced U-statistics under dependence and their usefulness. Aust N Z J Stat 55:387–399
Li CS (2005) Using local linear kernel smoothers to test the lack of fit of nonlinear regression models. Stat Methodol 2:267–284
Liu XH (2001) Kernel smoothing for spatially correlated data. Ph.D. thesis, Department of Statistics, Iowa State University
Maglione D, Diblasi A (2004) Exploring a valid model for the variogram of an isotropic spatial process. Stoch Environ Res Risk Assess 18:366–376
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142
Opsomer J, Francisco-Fernández M (2010) Finding local departures from a parametric model using nonparametric regression. Stat Pap 51:69–84
Park C, Kim TY, Ha J, Luo ZM, Hwang SY (2015) Using a bimodal kernel for a nonparametric regression specification test. Stat Sin 25:1145–1161
R Development Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org. Accessed 1 Sept 2019
Ribeiro PJ, Diggle PJ (2016) geoR: analysis of geostatistical data, R package version 1.7-5.2. https://cran.r-project.org/package=geoR. Accessed 1 Sept 2019
Rozanov YA (1967) Stationary random processes. Holden Day, Oakland
Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann Stat 22:1346–1370
Vilar-Fernández J, González-Manteiga W (1996) Bootstrap test of goodness of fit to a linear model when errors are correlated. Commun Stat Theory Methods 25:2925–2953
Watson GS (1964) Smooth regression analysis. Sankhya 26:359–372
Weihrather G (1993) Testing a linear regression model against nonparametric alternatives. Metrika 40:367–379
Acknowledgements
The authors acknowledge the support from the Xunta de Galicia Grant ED481A-2017/361 and the European Union (European Social Fund—ESF). This research has been partially supported by MINECO Grants MTM2014-52876-R, MTM2016-76969-P and MTM2017-82724-R, and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and ED431C-2017-38, and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF. We also thank two reviewers and the Associate Editor for their helpful comments and suggestions that significantly improved this article.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proof of Theorem 1
The test statistic (6) can be written as
Taking into account that, for every \(\eta >0\), \(\hat{f}_{\mathbf {H}}(\mathbf {x})=\frac{1}{n}\sum _{i=1}^{n} K_{\mathbf {H}}(\mathbf {X}_i-\mathbf {x})=f(\mathbf {x})+ O_p(n^{-2/(4+d)+\eta })\) uniformly in \(\mathbf {x}\) (see Härdle and Mammen 1993), and according to Liu (2001), it follows that
where \(\nabla f(\mathbf {x})\) denotes the \(d \times 1\) vector of first-order partial derivatives of f, and
denoting by \(T_{n12}\) the integral of the cross product. Regarding \(T_{n1}\), taking into account that the regression models considered are of the form \(m(\mathbf {x}) =m_{{\varvec{\beta }}_0}(\mathbf {x}) + n^{-1/2}|\mathbf {H}|^{-1/4}g(\mathbf {x}) \), one gets that
where
Under assumptions (A1)–(A3) and (A7), and given that the difference \(m_{\hat{{\varvec{\beta }}}}(\mathbf {x})-m_{{{\varvec{\beta }}}_0}(\mathbf {x})=O_p(n^{-1/2})\) uniformly in \(\mathbf {x}\), it is obtained that
As for the term \(I_2(\mathbf {x})\), taking into account Lemma 1 (available in the Online Supplementary Material), by straightforward calculations it follows that
Note that the leading term of (13) is the term \(b_{1\mathbf {H}}\) in Theorem 1. Finally, \(I_3(\mathbf {x})\) (associated with the error component) can be decomposed as
For the first term, one gets that
Similarly, it is obtained that \(\text{ Var }(I_{31})=O_p(n^{-1}|\mathbf {H}|^{-1})\), and, therefore,
The leading term of (14) corresponds to the first term of \(b_{0\mathbf {H}}\) in Theorem 1. For the term \(I_{32}\), let
thus,
and this can be seen as a U-statistic with degenerate kernel. To obtain the asymptotic normality of \(I_{32}\), considering assumption (A6), Theorem 2 given in Kim et al. (2013) will be applied. For this term, under assumptions (A4), (A7), (A8) and (A9), and according to Liu (2001), one gets that
corresponding to the second term of \(b_{0\mathbf {H}}\) in Theorem 1.
Similarly, it can be shown that the leading term of the variance of \(I_{32}\) is given by:
Therefore, using the central limit theorem for degenerate reduced U-statistics under \(\alpha \)-mixing conditions, given in Kim et al. (2013), it is obtained that the term \(I_{32}\) converges, in distribution, to a normal distribution with mean the leading term of (15) and variance (16).
On the other hand, by virtue of the Cauchy–Schwarz inequality, the cross terms in \(T_{n1}\) resulting from the products of \(I_1(\mathbf {x})\), \(I_2(\mathbf {x})\) and \(I_3(\mathbf {x})\) are all of smaller order. Therefore, combining the results given in (12)–(14), and the asymptotic normality of \(I_{32}\) [with bias the leading term of (15) and variance (16)], one gets
where
and
The term \(T_{n2}\) in \(T_n\) is of smaller order than \(T_{n1}\) (specifically, \(T_{n2}=O_p(\text{ tr }(\mathbf {H}^2)T_{n1})\)), and by the Cauchy–Schwarz inequality, the cross term \(T_{n12}\) is of smaller order as well. Therefore, from (11), it follows that
Taking into account (17), it follows that
with \(b_{0\mathbf {H}}\), \(b_{1\mathbf {H}}\) and V given above.
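As a numerical companion to the proof, the following Python sketch computes the kernel density estimator \(\hat{f}_{\mathbf {H}}\) used above and an \(L_2\)-type distance between smoothed observations and a smoothed parametric fit. It is not the authors' implementation, and several ingredients are assumptions made only for illustration: the kernel scaling \(K_{\mathbf {H}}(\mathbf {u})=|\mathbf {H}|^{-1}K(\mathbf {H}^{-1}\mathbf {u})\), a Gaussian kernel, a Nadaraya–Watson form for the statistic with weight function equal to one on the unit square, a linear trend, an exponential covariance for the errors, and the helper names `kernel_weights`, `density_estimate`, `nw_smooth`, `l2_statistic` and `demo`. The bootstrap calibration is not included.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard d-variate Gaussian density evaluated at each row of u."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)


def kernel_weights(X, x, H):
    """K_H(X_i - x), ASSUMING the scaling K_H(u) = |H|^{-1} K(H^{-1} u)."""
    U = np.linalg.solve(H, (X - x).T).T
    return gaussian_kernel(U) / abs(np.linalg.det(H))


def density_estimate(X, x, H):
    """f_hat_H(x) = (1/n) * sum_i K_H(X_i - x)."""
    return kernel_weights(X, x, H).mean()


def nw_smooth(X, values, x, H):
    """Nadaraya-Watson smooth of `values` at location x."""
    w = kernel_weights(X, x, H)
    return np.sum(w * values) / np.sum(w)


def l2_statistic(X, Z, fitted, H, grid, cell_area):
    """Riemann-sum approximation of the integral of
    (NW smooth of Z - NW smooth of the parametric fit)^2, with weight 1.
    The NW smoother is linear, so the difference is the smooth of Z - fitted."""
    diffs = np.array([nw_smooth(X, Z - fitted, x, H) for x in grid])
    return np.sum(diffs ** 2) * cell_area


def demo(c, seed=0, n=200):
    """Simulate a linear trend plus c * sin(2*pi*x1) with spatially
    correlated errors, fit the linear model by least squares and return
    the L2-type statistic. All numerical settings are illustrative."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, 2))
    # Exponential covariance (variance 0.25, range parameter 0.1), assumed here.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Sigma = 0.25 * np.exp(-D / 0.1) + 1e-10 * np.eye(n)
    eps = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
    Z = 1.0 + 2.0 * X[:, 0] - X[:, 1] + c * np.sin(2.0 * np.pi * X[:, 0]) + eps
    A = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(A, Z, rcond=None)
    H = 0.15 * np.eye(2)
    g = (np.arange(20) + 0.5) / 20.0
    grid = np.array([(a, b) for a in g for b in g])
    return l2_statistic(X, Z, A @ beta_hat, H, grid, cell_area=1.0 / 400.0)


if __name__ == "__main__":
    print("statistic, linear trend holds :", demo(c=0.0))
    print("statistic, departure with c=5 :", demo(c=5.0))
```

Under this scaling, a scalar bandwidth \(\mathbf {H}=h\mathbf {I}_d\) gives \(|\mathbf {H}|=h^d\), so the local alternatives in the proof shrink at the rate \(n^{-1/2}h^{-d/4}\). In the sketch, the departure `c` is a fixed constant rather than a sequence shrinking at that rate, which keeps the comparison between the two calls easy to read.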
Cite this article
Meilán-Vila, A., Opsomer, J.D., Francisco-Fernández, M. et al. A goodness-of-fit test for regression models with spatially correlated errors. TEST 29, 728–749 (2020). https://doi.org/10.1007/s11749-019-00678-y
DOI: https://doi.org/10.1007/s11749-019-00678-y