Random Forests for Spatially Dependent Data
Journal of the American Statistical Association (IF 3.0), Pub Date: 2021-08-13, DOI: 10.1080/01621459.2021.1950003
Arkajyoti Saha, Sumanta Basu, Abhirup Datta

Abstract

Spatial linear mixed models, consisting of a linear covariate effect and a Gaussian process (GP) distributed spatial random effect, are widely used for analyses of geospatial data. We consider the setting where the covariate effect is nonlinear. Random forests (RF) are popular for estimating nonlinear functions, but applications of RF to spatial data have often ignored the spatial correlation. We show that this adversely impacts the performance of RF. We propose RF-GLS, a novel and well-principled extension of RF, for estimating nonlinear covariate effects in spatial mixed models where the spatial correlation is modeled using a GP. RF-GLS extends RF in the same way that generalized least squares (GLS) fundamentally extends ordinary least squares (OLS) to accommodate dependence in linear models. RF becomes a special case of RF-GLS, and is substantially outperformed by RF-GLS for both estimation and prediction across extensive numerical experiments with spatially correlated data. RF-GLS can be used for functional estimation in other types of dependent data, such as time series. We prove consistency of RF-GLS for β-mixing dependent error processes that include the popular spatial Matérn GP. As a byproduct, we also establish, to our knowledge, the first consistency result for RF under dependence. We establish results of independent importance, including a general consistency result for GLS optimizers of data-driven function classes, and a uniform law of large numbers under β-mixing dependence with weaker assumptions. These new tools can potentially be useful for asymptotic analysis of other GLS-style estimators in nonparametric regression with dependent data.
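
The OLS-to-GLS step that the abstract uses as its analogy can be sketched for the linear case. The snippet below is only an illustration under stated assumptions, not the RF-GLS algorithm itself: the helper names (`exponential_cov`, `ols`, `gls`), the covariance parameters, and the simulated data are all hypothetical. It uses an exponential covariance (the Matérn family with smoothness 1/2) and checks two standard facts: GLS equals OLS applied to data whitened by the Cholesky factor of the covariance, and GLS with an identity covariance reduces to OLS.

```python
import numpy as np


def exponential_cov(coords, sigma2=1.0, phi=3.0, nugget=0.1):
    """Exponential covariance (Matern with smoothness 1/2) plus a nugget.

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * dists) + nugget * np.eye(len(coords))


def ols(X, y):
    """Ordinary least squares: minimizes (y - X b)' (y - X b)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def gls(X, y, Sigma):
    """Generalized least squares: minimizes (y - X b)' Sigma^{-1} (y - X b)."""
    Sinv_X = np.linalg.solve(Sigma, X)  # Sigma^{-1} X, without forming the inverse
    return np.linalg.solve(X.T @ Sinv_X, Sinv_X.T @ y)


# Simulate a linear model with spatially correlated errors.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

Sigma = exponential_cov(coords)
L = np.linalg.cholesky(Sigma)  # Sigma = L L'
y = X @ beta_true + L @ rng.normal(size=n)

beta_ols = ols(X, y)
beta_gls = gls(X, y, Sigma)

# GLS is OLS on the whitened data L^{-1} X and L^{-1} y.
beta_whitened = ols(np.linalg.solve(L, X), np.linalg.solve(L, y))

# With Sigma = I the GLS loss is the OLS loss, so the estimates coincide.
beta_gls_identity = gls(X, y, np.eye(n))
```

In this linear sketch the dependence enters only through the quadratic form in the loss; the abstract describes RF-GLS as extending RF in the same spirit, with RF recovered as a special case.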




Updated: 2021-08-13