Scalable GWR: A Linear-Time Algorithm for Large-Scale Geographically Weighted Regression with Polynomial Kernels, Annals of the American Association of Geographers


Annals of the American Association of Geographers (IF 3.2), Pub Date: 2020-08-03, DOI: 10.1080/24694452.2020.1774350
Daisuke Murakami, Narumasa Tsutsumida, Takahiro Yoshida, Tomoki Nakaya, Binbin Lu


Although a number of studies have developed fast geographically weighted regression (GWR) algorithms for large samples, none of them has achieved linear-time estimation, which is considered a requisite for big data analysis in machine learning, geostatistics, and related domains. Against this backdrop, this study proposes a scalable GWR (ScaGWR) for large data sets. The key improvement is the calibration of the model through a precompression of the matrices and vectors whose size depends on the sample size, prior to the leave-one-out cross-validation, which is the heaviest computational step in conventional GWR. This precompression allows us to run the proposed GWR extension so that its computation time increases linearly with the sample size. With this improvement, the ScaGWR can be calibrated with 1 million observations without parallelization. Moreover, the ScaGWR estimator can be regarded as an empirical Bayesian estimator that is more stable than the conventional GWR estimator. We compare the ScaGWR with the conventional GWR in terms of estimation accuracy and computational efficiency using a Monte Carlo simulation. Then, we apply these methods to a U.S. income analysis. The code for ScaGWR is available in the R package scgwr. The code is embedded into C++ code and implemented in another R package, GWmodel.
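The precompression idea in the abstract can be illustrated with a minimal Python sketch. It is not the scgwr or GWmodel implementation, and the details are assumptions made for illustration: the weight form w_ij(b) = sum over p = 1..P of b^p * g_ij^p, the Gaussian base kernel g_ij on the k nearest neighbours, and the hypothetical function names precompress and loocv_score. Because the weights are polynomial in the tuning parameter b, the local moment matrices can be computed once per location, and every later cross-validation evaluation only combines these small K x K blocks without revisiting the neighbour data.

```python
import numpy as np


def precompress(coords, X, y, k=20, P=3):
    """Precompute local moment matrices for each polynomial term p = 1..P.

    Returns M with shape (P, n, K, K) and m with shape (P, n, K), where
    M[p-1, i] = sum_j g_ij^p x_j x_j' and m[p-1, i] = sum_j g_ij^p x_j y_j,
    with j running over the k nearest neighbours of i (i itself excluded).
    """
    # Brute-force distances keep the sketch short; this step is O(n^2).
    # A spatial index would be needed for the neighbour search to scale.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # leave-one-out: drop the point itself
    nbr = np.argsort(d, axis=1)[:, :k]        # indices of the k nearest neighbours
    dn = np.take_along_axis(d, nbr, axis=1)   # their distances, sorted ascending
    h = dn[:, -1:]                            # adaptive bandwidth: k-th neighbour distance
    g = np.exp(-(dn / h) ** 2)                # assumed Gaussian base kernel

    Xn, yn = X[nbr], y[nbr]                   # shapes (n, k, K) and (n, k)
    M = np.stack([np.einsum("ij,ija,ijb->iab", g ** p, Xn, Xn)
                  for p in range(1, P + 1)])
    m = np.stack([np.einsum("ij,ija,ij->ia", g ** p, Xn, yn)
                  for p in range(1, P + 1)])
    return M, m


def loocv_score(b, M, m, X, y):
    """Leave-one-out squared error for a candidate b, using only M and m."""
    powers = b ** np.arange(1, M.shape[0] + 1)     # b^1, ..., b^P
    A = np.tensordot(powers, M, axes=1)            # (n, K, K): X' W_i(b) X
    c = np.tensordot(powers, m, axes=1)            # (n, K):    X' W_i(b) y
    beta = np.linalg.solve(A, c[..., None])[..., 0]  # local coefficients
    pred = np.einsum("ia,ia->i", X, beta)          # prediction at each held-out point
    return float(np.sum((y - pred) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    coords = rng.random((n, 2))
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=n)

    M, m = precompress(coords, X, y)            # done once
    for b in (0.25, 0.5, 1.0, 2.0):             # each candidate reuses M and m
        print(b, loocv_score(b, M, m, X, y))
```

In this sketch the per-candidate cost is proportional to n times a term that depends only on P and the number of covariates K, which is the property the abstract attributes to precompression; the exact kernel and parameterization used in the paper may differ.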


Updated: 2020-08-03
