Elsevier

Spatial Statistics

Volume 44, August 2021, 100525

Spatially clustered regression

https://doi.org/10.1016/j.spasta.2021.100525

Abstract

Spatial regression and geographically weighted regression models have been widely adopted to capture the effects of auxiliary information on a response variable of interest over a region. However, in many applications the relationships between response and auxiliary variables are expected to exhibit complex spatial patterns. This paper proposes a new approach for spatial regression, called spatially clustered regression, to estimate possibly clustered spatial patterns of these relationships. We combine a K-means-based clustering formulation with a penalty function, motivated by a spatial process known as the Potts model, that encourages similar clustering in neighboring locations. We provide a simple iterative algorithm to fit the proposed method that is scalable to large spatial datasets. Through simulation studies, the proposed method demonstrates superior performance to existing methods even when the true structure does not admit spatial clustering. Finally, the proposed method is applied to crime event data in Tokyo and produces interpretable results for spatial patterns. The R code is available at https://github.com/sshonosuke/SCR.

Introduction

Spatial heterogeneity, often referred to as the Second Law of Geography (Goodchild, 2004), is ubiquitous in spatial science. Geographically weighted regression (GWR; Brunsdon et al., 1998, Fotheringham et al., 2002), a representative approach for modeling spatial heterogeneity, has been widely adopted for modeling spatially varying regression coefficients; its applications cover social science (e.g. Hu et al., 2016), epidemiology (e.g. Nakaya et al., 2005) and environmental science (e.g. Zhou et al., 2019).

Despite its success, GWR is known to be numerically unstable and may produce extreme coefficient estimates (e.g. Wheeler and Tiefelsdorf, 2005, Cho et al., 2009). To address this drawback, a wide variety of regularized GWR approaches have been developed (e.g. Wheeler, 2007, Wheeler, 2009, Bárcena et al., 2014, Comber et al., 2016). Still, it remains unclear how to regularize GWR to improve stability while maintaining its computational efficiency. The Bayesian spatially varying coefficient model (Gelfand et al., 2003, Finley, 2011) is another popular approach for modeling spatial heterogeneity in regression coefficients. While Wheeler and Waller (2009) and Wolf et al. (2018), among others, have reported its stability and estimation accuracy, this approach can be computationally very intensive for large samples, which limits its application to large spatial datasets. Therefore, an alternative method that offers stable estimation as well as computational efficiency for large datasets is strongly needed.

This paper proposes a new, effective approach for spatial regression with possibly spatially varying coefficients, or non-stationarity. Our fundamental idea is to combine regression modeling and clustering: we assume that all geographical locations can be divided into a finite number of groups, where locations in the same group share the same regression coefficients. Hence, possibly smooth surfaces of varying regression coefficients are approximated by step functions. Owing to the clustering, the estimation results are expected to be numerically stable and easier to interpret than those of GWR. The idea of incorporating spatial clustering into regression is not new. There have been some two-stage procedures (e.g. Anselin, 1990, Billé et al., 2017, Lee et al., 2017, Nicholson et al., 2019), but they tend to be ad hoc combinations of clustering and regression. In contrast, the proposed method carries out regression and clustering simultaneously, which can produce reasonable spatial clustering that depends on the regression structure.
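The grouping assumption can be written compactly as follows. The notation is chosen here only for illustration: $\beta(s_i)$ is the coefficient vector at location $i$, $g_i\in\{1,\dots,G\}$ is the group of location $i$, and $\beta_g$ is the common coefficient vector of group $g$.

```latex
\beta(s_i) \;=\; \sum_{g=1}^{G} \beta_g \, I(g_i = g) \;=\; \beta_{g_i},
\qquad g_i \in \{1,\dots,G\}, \quad i = 1,\dots,n .
```

Because $\beta(s_i)$ can take only the $G$ values $\beta_1,\dots,\beta_G$, the coefficient surface is piecewise constant over space, which is the step-function approximation described in the text.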

To introduce such a spatial clustering structure, we employ indicators of the group to which each location belongs, and we estimate the grouping parameters and the group-wise regression models simultaneously. For estimating group memberships, it is reasonable to require that geographically neighboring locations be likely to belong to the same group. To this end, we introduce a penalty function that encourages such spatially clustered structures, motivated by the hidden Potts model (Potts, 1952), which was originally developed for modeling spatially correlated integers. We will demonstrate that the proposed objective function can be easily optimized by a simple iterative algorithm similar to K-means clustering. In particular, the updating steps in each iteration do not require computationally intensive manipulations, so the proposed algorithm is much more scalable than GWR. For selecting the number of groups G, we employ an information criterion. Moreover, the proposed approach allows substantial extensions, such as variable selection or semiparametric additive modeling, which cannot be achieved by existing techniques such as GWR.
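To make the K-means-like iteration concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' R implementation. It assumes a Gaussian linear model with group-specific coefficients and variances, k-nearest-neighbor adjacency $N_i$, and a penalized objective of the form $\sum_i \log\phi(y_i; x_i^\top\beta_{g_i}, \sigma^2_{g_i}) + \lambda \sum_i \sum_{j\in N_i} I(g_i = g_j)$. The penalty weight $\lambda$ (`lam`), the neighborhood size `k`, the local-OLS-plus-K-means initialization and all function names are hypothetical choices for illustration. Each iteration alternates group-wise least-squares fits with a sweep that moves every location to the group maximizing its log-likelihood plus $\lambda$ times the number of its neighbors currently in that group.

```python
import numpy as np

def knn_neighbors(coords, k):
    """Indices of the k nearest neighbors of each location (excluding itself)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def kmeans(Z, G, rng, n_iter=20):
    """Plain Lloyd's K-means on the rows of Z; returns labels in {0, ..., G-1}."""
    centers = Z[rng.choice(len(Z), G, replace=False)]
    for _ in range(n_iter):
        lab = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for c in range(G):
            if np.any(lab == c):
                centers[c] = Z[lab == c].mean(axis=0)
    return lab

def fit_groups(X, y, g, G):
    """Group-wise OLS coefficients and residual variances given memberships g."""
    p = X.shape[1]
    beta = np.zeros((G, p))
    sigma2 = np.full(G, np.inf)  # inf marks a group too small to fit
    for c in range(G):
        idx = g == c
        if idx.sum() > p:
            beta[c] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
            r = y[idx] - X[idx] @ beta[c]
            sigma2[c] = max(r @ r / idx.sum(), 1e-8)
    return beta, sigma2

def gaussian_loglik(X, y, beta, sigma2):
    """(n, G) matrix of log N(y_i; x_i' beta_g, sigma2_g); -inf for unfitted groups."""
    ll = np.full((len(y), len(sigma2)), -np.inf)
    ok = np.isfinite(sigma2)
    mu = X @ beta[ok].T
    ll[:, ok] = -0.5 * (np.log(2 * np.pi * sigma2[ok]) + (y[:, None] - mu) ** 2 / sigma2[ok])
    return ll

def scr_fit(X, y, coords, G, lam=1.0, k=10, max_iter=50, seed=0):
    """Alternate group-wise fits with Potts-penalized membership updates (G fixed)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    nb = knn_neighbors(coords, k)
    # Initialize memberships by K-means on local OLS fits over each neighborhood.
    local = np.array([np.linalg.lstsq(X[np.r_[i, nb[i]]], y[np.r_[i, nb[i]]], rcond=None)[0]
                      for i in range(n)])
    g = kmeans(local, G, rng)
    for _ in range(max_iter):
        beta, sigma2 = fit_groups(X, y, g, G)
        ll = gaussian_loglik(X, y, beta, sigma2)  # parameters fixed during the sweep
        changed = 0
        for i in range(n):  # sequential sweep using neighbors' current labels
            same = np.bincount(g[nb[i]], minlength=G)  # neighbors currently in each group
            new = int(np.argmax(ll[i] + lam * same))
            if new != g[i]:
                g[i] = new
                changed += 1
        if changed == 0:
            break
    beta, sigma2 = fit_groups(X, y, g, G)
    return g, beta, sigma2

# Synthetic example: slope +2 on the left half of the unit square, -2 on the right.
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(size=(n, 2))
truth = (coords[:, 0] < 0.5).astype(int)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1 + np.where(truth == 1, 2.0, -2.0) * x + 0.5 * rng.normal(size=n)
g, beta, sigma2 = scr_fit(X, y, coords, G=2, lam=1.0)
print(np.round(beta, 2))
```

Here G is fixed; the information criterion used to choose G is not reproduced. Holding the parameters fixed during a sweep means each membership update only reads one row of an n × G log-likelihood table plus a neighbor count, which illustrates why such updates are cheap compared with refitting a local model at every location.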

Recently, sophisticated statistical methods combining regression modeling and clustering have been studied in the literature. In the context of spatial regression, Li and Sang (2019) and Zhao and Bondell (2020) adopted a fused lasso approach that shrinks the differences between regression coefficients in neighboring areas toward 0, which results in spatially clustered regression coefficients. However, the computational cost for large datasets is substantial, and, as will be demonstrated in our numerical studies, the performance is not always satisfactory, possibly because the method does not account for spatially heterogeneous variances. On the other hand, in the context of panel data analysis, clustering approaches using grouping indicators like the proposed method have been widely studied (e.g. Bonhomme and Manresa, 2015, Wang et al., 2018, Ito and Sugasawa, 2020). Still, the existing works do not account for spatial similarities among the grouping indicators.

This paper is organized as follows. In Section 2, we introduce the proposed methods and estimation algorithms and discuss some related issues. In Section 3, we evaluate the numerical performance of the proposed methods together with some existing methods through simulation studies. In Section 4, we demonstrate the proposed method through spatial regression modeling of the number of crimes in the Tokyo metropolitan area. Finally, we give some discussion in Section 5.

Section snippets

Models and estimation algorithm

Let $y_i$ be a response variable and $x_i$ a vector of covariates at the $i$th location, for $i=1,\dots,n$, where $n$ is the number of samples. Suppose we are interested in the conditional distribution $f(y_i\mid x_i;\theta_i,\psi)$, where $\theta_i$ and $\psi$ are vectors of unknown parameters. Here $\theta_i$ may change over locations and represents spatial heterogeneity, while $\psi$ is assumed constant over all areas. For example, $f(y_i\mid x_i;\theta_i,\psi)=\phi(y_i;x_{i1}^\top\theta_i+x_{i2}^\top\gamma,\sigma^2)$ with $x_i=(x_{i1},x_{i2})$ and $\psi=(\gamma,\sigma^2)$, where $\phi(\cdot;\mu,\sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$. We assume that location

Simulation settings

We present simulation studies to illustrate the performance of the proposed spatially clustered regression (SCR) and spatially fuzzy clustered regression (SFCR) methods under two scenarios for the underlying structure of the regression coefficients. In both scenarios, we uniformly generated $n=1000$ spatial locations $s_1,\dots,s_n$ in the domain $\{s=(s_1,s_2)\mid s_1\in[-1,1],\ s_2\in[0,2],\ s_1^2+0.5s_2^2>(0.5)^2\}$. Then, we generated two covariates from spatial processes, following Li and Sang (2019). Let $z_1(s_i)$ and $z_2(s_i)$ be the
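The uniform generation of simulation locations can be sketched with rejection sampling: draw candidates uniformly on the bounding rectangle $[-1,1]\times[0,2]$ and keep only those satisfying the elliptical constraint. This sketch assumes the domain is $\{s \mid s_1\in[-1,1],\ s_2\in[0,2],\ s_1^2+0.5s_2^2>(0.5)^2\}$, reconstructed from an extraction-damaged formula; the function name `sample_domain` and the seed are hypothetical, and the covariate-generating processes of Li and Sang (2019) are not reproduced.

```python
import numpy as np

def sample_domain(n, rng):
    """Draw n points uniformly from the assumed simulation domain by rejection.

    Domain (as reconstructed): s1 in [-1, 1], s2 in [0, 2], s1^2 + 0.5*s2^2 > 0.5^2.
    Uniform proposals on the bounding rectangle are kept only if they fall inside
    the domain, so the accepted points are uniform on the domain itself.
    """
    out = np.empty((0, 2))
    while len(out) < n:
        cand = np.column_stack([rng.uniform(-1.0, 1.0, size=n),
                                rng.uniform(0.0, 2.0, size=n)])
        inside = cand[:, 0] ** 2 + 0.5 * cand[:, 1] ** 2 > 0.5 ** 2
        out = np.vstack([out, cand[inside]])
    return out[:n]

rng = np.random.default_rng(0)
s = sample_domain(1000, rng)  # n = 1000 locations, as in the simulation settings
print(s.shape)
```

Rejection sampling is used here because the excluded half-ellipse is small relative to the rectangle (roughly 0.56 of an area of 4), so most proposals are accepted and no inverse-CDF construction for the irregular domain is needed.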

Application to crime risk modeling

Here we apply the proposed methods to a dataset of the number of police-recorded crimes in the Tokyo metropolitan area, provided by the University of Tsukuba and publicly available online ("GIS database of the number of police-recorded crime at O-aza, chome in Tokyo, 2009-2017", available at https://commons.sk.tsukuba.ac.jp/data_en). In this study, we focus on the number of violent crimes in n=2,855 local towns in the Tokyo metropolitan area in 2015. For auxiliary information about each town, we

Concluding remarks

This paper proposes a new spatial regression technique, called spatially clustered regression (SCR), that accounts for spatial heterogeneity in model parameters by explicitly introducing grouping parameters. By employing a penalty function motivated by the Potts model, we formulated a penalized likelihood function that is easily maximized via a simple iterative algorithm. We also developed a fuzzy version of the method that can produce more spatially smoothed estimates and considered straightforward but

Acknowledgments

We would like to thank two anonymous reviewers for useful comments and suggestions. This work was supported by the Japan Society for the Promotion of Science (KAKENHI) Grant Numbers 18K12757 and 18H03628.

References (40)

  • Brunsdon, C., et al. Geographically weighted regression – modelling spatial non-stationarity. J. R. Stat. Soc. Ser. D (1998).
  • Cho, S., et al. Extreme coefficients in geographically weighted regression and their effects on mapping. GIScience Remote Sens. (2009).
  • Comber, A., Harris, P., Quan, N., Chi, K., Hung, T., Phe, H. Local variation in hedonic house price, Hanoi: a spatial...
  • Finley, A.O. Comparing spatially-varying coefficients models for analysis of ecological data with non-stationary and anisotropic residual dependence. Methods Ecol. Evol. (2011).
  • Fotheringham, A., et al. Geographically Weighted Regression (2002).
  • Fotheringham, A.S., et al. Multiscale geographically weighted regression (MGWR). Annal. Am. Assoc. Geogr. (2017).
  • Friedman, J., et al. Paths for generalized linear models via coordinate descent. J. Stat. Softw. (2010).
  • Gelfand, A.E., et al. Spatial modeling with spatially varying coefficient processes. J. Am. Stat. Assoc. (2003).
  • Gollini, I., et al. GWmodel: An R package for exploring spatial heterogeneity using geographically weighted models. J. Stat. Softw. (2015).
  • Goodchild, M.F. The validity and usefulness of laws in geographic information science and geography. Annal. Assoc. Am. Geogr. (2004).