Spatially clustered regression

doi:10.1016/j.spasta.2021.100525

Spatial Statistics

Volume 44, August 2021, 100525

Abstract

Spatial regression and geographically weighted regression models have been widely adopted to capture the effects of auxiliary information on a response variable of interest over a region. However, relationships between response and auxiliary variables are expected to exhibit complex spatial patterns in many applications. This paper proposes a new approach for spatial regression, called spatially clustered regression, to estimate possibly clustered spatial patterns of the relationships. We combine a K-means-based clustering formulation with a penalty function, motivated by a spatial process known as the Potts model, that encourages similar clustering in neighboring locations. We provide a simple iterative algorithm to fit the proposed method that is scalable to large spatial datasets. In simulation studies, the proposed method outperforms existing methods even when the true structure does not admit spatial clustering. Finally, the proposed method is applied to crime event data in Tokyo and produces interpretable results for spatial patterns. The R code is available at https://github.com/sshonosuke/SCR.

Introduction

Spatial heterogeneity, which is often referred to as the Second Law of Geography (Goodchild, 2004), is ubiquitous in spatial science. Geographically weighted regression (GWR; Brunsdon et al., 1998, Fotheringham et al., 2002), a representative approach for modeling spatial heterogeneity, has been widely adopted for modeling spatially varying regression coefficients; its applications cover social science (e.g. Hu et al., 2016), epidemiology (e.g. Nakaya et al., 2005) and environmental science (e.g. Zhou et al., 2019).

Despite its success, GWR is known to be numerically unstable and may produce extreme estimates of coefficients (e.g. Wheeler and Tiefelsdorf, 2005, Cho et al., 2009). To address this drawback, a wide variety of regularized GWR approaches have been developed (e.g. Wheeler, 2007, Wheeler, 2009, Bárcena et al., 2014, Comber et al., 2016). Still, it is unclear how to regularize GWR to improve stability while maintaining its computational efficiency. The Bayesian spatially varying coefficient model (Gelfand et al., 2003, Finley, 2011) is another popular approach for modeling spatial heterogeneity in regression coefficients. While Wheeler and Waller (2009) and Wolf et al. (2018), among others, have reported its stability and estimation accuracy, this approach can be computationally very intensive for large samples, which limits the application of spatial regression techniques to large spatial datasets. Therefore, an alternative method that offers both stable estimation and computational efficiency on large datasets is needed.

This paper proposes a new, effective approach for spatial regression with possibly spatially varying coefficients or non-stationarity. Our fundamental idea is to combine regression modeling and clustering: we assume that all the geographical locations can be divided into a finite number of groups, where locations in the same group share the same regression coefficients. Hence, possibly smooth surfaces of varying regression coefficients are approximated by step functions. Owing to the clustering, the estimation results are expected to be numerically stable and easier to interpret than those of GWR. The idea of incorporating spatial clustering into regression is not new. There have been some two-stage procedures (e.g. Anselin, 1990, Billé et al., 2017, Lee et al., 2017, Nicholson et al., 2019), but they tend to be ad hoc combinations of clustering and regression. In contrast, the proposed method carries out regression and clustering simultaneously, which can produce reasonable spatial clustering that reflects the regression structures.

To introduce such a spatial clustering structure, we employ indicators of the group to which each location belongs, and we estimate the grouping parameters and group-wise regression models simultaneously. For estimating group memberships, it is reasonable to require that geographically neighboring locations are likely to belong to the same group. To this end, we introduce a penalty function that encourages such spatially clustered structures, motivated by the hidden Potts model (Potts, 1952), which was originally developed for modeling spatially correlated integers. We will demonstrate that the proposed objective function can be easily optimized by a simple iterative algorithm similar to $K$-means clustering. In particular, the updating steps in each iteration do not require computationally intensive manipulations, so the proposed algorithm is much more scalable than GWR. For selecting the number of groups $G$, we employ an information criterion. Moreover, the proposed approach allows substantial extensions to include variable selection or semiparametric additive modeling, which cannot be achieved by existing techniques such as GWR.
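The alternating scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their R code is at the repository linked in the abstract). It assumes a Gaussian linear model with a common error variance, a symmetric 0/1 neighbor matrix `W`, and a penalty that adds `phi` times the number of neighbors sharing the candidate group; the function name `fit_scr_sketch` and the parameters `phi`, `n_iter` and `seed` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def fit_scr_sketch(y, X, W, G, phi=1.0, n_iter=50, seed=0):
    """Alternate group-wise least squares with penalized reassignment.

    y: (n,) responses; X: (n, p) covariates; W: (n, n) symmetric 0/1
    neighbor matrix with zero diagonal; G: number of groups;
    phi: assumed weight of the Potts-type neighbor-agreement term.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = rng.integers(0, G, size=n)            # random initial memberships
    beta = np.zeros((G, p))
    for _ in range(n_iter):
        # Step 1: refit each group's coefficients given the memberships.
        for k in range(G):
            idx = g == k
            if idx.sum() > p:                 # keep old values for tiny groups
                beta[k] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        resid = y[:, None] - X @ beta.T       # residual of every point under every group
        # Common variance estimated from residuals under current memberships.
        sigma2 = max(np.mean(resid[np.arange(n), g] ** 2), 1e-8)
        loglik = -0.5 * resid ** 2 / sigma2   # Gaussian log-density up to a constant
        # Step 2: reassign each location, rewarding agreement with neighbors.
        g_new = g.copy()
        for i in range(n):
            nb = np.bincount(g_new[W[i] > 0], minlength=G)  # neighbors per group
            g_new[i] = np.argmax(loglik[i] + phi * nb)
        if np.array_equal(g_new, g):          # stop when memberships are stable
            break
        g = g_new
    return g, beta, sigma2
```

Setting `phi=0` in this sketch removes the spatial term, so each location is simply assigned to the group whose regression fits it best; larger values pull neighboring locations into the same group. Like $K$-means, such a scheme can stop at a local optimum, so in practice one might try several initializations.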

Recently, sophisticated statistical methods combining regression modeling and clustering have been studied in the literature. In the context of spatial regression, Li and Sang (2019) and Zhao and Bondell (2020) adopted a fused lasso approach that shrinks the differences between regression coefficients in neighboring areas toward $0$, which results in spatially clustered regression coefficients. However, the computational cost for large datasets is substantial, and the performance is not necessarily satisfactory, possibly because the method does not account for spatially heterogeneous variances, as will be demonstrated in our numerical studies. On the other hand, in the context of panel data analysis, clustering approaches that use grouping indicators, like the proposed method, have been widely studied (e.g. Bonhomme and Manresa, 2015, Wang et al., 2018, Ito and Sugasawa, 2020). Still, the existing works did not take account of spatial similarities among the grouping indicators.
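To make the contrast concrete, a generic fused-lasso-type objective for spatially varying coefficients can be written as below. This is a textbook-style form given only for orientation: the location-specific coefficients $\beta_{i}$, the tuning parameter $\lambda$ and the set $E$ of neighboring pairs are illustrative notation and are not taken from the cited papers, whose exact formulations may differ.

$$\min_{\beta_{1}, \dots, \beta_{n}} \; \sum_{i = 1}^{n} {(y_{i} - x_{i}^{⊤} \beta_{i})}^{2} + \lambda \sum_{(i, j) \in E} \| \beta_{i} - \beta_{j} \|_{1}$$

The penalty sets some differences $\beta_{i} - \beta_{j}$ exactly to zero, so neighboring locations end up sharing coefficients. A grouping-indicator approach instead assigns each location to one of a fixed number of groups with shared coefficients.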

This paper is organized as follows. In Section 2, we introduce the proposed methods and estimation algorithms and discuss some related issues. In Section 3, we evaluate the numerical performance of the proposed methods together with some existing methods through simulation studies. In Section 4, we demonstrate the proposed method through spatial regression modeling of the number of crimes in the Tokyo metropolitan area. Finally, we give some discussion in Section 5.

Section snippets

Models and estimation algorithm

Let $y_{i}$ be a response variable and $x_{i}$ a vector of covariates at the $i$th location, for $i = 1, \dots, n$, where $n$ is the number of samples. Suppose we are interested in the conditional distribution $f(y_{i} | x_{i}; θ_{i}, ψ)$, where $θ_{i}$ and $ψ$ are vectors of unknown parameters. Here $θ_{i}$ may change across locations and represents spatial heterogeneity, while $ψ$ is assumed to be constant over all locations. For example, $f(y_{i} | x_{i}; θ_{i}, ψ) = ϕ(y_{i}; x_{i 1}^{⊤} θ_{i} + x_{i 2}^{⊤} γ, σ^{2})$ with $x_{i} = {(x_{i 1}^{⊤}, x_{i 2}^{⊤})}^{⊤}$ and $ψ = {(γ^{⊤}, σ^{2})}^{⊤}$. We assume that location

Simulation settings

We present simulation studies to illustrate the performance of the proposed spatially clustered regression (SCR) and spatially fuzzy clustered regression (SFCR) methods under two scenarios for underlying structures of regression coefficients. In both scenarios, we uniformly generated $n = 1000$ spatial locations $s_{1}, \dots, s_{n}$ in the domain $\{s = (s_{1}, s_{2}) \mid s_{1} \in [-1, 1], s_{2} \in [0, 2], s_{1}^{2} + 0.5 s_{2}^{2} > {(0.5)}^{2}\}$. Then, we generated two covariates from spatial processes, following Li and Sang (2019). Let $z_{1}(s_{i})$ and $z_{2}(s_{i})$ be the

Application to crime risk modeling

Here we apply the proposed methods to a dataset of the number of police-recorded crimes in the Tokyo metropolitan area, provided by the University of Tsukuba and publicly available online (“GIS database of the number of police-recorded crime at O-aza, chome in Tokyo, 2009-2017”, available at https://commons.sk.tsukuba.ac.jp/data_en). In this study, we focus on the number of violent crimes in $n = 2{,}855$ local towns in the Tokyo metropolitan area in 2015. For auxiliary information about each town, we

Concluding remarks

This paper proposes a new spatial regression technique, called spatially clustered regression (SCR), that accounts for spatial heterogeneity in model parameters by explicitly introducing grouping parameters. By employing a penalty function motivated by the Potts model, we formulated a penalized likelihood function that is easily maximized via a simple iterative algorithm. We also developed a fuzzy version of the method that can produce more spatially smoothed estimates and considered straightforward but

Acknowledgments

We would like to thank two anonymous reviewers for useful comments and suggestions. This work was supported by the Japan Society for the Promotion of Science (KAKENHI) Grant Numbers 18K12757 and 18H03628.

References (40)

Hu, S. et al. (2016). Spatially non-stationary relationships between urban residential land price and impact factors in Wuhan city, China. Appl. Geogr.

Nicholson, D. et al. (2019). A spatial regression and clustering method for developing place-specific social vulnerability indices using census and social media data. Int. J. Disaster Risk Reduct.

Zhao, Y. et al. (2020). Solution paths for the generalized lasso with applications to spatially varying coefficients regression. Comput. Statist. Data Anal.

Zhou, Q. et al. (2019). Application of geographically weighted regression (GWR) in the analysis of the cause of haze pollution in China. Atmospheric Pollut. Res.

Anselin, L. (1990). Spatial dependence and spatial structural instability in applied regression analysis. J. Reg. Sci.

Bárcena, M.J. et al. (2014). Alleviating the effect of collinearity in geographically weighted regression. J. Geogr. Syst.

Bernasco, W. et al. (2010). Effects of residential history on commercial robbers’ crime location choices. European J. Criminol.

Billé, A.G. et al. (2017). A two-step approach to account for unobserved spatial heterogeneity. Spatial Econ. Anal.

Bivand, R. et al. (2020). spgwr: Geographically weighted regression.

Bonhomme, S. et al. (2015). Grouped patterns of heterogeneity in panel data. Econometrica

Brunsdon, C. et al. (1998). Geographically weighted regression – modelling spatial non-stationarity. J. R. Stat. Soc. Ser. D

Cho, S. et al. (2009). Extreme coefficients in geographically weighted regression and their effects on mapping. GIScience Remote Sens.

Comber, A., Harris, P., Quan, N., Chi, K., Hung, T., Phe, H., Local variation in hedonic house price, Hanoi: a spatial...

Finley, A.O. (2011). Comparing spatially-varying coefficients models for analysis of ecological data with non-stationary and anisotropic residual dependence. Methods Ecol. Evol.

Fotheringham, A. et al. (2002). Geographically Weighted Regression.

Fotheringham, A.S. et al. (2017). Multiscale geographically weighted regression (MGWR). Annal. Am. Assoc. Geogr.

Friedman, J. et al. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw.

Gelfand, A.E. et al. (2003). Spatial modeling with spatially varying coefficient processes. J. Am. Stat. Assoc.

Gollini, I. et al. (2015). GWmodel: An R package for exploring spatial heterogeneity using geographically weighted models. J. Stat. Softw.

Goodchild, M.F. (2004). The validity and usefulness of laws in geographic information science and geography. Annal. Assoc. Am. Geogr.
