Elsevier

Spatial Statistics

Volume 42, April 2021, 100435

Analysing point patterns on networks — A review

https://doi.org/10.1016/j.spasta.2020.100435

Abstract

We review recent research on statistical methods for analysing spatial patterns of points on a network of lines, such as road accident locations along a road network. Due to geometrical complexities, the analysis of such data is extremely challenging, and we describe several common methodological errors. The intrinsic lack of homogeneity in a network militates against the traditional methods of spatial statistics based on stationary processes. Topics include kernel density estimation, relative risk estimation, parametric and non-parametric modelling of intensity, second-order analysis using the K-function and pair correlation function, and point process model construction. An important message is that the choice of distance metric on the network is pivotal in the theoretical development and in the analysis of real data. Challenges for statistical computation are discussed and open-source software is provided.

Introduction

This paper reviews recent research on the spatial analysis of events that occur along a network of lines. Fig. 1 displays a motivating example, in which the locations of traffic accidents in a city are mapped together with the road network. Such data require the development of novel statistical methodology and computational techniques (Okabe and Sugihara, 2012; Ver Hoef et al., 2006; Baddeley et al., 2015, Chapter 17).

Spatial patterns of points along a network of lines are found in many other applications. The network might reflect a map of railways, rivers, electrical wires, nerve fibres, airline routes, irrigation canals, geological faults or soil cracks. Observations of interest could be the locations of traffic accidents, bicycle incidents, vehicle thefts or street crimes (Yamada and Thill, 2004, Lu and Chen, 2007, Xie and Yan, 2008, Ang et al., 2012, Vandenbulcke-Plasschaert, 2011); roadside trees or invasive species (Spooner et al., 2004, Deckers et al., 2005); retail stores, roadside kiosks or urban parks (Okabe and Kitamura, 1996, Okabe and Okunuki, 2001, Okunuki and Okabe, 2003, Comber et al., 2008); insect nests (Voss, 1999, Voss et al., 2007, Ang et al., 2012); neuroanatomical features (Yadav et al., 2012, Jammalamadaka et al., 2013, Baddeley et al., 2014); or sample points along a stream (Ver Hoef et al., 2006, Ver Hoef and Peterson, 2010, Som et al., 2014). John Snow’s pioneering observations of cholera cases in London (Snow, 1855) could also be described as a point pattern on a linear network. Spatial analysis of such data can have immediate practical value, as it did when Snow’s analysis suggested the cause of cholera transmission.

Statistical analysis of network data presents severe challenges. A network is not spatially homogeneous, which creates geometrical and computational complexities and leads to new methodological problems, with a high risk of methodological error. Real network data can also exhibit an extremely wide range of spatial scales. These problems pose a significant and far-reaching challenge to the classical methodology of spatial statistics based on stationary processes, which is largely inapplicable to data on a network.

Analysis of point patterns on linear networks has been a focus of recent research in Geographical Information Systems (GIS) (Borruso, 2005, Borruso, 2008, Downs and Horner, 2007a, Okabe and Yamada, 2001, Okabe et al., 2009, Okabe et al., 2008, Okabe et al., 2000, Shiode, 2008, Okabe and Satoh, 2006, Okabe et al., 1995, Okabe et al., 2006b, Okabe et al., 2006a, Okabe and Satoh, 2009, Shiode and Shiode, 2009, Warden, 2008, Yamada and Thill, 2007). For surveys, see Okabe and Satoh (2009) and Okabe and Sugihara (2012). In recent years there has been increased interest from the spatial statistics community (Ang et al., 2012, Baddeley et al., 2014, McSwiggan et al., 2016, Baddeley et al., 2017, Anderes et al., 2017, Rakshit et al., 2017, Moradi et al., 2018, Briz-Redón et al., 2019, Rakshit et al., 2019a, Rakshit et al., 2019b, McSwiggan et al., 2019, Moradi et al., 2019, McSwiggan, 2019; Baddeley et al., 2015, Chapter 17). Spatial analysis on a network of rivers or streams is also a highly active research topic in spatial ecology and spatial statistics, although this is more focused on geostatistical techniques for spatial variables (Cressie and Majure, 1997, Cressie et al., 2006, Isaak et al., 2014, O’Donnell et al., 2014, Ver Hoef et al., 2006, Ver Hoef and Peterson, 2010).

Section 2 presents a range of example datasets and applications. Section 3 surveys the main statistical challenges for analysis of point patterns on a network. Section 4 gives some basic technical definitions. Section 5 discusses kernel smoothing on a network. Section 6 discusses nonparametric estimation of intensity as a function of covariates. Section 7 discusses parametric modelling of point processes, model-fitting and variable selection. Section 8 discusses the analysis of clustering using the K-function and related tools. Section 9 describes the fundamental problem of existence of models with specified properties. Section 10 presents alternative ways of measuring the distance between locations on a network, and the implications for statistical analysis and modelling.

Section snippets

Data examples

In this section we present datasets from several different applications, illustrating different features and challenges. Various techniques will be demonstrated on these datasets throughout the paper. Where possible, the datasets are publicly available in the R packages spatstat (Baddeley and Turner, 2005, Baddeley et al., 2015) or spatstat.Knet (Rakshit et al., 2019a).

Overview of problems and significance

In this section we survey the main problems for analysis of point patterns on a linear network. For concreteness we shall often use terms that would be appropriate to road traffic accident analysis.

Technical definitions

Here we collect a few technical definitions and preliminaries.

Kernel density estimation

Estimation of the spatially-varying density of events is crucially important in practice. In studying road safety or transport planning, for example, such estimates are essential for accident research, the formulation of emergency response strategies, urban modelling and other purposes. Even when it is not the main focus of interest, we may need to adjust for spatially-varying density in order to study other properties. Non-uniform density can easily be confounded with clustering between

Intensity depending on a covariate

The effect of an explanatory variable on the density of points can be investigated using nonparametric curve estimation techniques.
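As a rough illustration of the idea, the sketch below estimates the intensity as a function of a covariate by simple binning: within each covariate bin, the number of points is divided by the total length of network carrying covariate values in that bin. This is a crude histogram-style stand-in, not one of the smoothing estimators reviewed in the paper. The function name `rho_by_bin`, the data layout (one covariate value per segment) and the example numbers are all hypothetical.

```python
from collections import defaultdict


def rho_by_bin(segments, bin_width):
    """Binned estimate of intensity (points per unit length) against a covariate.

    segments: iterable of (length, covariate_value, n_points) tuples, where the
    covariate is assumed constant along each segment (an illustrative assumption)
    and every length is positive.
    Returns a dict mapping (bin_low, bin_high) to points per unit length.
    """
    length_in_bin = defaultdict(float)
    count_in_bin = defaultdict(int)
    for length, z, n_points in segments:
        b = int(z // bin_width)          # index of the covariate bin
        length_in_bin[b] += length       # network length exposed to this bin
        count_in_bin[b] += n_points      # points observed in this bin
    return {
        (b * bin_width, (b + 1) * bin_width): count_in_bin[b] / length_in_bin[b]
        for b in sorted(length_in_bin)
    }


# Hypothetical data: (segment length in km, covariate value, accident count).
segments = [(2.0, 10, 8), (1.0, 15, 4), (3.0, 35, 3), (1.0, 38, 1), (4.0, 55, 2)]
estimate = rho_by_bin(segments, bin_width=20)
for (low, high), rate in estimate.items():
    print(f"covariate in [{low}, {high}): {rate:.2f} points per km")
```

In this made-up example the estimated intensity falls as the covariate increases, which is the kind of pattern a decreasing relationship between intensity and covariate would produce.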

Suppose that Z is a spatial covariate function, and we believe that the intensity of the points depends only on Z through a relationship

λ(u) = ρ(Z(u))    (26)

where ρ(z) is an unknown function that is to be estimated. For example if ρ(z) is a decreasing function of z, the relationship (26) specifies that the density of points will be lower in those parts of the network where Z

Parametric models and model-fitting

A major goal of statistical analysis is to formulate and fit parametric models to point pattern data on a network. The models are point processes which depend on explanatory spatial variables. The primary aim is usually to model the dependence of the intensity on the covariates, taking into account any stochastic dependence between different points.

The general theory of point processes (Daley and Vere-Jones, 2003) easily handles the definition, construction and characterisation of parametric

K-function and pair correlation function

Correlation is a widely-used statistical measure of “dependence” or “association” between variables. For spatial point patterns, the K-function and the pair correlation function are correlation measures of association between points in the pattern. They have served a valuable role in the analysis of spatial point patterns in two and three dimensions. The task is to adapt these methods to a linear network.
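To make the idea concrete, here is a minimal sketch of an empirical network K-function in its simplest, uncorrected form: the total network length divided by n(n-1), multiplied by the number of ordered pairs of distinct points whose shortest-path distance is at most r. It does not include the geometric correction of Ang et al. (2012). The toy network is assumed to be a single closed loop (a ring road), so that the shortest-path distance has a closed form; the function names and data are hypothetical.

```python
def ring_distance(s, t, length):
    """Shortest-path distance between positions s and t on a loop of given length."""
    gap = abs(s - t) % length
    return min(gap, length - gap)


def network_k(positions, length, r):
    """Uncorrected empirical K-function at distance r for points on a ring.

    positions: arc-length coordinates of the points along the loop.
    length: total length of the loop (the whole network in this toy case).
    """
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, within shortest-path distance r.
    close_pairs = sum(
        1
        for i, s in enumerate(positions)
        for j, t in enumerate(positions)
        if i != j and ring_distance(s, t, length) <= r
    )
    return length * close_pairs / (n * (n - 1))


# Two tight clusters on a loop of length 10 (hypothetical data).
clustered = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
# Ten evenly spaced points on the same loop.
regular = list(range(10))

print(network_k(clustered, 10.0, 0.5))
print(network_k(regular, 10.0, 0.5))
```

On a ring, independent uniformly distributed points give an expected value of 2r for r up to half the loop length, so values well above 2r suggest clustering at that scale and values well below suggest regularity. On a general network no such simple benchmark is available, because the amount of network within distance r of a point varies from place to place.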

In the last two decades, substantial research effort has been directed at this problem by

Construction problem and non-existence

Many standard methods for analysing spatial point patterns assume that the underlying point process is stationary. This gives access to a powerful statistical methodology, embracing nonparametric characteristics such as the K-function and pair correlation function (Illian et al., 2008), as well as parametric modelling and inference (Møller and Waagepetersen, 2004, Baddeley et al., 2015).

Much of this methodology cannot be extended to a linear network, because the network itself is not

Alternative distance metrics

Okabe and Sugihara (2012, pp. 7–8) explain carefully that, while it is often sensible to measure distances in a network by the shortest path, this is not obligatory, and may occasionally be inappropriate to the application.
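The difference between metrics is easy to see on a small example. The sketch below computes the shortest-path distance between two points on a toy network and compares it with the straight-line Euclidean distance between the same two locations. The network (a unit square), the representation of a point as an (edge index, fraction along edge) pair, and all function names are assumptions made for illustration only.

```python
import math
from itertools import product

# Hypothetical toy network: the boundary of a unit square.
VERTICES = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def edge_length(edge):
    (x0, y0), (x1, y1) = VERTICES[edge[0]], VERTICES[edge[1]]
    return math.hypot(x1 - x0, y1 - y0)


def vertex_distances():
    """All-pairs shortest-path distances between vertices (Floyd-Warshall)."""
    d = {(a, b): (0.0 if a == b else math.inf) for a in VERTICES for b in VERTICES}
    for a, b in EDGES:
        d[a, b] = d[b, a] = min(d[a, b], edge_length((a, b)))
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(VERTICES, repeat=3):
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return d


def coords(point):
    """Planar coordinates of a network point given as (edge index, fraction)."""
    e, t = point
    (x0, y0), (x1, y1) = VERTICES[EDGES[e][0]], VERTICES[EDGES[e][1]]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def euclidean(p, q):
    (px, py), (qx, qy) = coords(p), coords(q)
    return math.hypot(px - qx, py - qy)


def shortest_path(p, q, d=None):
    """Shortest-path distance between two points lying on edges of the network."""
    if d is None:
        d = vertex_distances()
    (e1, t1), (e2, t2) = p, q
    len1, len2 = edge_length(EDGES[e1]), edge_length(EDGES[e2])
    # Distance from each point to the two endpoints of its own edge.
    ends1 = {EDGES[e1][0]: t1 * len1, EDGES[e1][1]: (1 - t1) * len1}
    ends2 = {EDGES[e2][0]: t2 * len2, EDGES[e2][1]: (1 - t2) * len2}
    best = min(ca + d[a, b] + cb for a, ca in ends1.items() for b, cb in ends2.items())
    if e1 == e2:
        # Points on the same edge can also be joined directly along it.
        best = min(best, abs(t1 - t2) * len1)
    return best


p = (0, 0.5)  # midpoint of the bottom side
q = (2, 0.5)  # midpoint of the top side
print("Euclidean:", euclidean(p, q))
print("Shortest path:", shortest_path(p, q))
```

The two midpoints are one unit apart in the plane, but a traveller confined to the network must cover two units, so any analysis based on counting neighbours within a given distance can give different answers under the two metrics.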

Alternative metrics include the Euclidean distance between points in two-dimensional space, and the “resistance distance” defined by treating the lines of the network as electrical resistors (Klein and Randić, 1993, Bapat, 2004). Shortest-path distance may be modified by

Acknowledgements

This article includes summaries of the results of joint research with the Perth Spatial Point Processes Group (Adrian Baddeley, Gopalan Nair, Suman Rakshit), with former members (Ya-Mei Chang, Andrew Hardegen, Thomas Lawrence, and Yong Song) and with Spatial Analysis Group Otago (Tilman Davies, Martin Hazelton, Adrian Baddeley), as well as collaborators Ege Rubak, Rob Foxall and Rolf Turner.

This work was supported by Australian Research Council grant DP130102322, “Statistical methodology for

References (152)

  • MarshallJ. et al.

    Boundary kernels for adaptive density estimators on regions with irregular boundaries

    J. Multivariate Anal.

    (2010)
  • AbramsonI.

    On bandwidth estimation in kernel estimates – a square root law

    Ann. Statist.

    (1982)
  • AnderesE. et al.

    Isotropic covariance functions on graphs and their edges

    Technical Report

    (2017)
  • AndersenB.

    Methodological Errors in Medical Research

    (1990)
  • AngQ.

    Statistical Methodology for Events on a Network

    (2010)
  • AngQ. et al.

    Geometrically corrected second order analysis of events on a linear network, with applications to ecology and criminology

    Scand. J. Stat.

    (2012)
  • AnselinL.

    Local indicators of spatial association – LISA

    Geogr. Anal.

    (1995)
  • BaddeleyA. et al.

    Nonparametric estimation of the dependence of a spatial point process on a spatial covariate

    Stat. Interface

    (2012)
  • BaddeleyA. et al.

    Multitype point process analysis of spines on the dendrite network of a neuron

    J. R. Stat. Soc. Ser. C. Appl. Stat.

    (2014)
  • BaddeleyA. et al.

    Non- and semiparametric estimation of interaction in inhomogeneous point patterns

    Stat. Neerl.

    (2000)
  • BaddeleyA. et al.

    Analysis of a three-dimensional point pattern with replication

    Appl. Stat.

    (1993)
  • BaddeleyA. et al.

    ‘Stationary’ point processes are uncommon on linear networks

    STAT

    (2017)
  • BaddeleyA. et al.

    Spatial Point Patterns: Methodology and Applications with R

    (2015)
  • BaddeleyA. et al.

    A cautionary example on the use of second-order methods for analyzing point patterns

    Biometrics

    (1984)
  • BaddeleyA. et al.

    Practical maximum pseudolikelihood for spatial point patterns (with discussion)

    Aust. New Zealand J. Stat.

    (2000)
  • BaddeleyA. et al.

    Spatstat: an R package for analyzing spatial point patterns

    J. Stat. Softw.

    (2005)
  • BapatR.

    Resistance matrix of a weighted graph

    Commun. Math. Comput. Chem.

    (2004)
  • BarrC. et al.

    On the Voronoi estimator for the intensity of an inhomogeneous planar Poisson process

    Biometrika

    (2010)
  • BassettR. et al.

    Fused density estimation: theory and methods

    J. R. Stat. Soc. Ser. B Stat. Methodol.

    (2019)
  • BellM. et al.

    Mixed models for the analysis of replicated spatial point patterns

    Biostatistics

    (2004)
  • BerksonJ.

    Limitations of the application of fourfold table analysis to hospital data

    Biom. Bull.

    (1946)
  • BermanM. et al.

    Approximating point process likelihoods with GLIM

    Appl. Stat.

    (1992)
  • BithellJ.

    An application of density estimation to geographical epidemiology

    Stat. Med.

    (1990)
  • BorrusoG.

    Network density and the delimitation of urban areas

    Trans. GIS

    (2003)
  • BorrusoG.

    Network density estimation: Analysis of point patterns over a network

  • BorrusoG.

    Network density estimation: A GIS approach for analysing point patterns in a network space

    Trans. GIS

    (2008)
  • BotevZ. et al.

    Kernel density estimation via diffusion

    Ann. Statist.

    (2010)
  • Briz-RedónÁ. et al.

    Spatial analysis of traffic accidents near and between road intersections in a directed linear network

    Accid. Anal. Prev.

    (2019)
  • ChaudhuriP. et al.

    Scale space view of curve estimation

    Ann. Statist.

    (2000)
  • CressieN. et al.

    Analysis of spatial point patterns using bundles of product density LISA functions

    J. Agric. Biol. Environ. Stat.

    (2001)
  • CressieN. et al.

    Spatial prediction on a river network

    J. Agric. Biol. Environ. Stat.

    (2006)
  • CressieN. et al.

    Spatio-temporal statistical modeling of livestock waste in streams

    J. Agric. Biol. Environ. Stat.

    (1997)
  • CressieN. et al.

    Statistics for Spatio-Temporal Data

    (2011)
  • DaleyD. et al.

    An Introduction to the Theory of Point Processes

    (1988)
  • DaleyD. et al.

    An Introduction to the Theory of Point Processes. Vol. I: Elementary Theory and Methods

    (2003)
  • DaleyD. et al.

    An Introduction to the Theory of Point Processes. Vol. II: General Theory and Structure

    (2008)
  • DaviesR.

    Testing the hypothesis that a point process is Poisson

    Adv. Appl. Probab.

    (1977)
  • DaviesR.

    Hypothesis testing when a nuisance parameter is present only under the alternative

    Biometrika

    (1987)
  • DaviesT. et al.

    Fast computation of spatially adaptive kernel estimates

    Stat. Comput.

    (2018)
  • DaviesT. et al.

    Adaptive kernel estimation of spatial relative risk

    Stat. Med.

    (2010)