Analysing point patterns on networks — A review
Introduction
This paper reviews recent research on the spatial analysis of events that occur along a network of lines. Fig. 1 displays a motivating example, in which the locations of traffic accidents in a city are mapped together with the road network. Such data require the development of novel statistical methodology and computational techniques (Okabe and Sugihara, 2012; Ver Hoef et al., 2006; Baddeley et al., 2015, Chapter 17).
Spatial patterns of points along a network of lines are found in many other applications. The network might reflect a map of railways, rivers, electrical wires, nerve fibres, airline routes, irrigation canals, geological faults or soil cracks. Observations of interest could be the locations of traffic accidents, bicycle incidents, vehicle thefts or street crimes (Yamada and Thill, 2004, Lu and Chen, 2007, Xie and Yan, 2008, Ang et al., 2012, Vandenbulcke-Plasschaert, 2011); roadside trees or invasive species (Spooner et al., 2004, Deckers et al., 2005); retail stores, roadside kiosks or urban parks (Okabe and Kitamura, 1996, Okabe and Okunuki, 2001, Okunuki and Okabe, 2003, Comber et al., 2008); insect nests (Voss, 1999, Voss et al., 2007, Ang et al., 2012); neuroanatomical features (Yadav et al., 2012, Jammalamadaka et al., 2013, Baddeley et al., 2014) or sample points along a stream (Ver Hoef et al., 2006, Ver Hoef and Peterson, 2010, Som et al., 2014). John Snow’s pioneering observations of cholera cases in London (Snow, 1855) could also be described as a point pattern on a linear network. Spatial analysis of such data can have immediate practical value, as it did when Snow’s analysis suggested the cause of cholera transmission.
Statistical analysis of network data presents severe challenges. A network is not spatially homogeneous, which creates geometrical and computational complexities and leads to new methodological problems, with a high risk of methodological error. Real network data can also exhibit an extremely wide range of spatial scales. These problems pose a significant and far-reaching challenge to the classical methodology of spatial statistics based on stationary processes, which is largely inapplicable to data on a network.
Analysis of point patterns on linear networks has been a focus of recent research in Geographical Information Systems (GIS) (Borruso, 2005, Borruso, 2008, Downs and Horner, 2007a, Okabe and Yamada, 2001, Okabe et al., 2009, Okabe et al., 2008, Okabe et al., 2000, Shiode, 2008, Okabe and Satoh, 2006, Okabe et al., 1995, Okabe et al., 2006b, Okabe et al., 2006a, Okabe and Satoh, 2009, Shiode and Shiode, 2009, Warden, 2008, Yamada and Thill, 2007). For surveys, see Okabe and Satoh (2009) and Okabe and Sugihara (2012). In recent years there has been increased interest from the spatial statistics community (Ang et al., 2012, Baddeley et al., 2014, McSwiggan et al., 2016, Baddeley et al., 2017, Anderes et al., 2017, Rakshit et al., 2017, Moradi et al., 2018, Briz-Redón et al., 2019, Rakshit et al., 2019a, Rakshit et al., 2019b, McSwiggan et al., 2019, Moradi et al., 2019, McSwiggan, 2019; Baddeley et al., 2015, Chap. 17). Spatial analysis on a network of rivers or streams is also a highly active research topic in spatial ecology and spatial statistics, although this is more focused on geostatistical techniques for spatial variables (Cressie and Majure, 1997, Cressie et al., 2006, Isaak et al., 2014, O’Donnell et al., 2014, Ver Hoef et al., 2006, Ver Hoef and Peterson, 2010).
Section 2 presents a range of example datasets and applications. Section 3 surveys the main statistical challenges for analysis of point patterns on a network. Section 4 gives some basic technical definitions. Section 5 discusses kernel smoothing on a network. Section 6 discusses nonparametric estimation of intensity as a function of covariates. Section 7 discusses parametric modelling of point processes, model-fitting and variable selection. Section 8 discusses the analysis of clustering using the K-function and related tools. Section 9 describes the fundamental problem of existence of models with specified properties. Section 10 presents alternative ways of measuring the distance between locations on a network, and the implications for statistical analysis and modelling.
Section snippets
Data examples
In this section we present datasets from several different applications, illustrating different features and challenges. Various techniques will be demonstrated on these datasets throughout the paper. Where possible, the datasets are publicly available in the R packages spatstat (Baddeley and Turner, 2005, Baddeley et al., 2015) or spatstat.Knet (Rakshit et al., 2019a).
Overview of problems and significance
In this section we survey the main problems for analysis of point patterns on a linear network. For concreteness we shall often use terms that would be appropriate to road traffic accident analysis.
Technical definitions
Here we collect a few technical definitions and preliminaries.
Kernel density estimation
Estimation of the spatially-varying density of events is crucially important in practice. In studying road safety or transport planning, for example, such estimates are essential for accident research, the formulation of emergency response strategies, urban modelling and other purposes. Even when it is not the main focus of interest, we may need to adjust for spatially-varying density in order to study other properties. Non-uniform density can easily be confounded with clustering between
Intensity depending on a covariate
The effect of an explanatory variable on the density of points can be investigated using nonparametric curve estimation techniques.
Suppose that $Z(u)$ is a spatial covariate function, and we believe that the intensity $\lambda(u)$ of the points depends only on $Z$ through a relationship $\lambda(u) = \rho(Z(u))$ (26), where $\rho$ is an unknown function that is to be estimated. For example if $\rho$ is a decreasing function of $Z$, the relationship (26) specifies that the density of points will be lower in those parts of the network where
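To make the idea concrete, the following is a minimal sketch of a histogram-style ratio estimate of the unknown function linking covariate to intensity (called rho in the code). It is not the estimator developed in this section, which is not reproduced here. It assumes the network has been cut into short segments, each with a known length and a single covariate value, and it divides the number of points falling in each covariate bin by the total network length in that bin. The function name `binned_rho` and all data values are hypothetical.

```python
def binned_rho(point_z, segment_z, segment_len, breaks):
    """Crude estimate of rho on covariate bins [breaks[k], breaks[k+1]).

    point_z     : covariate value at each observed point
    segment_z   : covariate value on each short network segment
    segment_len : length of each short network segment
    """
    n_bins = len(breaks) - 1
    counts = [0] * n_bins
    lengths = [0.0] * n_bins

    def bin_of(z):
        # Index of the bin containing z, or None if z is outside all bins.
        for k in range(n_bins):
            if breaks[k] <= z < breaks[k + 1]:
                return k
        return None

    for z in point_z:
        k = bin_of(z)
        if k is not None:
            counts[k] += 1
    for z, ell in zip(segment_z, segment_len):
        k = bin_of(z)
        if k is not None:
            lengths[k] += ell

    # Points per unit length of network within each covariate bin.
    return [c / ell if ell > 0 else float("nan")
            for c, ell in zip(counts, lengths)]


# Hypothetical data: four segments, six points, two covariate bins.
segment_z = [0.5, 0.5, 1.5, 1.5]
segment_len = [2.0, 2.0, 3.0, 1.0]
point_z = [0.2, 0.7, 0.9, 0.4, 1.2, 1.8]
rho_hat = binned_rho(point_z, segment_z, segment_len, breaks=[0.0, 1.0, 2.0])
```

In this toy example the estimate is lower in the second bin than in the first, which is the pattern expected when the function is decreasing in the covariate. A smooth estimate would replace the bins with a kernel, at the cost of choosing a bandwidth.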
Parametric models and model-fitting
A major goal of statistical analysis is to formulate and fit parametric models to point pattern data on a network. The models are point processes which depend on explanatory spatial variables. The primary aim is usually to model the dependence of the intensity on the covariates, taking into account any stochastic dependence between different points.
The general theory of point processes (Daley and Vere-Jones, 2003) easily handles the definition, construction and characterisation of parametric
K-function and pair correlation function
Correlation is a widely-used statistical measure of “dependence” or “association” between variables. For spatial point patterns, the K-function and the pair correlation function are correlation measures of association between points in the pattern. They have served a valuable role in the analysis of spatial point patterns in two and three dimensions. The task is to adapt these methods to a linear network.
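As a rough illustration of what such an adaptation can look like, the sketch below computes an uncorrected network K-function: the number of ordered pairs of distinct points within shortest-path distance r, scaled by the total network length divided by n(n-1). This is the simple form usually attributed to Okabe and Yamada (2001), and it does not include the geometric correction of Ang et al. (2012). The function name `network_k` and the toy data are hypothetical, and the network is assumed to be a single straight road so that shortest-path distance reduces to the absolute difference of positions.

```python
from itertools import permutations


def network_k(pair_dist, total_length, r):
    """Uncorrected network K-function estimate at distance r.

    pair_dist[i][j] is the shortest-path distance between points i and j;
    total_length is the total length of the network.
    """
    n = len(pair_dist)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, lying within distance r.
    close_pairs = sum(
        1 for i, j in permutations(range(n), 2) if pair_dist[i][j] <= r
    )
    return total_length * close_pairs / (n * (n - 1))


# Toy network (assumption): one straight road of length 10 with three events.
positions = [1.0, 2.0, 6.0]
pair_dist = [[abs(a - b) for b in positions] for a in positions]

k_short = network_k(pair_dist, total_length=10.0, r=1.5)
k_long = network_k(pair_dist, total_length=10.0, r=5.0)
```

On a real network the distance matrix would come from a shortest-path algorithm run over the network graph rather than from positions on a single line.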
In the last two decades, substantial research effort has been devoted to this problem by
Construction problem and non-existence
Many standard methods for analysing spatial point patterns assume that the underlying point process is stationary. This gives access to a powerful statistical methodology, embracing nonparametric characteristics such as the K-function and pair correlation function (Illian et al., 2008), as well as parametric modelling and inference (Møller and Waagepetersen, 2004, Baddeley et al., 2015).
Much of this methodology cannot be extended to a linear network, because the network itself is not
Alternative distance metrics
Okabe and Sugihara (2012, pp. 7–8) explain carefully that, while it is often sensible to measure distances in a network by the shortest path, this is not obligatory, and may occasionally be inappropriate to the application.
Alternative metrics include the Euclidean distance between points in two-dimensional space, and the “resistance distance” defined by treating the lines of the network as electrical resistors (Klein and Randić, 1993, Bapat, 2004). Shortest-path distance may be modified by
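The three metrics named above can be compared on a small example. The sketch below uses a hypothetical square network with four unit-length edges and measures the distance between two opposite corners. It assumes that each edge has resistance equal to its length, which is one natural convention but is not taken from the source. Shortest-path distance is computed with Dijkstra's algorithm. Resistance distance is obtained by grounding the destination vertex and solving the reduced Laplacian system by Gauss-Jordan elimination.

```python
import heapq
import math

# Hypothetical network: a unit square with vertices at the corners.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]


def edge_length(a, b):
    return math.dist(coords[a], coords[b])


def shortest_path_distance(src, dst):
    """Dijkstra's algorithm on the undirected network."""
    adj = {v: [] for v in coords}
    for a, b in edges:
        w = edge_length(a, b)
        adj[a].append((b, w))
        adj[b].append((a, w))
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            return d
        if d > best.get(v, math.inf):
            continue  # stale heap entry
        for u, w in adj[v]:
            nd = d + w
            if nd < best.get(u, math.inf):
                best[u] = nd
                heapq.heappush(heap, (nd, u))
    return math.inf


def resistance_distance(src, dst):
    """Effective resistance, assuming edge resistance equals edge length."""
    if src == dst:
        return 0.0
    # Ground dst, inject unit current at src, solve for the potential at src.
    nodes = [v for v in coords if v != dst]
    idx = {v: k for k, v in enumerate(nodes)}
    m = len(nodes)
    aug = [[0.0] * (m + 1) for _ in range(m)]  # reduced Laplacian | rhs
    for a, b in edges:
        g = 1.0 / edge_length(a, b)  # conductance of the edge
        for p, q in ((a, b), (b, a)):
            if p in idx:
                aug[idx[p]][idx[p]] += g
                if q in idx:
                    aug[idx[p]][idx[q]] -= g
    aug[idx[src]][m] = 1.0
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(m):
            if row != col:
                f = aug[row][col] / aug[col][col]
                for k in range(col, m + 1):
                    aug[row][k] -= f * aug[col][k]
    s = idx[src]
    return aug[s][m] / aug[s][s]


euclid = math.dist(coords[0], coords[2])
path = shortest_path_distance(0, 2)
resist = resistance_distance(0, 2)
```

For the opposite corners of the square the three metrics disagree. The resistance distance is smaller than the shortest-path distance because current can flow along both routes around the square, whereas the shortest path uses only one of them.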
Acknowledgements
This article includes summaries of the results of joint research with the Perth Spatial Point Processes Group (Adrian Baddeley, Gopalan Nair, Suman Rakshit), with former members (Ya-Mei Chang, Andrew Hardegen, Thomas Lawrence, and Yong Song) and with Spatial Analysis Group Otago (Tilman Davies, Martin Hazelton, Adrian Baddeley), as well as collaborators Ege Rubak, Rob Foxall and Rolf Turner.
This work was supported by Australian Research Council grant DP130102322, “Statistical methodology for
References (152)
Kernel density estimation and K-means clustering to profile road accident hotspots
Accid. Anal. Prev.
(2009)
Local composite likelihood for spatial point processes
Spat. Stat.
(2017)
Using a GIS-based network analysis to determine urban greenspace accessibility for different ethnic and religious groups
Landsc. Urban Plan.
(2008)
Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function
Comput. Statist. Data Anal.
(2016)
Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation
J. Multivariate Anal.
(2005)
Cost distance defined by a topological function of landscape
Ecol. Model.
(2008)
The application of K-function analysis to the geographical distribution of road traffic accident outcomes in Norfolk, England
Soc. Sci. Med.
(1996)
The statistical analysis of crash frequency data: a review and assessment of methodological alternatives
Transp. Res. A
(2010)
Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory
Accid. Anal. Prev.
(2005)
On the false alarm of planar K-function when analyzing urban crime distributed along streets
Soc. Sci. Res.
(2007)
Boundary kernels for adaptive density estimators on regions with irregular boundaries
J. Multivariate Anal.
On bandwidth estimation in kernel estimates – a square root law
Ann. Statist.
Isotropic covariance functions on graphs and their edges
Technical Report
Methodological Errors in Medical Research
Statistical Methodology for Events on a Network
Geometrically corrected second order analysis of events on a linear network, with applications to ecology and criminology
Scand. J. Stat.
Local indicators of spatial association – LISA
Geogr. Anal.
Nonparametric estimation of the dependence of a spatial point process on a spatial covariate
Stat. Interface
Multitype point process analysis of spines on the dendrite network of a neuron
J. R. Stat. Soc. Ser. C. Appl. Stat.
Non- and semiparametric estimation of interaction in inhomogeneous point patterns
Stat. Neerl.
Analysis of a three-dimensional point pattern with replication
Appl. Stat.
‘Stationary’ point processes are uncommon on linear networks
STAT
Spatial Point Patterns: Methodology and Applications with R
A cautionary example on the use of second-order methods for analyzing point patterns
Biometrics
Practical maximum pseudolikelihood for spatial point patterns (with discussion)
Aust. New Zealand J. Stat.
Spatstat: an R package for analyzing spatial point patterns
J. Stat. Softw.
Resistance matrix of a weighted graph
Commun. Math. Comput. Chem.
On the Voronoi estimator for the intensity of an inhomogeneous planar Poisson process
Biometrika
Fused density estimation: theory and methods
J. R. Stat. Soc. Ser. B Stat. Methodol.
Mixed models for the analysis of replicated spatial point patterns
Biostatistics
Limitations of the application of fourfold table analysis to hospital data
Biom. Bull.
Approximating point process likelihoods with GLIM
Appl. Stat.
An application of density estimation to geographical epidemiology
Stat. Med.
Network density and the delimitation of urban areas
Trans. GIS
Network density estimation: Analysis of point patterns over a network
Network density estimation: A GIS approach for analysing point patterns in a network space
Trans. GIS
Kernel density estimation via diffusion
Ann. Statist.
Spatial analysis of traffic accidents near and between road intersections in a directed linear network
Accid. Anal. Prev.
Scale space view of curve estimation
Ann. Statist.
Analysis of spatial point patterns using bundles of product density LISA functions
J. Agric. Biol. Environ. Stat.
Spatial prediction on a river network
J. Agric. Biol. Environ. Stat.
Spatio-temporal statistical modeling of livestock waste in streams
J. Agric. Biol. Environ. Stat.
Statistics for Spatio-Temporal Data
An Introduction to the Theory of Point Processes
An Introduction to the Theory of Point Processes. Vol. I: Elementary Theory and Methods
An Introduction to the Theory of Point Processes. Vol. II: General Theory and Structure
Testing the hypothesis that a point process is Poisson
Adv. Appl. Probab.
Hypothesis testing when a nuisance parameter is present only under the alternative
Biometrika
Fast computation of spatially adaptive kernel estimates
Stat. Comput.
Adaptive kernel estimation of spatial relative risk
Stat. Med.