Spatial autocorrelation for massive spatial data: verification of efficiency and statistical power asymptotics

  • Original Article
  • Journal of Geographical Systems

Abstract

Many recent studies analyze spatial data containing massive numbers of observations. Because the initial developments of classical spatial autocorrelation statistics were based on rather small sample sizes, this paper extends efficiency and statistical power comparisons between the Moran coefficient and the Geary ratio to massive spatial datasets, for different variable distribution assumptions and selected geographic neighborhood definitions. The question addressed is whether earlier results for small n extend to large and massively large n, especially for non-normal variables; the implications established are relevant to big spatial data. To make these comparisons, this paper summarizes proofs of limiting variances, also called asymptotic variances, for the efficiency analysis, and derives the relationship function between the two statistics so that their statistical power can be compared on the same scale. Visualization of this statistical power analysis employs an alternative technique that already appears in the literature, furnishing additional understanding and clarity about these spatial autocorrelation statistics. Results include the following: the Moran coefficient is more efficient than the Geary ratio for most surface partitionings, because it has a relatively smaller asymptotic as well as exact variance; and the superior power of the Moran coefficient vis-à-vis the Geary ratio for positive spatial autocorrelation depends upon the type of geographic configuration, with this power approaching one as sample sizes become increasingly large. Because spatial analysts usually calculate these two statistics for interval/ratio data, this paper also includes comments about the join count statistics used for nominal data.

Notes

  1. Refer to Univariate distribution relationships: http://www.math.wm.edu/~leemis/chart/UDR/UDR.html.

  2. This property also holds for the SR, CN-C, and CN-TR cases.

  3. The diagonal entries are zeros; i.e., \(c_{ii} = 0, i = 1,2, \ldots,n\).

  4. Given a random variable \(x\), with each probability evaluated under the alternative distribution, for a two-sided test, \({\text{power}} = 1 - {\text{probability}}\left( {x < {\text{right}}\;{\text{critical}}\;{\text{value}}} \right) + {\text{probability}}\left( {x < {\text{left}}\;{\text{critical}}\;{\text{value}}} \right)\); for a right-sided test, \({\text{power}} = 1 - {\text{probability}}\left( {x < {\text{critical}}\;{\text{value}}} \right)\), whereas for a left-sided test, \({\text{power}} = {\text{probability}}\left( {x < {\text{critical}}\;{\text{value}}} \right)\).

  5. These data come from Griffith (2015); an initial 130 appeared in Griffith (2004), which was expanded to 144 in Griffith and Luhanga (2011).

  6. Following the steps in the mentioned paper, the statistical power of the MC is assessed by replacing all 1.96 values with 1.645 and retaining only the right-hand side of the standardized normal curve. Meanwhile, for the GR, because positive SA is in the interval [0, 1), the one-tailed test is the left-hand side rather than the right-hand side of the standardized normal curve; − 1.96 should be replaced with − 1.645, and the positive portions removed.
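The critical values quoted in Note 6 can be recovered from the standard normal quantile function. The following is a minimal sketch that uses only Python's standard library; the helper name `critical_values` is hypothetical and is not taken from the article.

```python
from statistics import NormalDist

def critical_values(alpha=0.05):
    """Standard normal critical values for a test of size alpha."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return {
        # two-sided test: alpha/2 in each tail
        "two_sided": (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2)),
        # right-tailed test, as described for the MC under positive SA
        "right_tail": z.inv_cdf(1 - alpha),
        # left-tailed test, as described for the GR under positive SA
        "left_tail": z.inv_cdf(alpha),
    }

cv = critical_values(0.05)
print(cv)
```

For a significance level of 0.05, the two-sided values round to ± 1.96 and the one-tailed values round to ± 1.645, which are the replacements described in the note.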

Acknowledgements

Funding was provided by The National Key Research and Development Program of China (Grant No. 2017YFB0503802) and China Scholarship Council (Grant No. 201406270075).

Corresponding author

Correspondence to Huayi Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Selected eigenvalues of binary connectivity matrices and corresponding MC and GR values for three theoretical configurations

See Table 6.

Table 6 First and last ten values of \(\varvec{\lambda}\), MC, and GR

Appendix 2: A descriptive introduction of statistical power

Figure 11 shows the necessary elements of a hypothesis testing procedure. Suppose one is testing the null hypothesis mean = 0, whose underlying distribution is standard normal, with the significance level \(\alpha\) set to 0.05, which results in the critical values ± 1.96. Suppose the true mean value is one, which is the alternative hypothesis. The two green areas are the critical regions in which the null hypothesis is rejected; thus, the interval [− 1.96, 1.96] is the range across which the null is not rejected. Because the true mean is one, failing to reject the null commits a Type II error, whose probability is the area colored blue under the alternative distribution curve (the blue normal curve). Therefore, the statistical power of this hypothesis testing example is the area under the blue curve restricted to \(\left[ {1.96, + \infty } \right)\) and \(\left( { - \infty , - 1.96} \right]\).

Fig. 11 An example of hypothesis testing
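The power in this example can be computed directly from normal distribution functions. The sketch below is illustrative only: the function name `normal_power` is hypothetical, and it assumes that the alternative distribution is normal with unit variance, differing from the null only in its mean.

```python
from statistics import NormalDist

def normal_power(true_mean, alpha=0.05, side="two"):
    """Power of a z-test of H0: mean = 0 when the true mean is true_mean.

    Assumes unit variance under both the null and the alternative.
    """
    null = NormalDist(0.0, 1.0)
    alt = NormalDist(true_mean, 1.0)  # alternative distribution
    if side == "two":
        lo = null.inv_cdf(alpha / 2)
        hi = null.inv_cdf(1 - alpha / 2)
        # probability mass of the alternative in both critical regions
        return alt.cdf(lo) + (1 - alt.cdf(hi))
    if side == "right":
        return 1 - alt.cdf(null.inv_cdf(1 - alpha))
    if side == "left":
        return alt.cdf(null.inv_cdf(alpha))
    raise ValueError("side must be 'two', 'right', or 'left'")

power = normal_power(1.0)
print(round(power, 4))
```

With a true mean of one, the two-sided power is roughly 0.17, nearly all of it contributed by the right-hand critical region. Setting `true_mean` to zero returns the significance level itself, which is a convenient sanity check.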

Appendix 3: Proofs for the relationship function between the MC and the GR and Theorems 1 to 4

Proof 1

Substituting Eq. (1) into Eq. (3) yields

$$\frac{{\left( {n - 1} \right)\left[ {2\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} \left( {\mathop \sum \nolimits_{j = 1}^{n} c_{ij} } \right) - 2\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right)} \right]}}{{2\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} \mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} }}.$$

Comparing this equation to Eq. (2), the proof requires only showing the equality of their numerators. Considering \(\left( {x_{i} - x_{j} } \right)^{2} = \left[ {\left( {x_{i} - \bar{x}} \right) - \left( {x_{j} - \bar{x}} \right)} \right]^{2}\), and utilizing the symmetry of matrix \(\varvec{C}\), yields

$$\begin{aligned} & \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - x_{j} } \right)^{2} \\ & \quad = \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)^{2} - 2\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right) + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{j} - \bar{x}} \right)^{2} \\ & \quad = 2\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)^{2} - 2\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right) \\ & \quad = 2\mathop \sum \limits_{i = 1}^{n} \left( {c_{i1} + c_{i2} + \cdots + c_{in} } \right)\left( {x_{i} - \bar{x}} \right)^{2} - 2\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right) \\ & \quad = 2\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} \left( {\mathop \sum \limits_{j = 1}^{n} c_{ij} } \right) - 2\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right). \\ \end{aligned}$$

\(\therefore GR =\) Eq. (3).□
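The identity established in Proof 1 can be checked numerically. The sketch below assumes the standard definitions of the MC and the GR (taken here to correspond to Eqs. (1) and (2), which are not reproduced in this section) and a symmetric binary matrix with a zero diagonal; the function names and the random test data are hypothetical.

```python
import random

def moran_geary(x, C):
    """Standard Moran coefficient and Geary ratio for values x and weights C."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in C)   # sum of all c_ij
    ss = sum(zi * zi for zi in z)     # sum of squared deviations
    cross = sum(C[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    sqdiff = sum(C[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    mc = (n / s0) * cross / ss
    gr = ((n - 1) / (2 * s0)) * sqdiff / ss
    return mc, gr

def geary_from_moran(x, C):
    """GR rebuilt from the MC using the expression obtained in Proof 1."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in C)
    ss = sum(zi * zi for zi in z)
    mc, _ = moran_geary(x, C)
    # sum_i (x_i - xbar)^2 * (row sum of C for unit i)
    weighted = sum(sum(C[i]) * z[i] ** 2 for i in range(n))
    return (n - 1) * weighted / (s0 * ss) - (n - 1) / n * mc

# Hypothetical data: random symmetric binary matrix with zero diagonal
random.seed(0)
n = 30
C = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        C[i][j] = C[j][i] = random.randint(0, 1)
x = [random.gauss(0, 1) for _ in range(n)]

mc, gr = moran_geary(x, C)
gr_check = geary_from_moran(x, C)
print(mc, gr, gr_check)
```

The two GR values should agree up to floating-point rounding, because the rebuilt expression is an algebraic rearrangement of the direct one whenever the matrix is symmetric.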

The following are proofs for Theorems 1 to 4 (T1 to T4).

Proof of T1

$$\begin{aligned} & \mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{N} \left( {\text{MC}} \right) \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \frac{{n^{2} \left( {n - 1} \right)S_{1} - n\left( {n - 1} \right)S_{2} + 3\left( {n - 1} \right)S_{0}^{2} - \left( {n + 1} \right)S_{0}^{2} }}{{\left( {n - 1} \right)^{2} \left( {n + 1} \right)S_{0}^{2} }} \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \left[ {\frac{{n^{2} S_{1} }}{{\left( {n^{2} - 1} \right)S_{0}^{2} }} - \frac{{nS_{2} }}{{\left( {n^{2} - 1} \right)S_{0}^{2} }} + \frac{{2\left( {n - 2} \right)}}{{\left( {n - 1} \right)^{2} \left( {n + 1} \right)}}} \right] \\ & \quad = \frac{{S_{1} }}{{S_{0}^{2} }} - o\left( 1 \right)\frac{{S_{2} }}{{S_{0}^{2} }} + 2o\left( {\frac{1}{n}} \right) = \frac{2}{{S_{0} }} = {\text{Var}}_{A} \left( {\text{MC}} \right), \\ \end{aligned}$$

where \(o\left( 1 \right)\) denotes a term of order \(1/n\), an infinitesimal as \(n \to \infty\); \(S_{2} /S_{0}^{2}\) is a constant (it is a positive constant for the maximum planar connectivity case; otherwise, it converges to zero); and \(o\left( {1/n} \right)\) denotes a term of order \(1/n^{2}\), an infinitesimal of higher order than \(1/n\) as \(n \to \infty\). The final equality uses \(S_{1} = 2S_{0}\), which holds for a symmetric binary matrix \(\varvec{C}\).□

Proof of T2

$$\begin{aligned} & \quad \mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{R} \left( {\text{MC}} \right) \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \left\{ {\frac{{n\left( {n - 1} \right)\left[ {\left( {n^{2} - 3n + 3} \right)S_{1} - nS_{2} + 3S_{0}^{2} } \right] - b_{2} \left( {n - 1} \right)\left[ {\left( {n^{2} - n} \right)S_{1} - 2nS_{2} + 6S_{0}^{2} } \right]}}{{\left( {n - 1} \right)^{2} \left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} - \frac{{\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }}{{\left( {n - 1} \right)^{2} \left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }}} \right\} \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \left\{ {\frac{{n\left( {n^{2} - 3n + 3} \right)S_{1} }}{{\left( {n - 1} \right)\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} - \frac{{n^{2} S_{2} }}{{\left( {n - 1} \right)\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} + \frac{3n}{{\left( {n - 1} \right)\left( {n - 2} \right)\left( {n - 3} \right)}} - b_{2} \left[ {\frac{{nS_{1} }}{{\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} - \frac{{2nS_{2} }}{{\left( {n - 1} \right)\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} + \frac{6}{{\left( {n - 1} \right)\left( {n - 2} \right)\left( {n - 3} \right)}}} \right] - \frac{1}{{\left( {n - 1} \right)^{2} }}} \right\} \\ & \quad = \frac{{S_{1} }}{{S_{0}^{2} }} - o\left( 1 \right)\frac{{S_{2} }}{{S_{0}^{2} }} + 3o\left( {\frac{1}{n}} \right) - b_{2} \left[ {o\left( 1 \right)\frac{{S_{1} }}{{S_{0}^{2} }} - 2o\left( {\frac{1}{n}} \right)\frac{{S_{2} }}{{S_{0}^{2} }} + 6o\left( {\frac{1}{{n^{2} }}} \right)} \right] - o\left( {\frac{1}{n}} \right) \\ & \quad = \frac{{S_{1} }}{{S_{0}^{2} }} = \frac{2}{{S_{0} }} = {\text{Var}}_{A} \left( {\text{MC}} \right), \\ \end{aligned}$$

where \(b_{2}\) is a constant (an index of kurtosis) whose value may vary with the assumed distribution, and \(o\left( {1/n^{i} } \right)\)(\(i = 0,1,2\)) are infinitesimals (of higher order) over \(n \to \infty\).□
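The convergence stated in T1 and T2 can be illustrated numerically. The sketch below evaluates the exact variance expressions from the first lines of the two proofs and compares them with \(2/S_{0}\). It assumes a binary rook-adjacency matrix on a square lattice, chosen here only as a convenient example; the helper names and the kurtosis value 9 are hypothetical.

```python
def rook_sums(p):
    """Return n, S0, S1, S2 for binary rook adjacency on a p-by-p lattice (p >= 2)."""
    n = p * p
    # neighbour counts: 4 corners (2), 4(p-2) edge cells (3), (p-2)^2 interior cells (4)
    degrees = {2: 4, 3: 4 * (p - 2), 4: (p - 2) ** 2}
    s0 = sum(d * k for d, k in degrees.items())             # sum of all c_ij
    s1 = 2 * s0                                              # symmetric binary C
    s2 = sum((2 * d) ** 2 * k for d, k in degrees.items())   # sum_i (c_i. + c_.i)^2
    return n, s0, s1, s2

def var_n_mc(n, s0, s1, s2):
    """Exact MC variance under normality, as on the first line of the proof of T1."""
    num = (n**2 * (n - 1) * s1 - n * (n - 1) * s2
           + 3 * (n - 1) * s0**2 - (n + 1) * s0**2)
    den = (n - 1) ** 2 * (n + 1) * s0**2
    return num / den

def var_r_mc(n, s0, s1, s2, b2):
    """Exact MC variance under randomization, as on the first line of the proof of T2."""
    num = (n * (n - 1) * ((n**2 - 3 * n + 3) * s1 - n * s2 + 3 * s0**2)
           - b2 * (n - 1) * ((n**2 - n) * s1 - 2 * n * s2 + 6 * s0**2)
           - (n - 2) * (n - 3) * s0**2)
    den = (n - 1) ** 2 * (n - 2) * (n - 3) * s0**2
    return num / den

for p in (10, 100, 1000):
    n, s0, s1, s2 = rook_sums(p)
    asymptotic = 2 / s0
    ratio_n = var_n_mc(n, s0, s1, s2) / asymptotic
    ratio_r = var_r_mc(n, s0, s1, s2, b2=9.0) / asymptotic
    print(p, n, ratio_n, ratio_r)
```

Both ratios should move toward one as the lattice grows, including for the deliberately heavy-tailed kurtosis value, which is consistent with the \(b_{2}\) terms vanishing in the limit.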

Proof of T3

$$\begin{aligned} & \quad \mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{N} \left( {\text{GR}} \right) \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \left[ {\frac{{\left( {2S_{1} + S_{2} } \right)\left( {n - 1} \right)}}{{2\left( {n + 1} \right)S_{0}^{2} }} - \frac{2}{{\left( {n + 1} \right)}}} \right] \\ & \quad = \frac{{\left( {2S_{1} + S_{2} } \right)}}{{2S_{0}^{2} }} - 2o\left( 1 \right) = \frac{2}{{S_{0} }} + \frac{{S_{2} }}{{2S_{0}^{2} }}. \\ \end{aligned}$$

\(\therefore \mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{N} \left( {\text{GR}} \right) = {\text{Var}}_{A} \left( {\text{GR}} \right)\).□

Proof of T4

$$\begin{aligned} & \mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{R} \left( {\text{GR}} \right) \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \left\{ {\frac{{\left( {n - 1} \right)S_{1} \left[ {n^{2} - 3n + 3 - \left( {n - 1} \right)b_{2} } \right] - \frac{1}{4}\left( {n - 1} \right)S_{2} \left[ {n^{2} + 3n - 6 - \left( {n^{2} - n + 2} \right)b_{2} } \right]}}{{n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} + \frac{{S_{0}^{2} \left[ {n^{2} - 3 - \left( {n - 1} \right)^{2} b_{2} } \right]}}{{n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }}} \right\} \\ & \quad = \mathop {\lim }\limits_{n \to \infty } \left[ {\frac{{\left( {n - 1} \right)\left( {n^{2} - 3n + 3} \right)S_{1} }}{{n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} - \frac{{\left( {n - 1} \right)^{2} S_{1} b_{2} }}{{n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} - \frac{{\left( {n - 1} \right)\left( {n^{2} + 3n - 6} \right)S_{2} }}{{4n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} + \frac{{\left( {n - 1} \right)\left( {n^{2} - n + 2} \right)S_{2} b_{2} }}{{4n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} + \frac{{n^{2} - 3}}{{n\left( {n - 2} \right)\left( {n - 3} \right)}} - \frac{{\left( {n - 1} \right)^{2} b_{2} }}{{n\left( {n - 2} \right)\left( {n - 3} \right)}}} \right] \\ & \quad = \frac{{S_{1} }}{{S_{0}^{2} }} - o\left( 1 \right)\frac{{S_{1} }}{{S_{0}^{2} }}b_{2} - \frac{{S_{2} }}{{4S_{0}^{2} }} + \frac{{S_{2} b_{2} }}{{4S_{0}^{2} }} + o\left( 1 \right) - o\left( 1 \right)b_{2} \\ & \quad = \frac{2}{{S_{0} }} + \frac{{S_{2} \left( {b_{2} - 1} \right)}}{{4S_{0}^{2} }}. \\ \end{aligned}$$
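The exact GR variance expressions on the first lines of the proofs of T3 and T4 can be evaluated side by side. The sketch below is a consistency check under stated assumptions, not a reproduction of the article's computations: it assumes a binary rook-adjacency matrix on a square lattice and uses hypothetical helper names. With \(b_{2} = 3\), the kurtosis of a normal variable, the randomization expression should nearly coincide with the normality expression, mirroring the fact that the T4 limit reduces to the T3 limit when \(b_{2} = 3\).

```python
def rook_sums(p):
    """Return n, S0, S1, S2 for binary rook adjacency on a p-by-p lattice (p >= 2)."""
    n = p * p
    # neighbour counts: 4 corners (2), 4(p-2) edge cells (3), (p-2)^2 interior cells (4)
    degrees = {2: 4, 3: 4 * (p - 2), 4: (p - 2) ** 2}
    s0 = sum(d * k for d, k in degrees.items())
    s1 = 2 * s0                                              # symmetric binary C
    s2 = sum((2 * d) ** 2 * k for d, k in degrees.items())
    return n, s0, s1, s2

def var_n_gr(n, s0, s1, s2):
    """Exact GR variance under normality (the two terms in T3 over a common denominator)."""
    return ((2 * s1 + s2) * (n - 1) - 4 * s0**2) / (2 * (n + 1) * s0**2)

def var_r_gr(n, s0, s1, s2, b2):
    """Exact GR variance under randomization, as on the first line of the proof of T4."""
    num = ((n - 1) * s1 * (n**2 - 3 * n + 3 - (n - 1) * b2)
           - 0.25 * (n - 1) * s2 * (n**2 + 3 * n - 6 - (n**2 - n + 2) * b2)
           + s0**2 * (n**2 - 3 - (n - 1) ** 2 * b2))
    den = n * (n - 2) * (n - 3) * s0**2
    return num / den

for p in (10, 100):
    n, s0, s1, s2 = rook_sums(p)
    v_norm = var_n_gr(n, s0, s1, s2)
    v_rand = var_r_gr(n, s0, s1, s2, b2=3.0)
    print(p, n, v_norm, v_rand, v_rand / v_norm)
```

Changing `b2` away from 3 shows how the kurtosis of the variable enters the randomization variance through the \(S_{2}\) term.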

Cite this article

Luo, Q., Griffith, D.A. & Wu, H. Spatial autocorrelation for massive spatial data: verification of efficiency and statistical power asymptotics. J Geogr Syst 21, 237–269 (2019). https://doi.org/10.1007/s10109-019-00293-3
