Introduction

League tables are rather commonplace nowadays. It is unsurprising that they gain such substantial media attention, because they convey their message clearly and simply, and because their format has a wide familiarity (founded on the league tables associated with sporting competitions). Area-based league tables serve to paint a picture. In so doing, they raise the question of whether that picture captures the essence of reality, or is simply the equivalent of shining a rather unusual type of light. If a body of analysis creates a set of impressions of local areas which is not invariant to the choice of spatial unit, that fact must be a matter of concern in principle. There is also the issue of who is supposed to be enlightened by area-based league tables: although policy-makers from local and central government are one obvious audience, so too are employers who are considering the location of their organisation and individuals choosing where to live. Beyond this, there is the question of informing the general public. Of course, league tables based on a particular strand of performance may distort policy, or other decisions: whatever is directly measured in a league table might become the key focus, potentially to the detriment of other strands of performance of equal, or greater, importance. However, we must not ignore the fact that, in as much as an absence of league tables may also reduce available information, that absence may be even more damaging to the prospects for sound (evidence-based) decision-making.

In this paper, our spotlight falls especially on the question of how ‘local areas’ are to be defined: this will determine ‘league membership’ for local economic performance league tables. Our example is drawn from the case of England. However, in principle, similar analysis should also be undertaken for other countries. Clearly, any country can be divided up in many different ways. The basis might be administrative or political boundaries, which themselves often result from an accumulation of historical factors (including political expediency or administrative convenience). However, these spatial units – e.g., Local Authority (LA) areas in England – can have the serious disadvantage of being highly heterogeneous in size (whether that is defined by area, population or some combination of the two). Functional economic areas are another possibility (see, for example, Cörvers et al. (2009)), but these are likely to vary too, according to which policy domain is being considered. Although the definition of such spatial units is likely to be criterion-based, there may also seem to be an arbitrary element in setting the precise terms of the criteria. In terms of outcome, we need to consider whether decisions made solely on the basis of an LA ranking are likely to be optimal. It is quite possible that such decisions might have differed substantially, had they been based instead on rankings for another sort of spatial unit.

The first of this paper’s two principal concerns is to develop the methodological approach proposed, and initially applied, in the work of Nolan et al. (2012). The other key element is to apply the approach to the data for a pair of specific domains from within the 2010 English Index of Multiple Deprivation (IMD2010) – rather than focusing on the overall index, as the earlier work did. This paper will provide initial empirical evidence on the extent to which local area ‘league table’ orderings exhibit similarities across two different – but related – policy domains. Commentary should then be possible about whether there is any substantial impact from defining spatial units in different ways – be they administrative, functional or size-based.

McLennan et al. (2011) provide a detailed review of IMD2010, which includes important discussion about technical issues associated with constructing a multiple-domain index across small geographic areas. Some of that discussion reflects a background literature which developed as successive versions of the IMD were produced (for 2000 and 2004), and their features and merits were debated – see, for example, Deas et al. (2003) and Noble et al. (2006). Changes in IMD construction through time, as well as evolving boundaries for some administrative local areas, lead to difficulties in examining how deprivation changes and where it persists – see, for example, Ajebon and Norman (2016). Indeed, a further English Index of Deprivation was developed to enable deprivation to be examined for the period 1999–2005, as analysed in Rae (2012). Meanwhile, there is also a literature on automated zoning procedures – modifying a zoning system to adapt (on the basis of specified criteria) to changes in underpinning area characteristics that occur inevitably with the passage of time. Martin (2003) and Cockings et al. (2011) are examples, and the latter considers threshold sizes for LSOA (Lower-layer Super Output Area) population as the distribution of residential location changes over the years. Geographical differences are examined by Burke and Jones (2018), who explore the development of an index (again at the LSOA level) to reflect the particular types of deprivation experienced in rural areas.

The next section of this article looks back briefly at underlying literature on the topics of spatial resolution, aggregation and the areal unit. It then moves on with an outline of our approach to spatial unit definition. This is followed by a section which starts by briefly outlining two domains from IMD2010 – employment being the first, and education, skills and training being the other. It then proceeds to portray extracts from the league tables that emerge from the application of our approach. After some appropriate discussion, there is a brief conclusion.

Spatial Resolution, Hypotheses and Spatial Unit Definition

As mentioned in Nolan et al. (2012), the Modifiable Areal Unit Problem (MAUP) was identified by Openshaw (1984), who noted that areal (spatial) units may be defined arbitrarily, and perhaps in accordance with the idiosyncratic views of an individual undertaking a geographic study. More specifically, Huby et al. (2009) found that an increase in the size of the base spatial unit will generate a reduction in inequality. The relevance of this finding to the current paper is substantially bolstered by the fact that these earlier authors were considering the aggregation of LSOAs – which are the same baseline spatial unit that we will be using. Looking at Montreal, Séguin et al. (2012) found greater concentration of poverty for the smallest areal unit type – the dissemination area. The work of Jones (2000), which provides a way to estimate the unemployment rate at a spatially disaggregated level beyond the scope of the data then available, is also potentially of particular interest in the context of the employment domain of IMD2010 we consider here. Commuting patterns offer a ready explanation of the divergence of his unemployment rates, for Cardiff, from their published counterparts. However, it should be noted that the IMD2010 employment domain is rather broader – taking in substantial numbers of individuals that would be defined as economically inactive, in addition to the (claimant) unemployed.

Given the MAUP, it seems reasonable to put forward an initial outline null hypothesis (H01) that changes in the resolution and construction criteria of the spatial units used (typically below LA area resolution) should make no fundamental difference to the composition of a league table for a deprivation index or measure of local economic performance. A supplementary null hypothesis (H02) might be that changes in spatial unit resolution and construction criteria should not systematically influence regression results, for a given underlying specification. A final null hypothesis (H03) could be that the impact of changes in spatial unit resolution and construction criteria is similar for two distinct deprivation index domains. Findings with regard to each of these hypotheses offer potentially valuable insights – for academics, local policy-makers, employers considering locational decisions, or individuals making choices about where to work and/or reside – in seeking to interpret league tables released in the media, and regression-based results from research articles.

Our approach to the definition of spatial units is based heavily on the concentric analysis laid out in Nolan et al. (2012). However, in this paper (in contrast to its predecessor), we utilise the definitions of LA areas in force after the April 2009 boundary changes (which cut the number of LA areas in England from 354 to 326). The first step of the process of specifying spatial units is to define the centre of each LA area. Each of England’s 32,482 LSOAs is defined as being centred at its population-weighted centroid (as available via the Office for National Statistics (ONS) Geoportal web pages). In a modification to the next step, the LA centre is given as the population-weighted meanFootnote 1 of all the LSOA centres within a given LA. Once the LA centre has been determined, the geometric distance can be calculated between it and any LSOA centre. For any given LA area, all of England’s LSOAs can be placed in ascending order of distance from that particular LA centre. The closest LSOAs to a particular LA centre will almost always be interior to that LA, while the furthest will be exterior to it. In between, the transition from interior LSOAs to exterior LSOAs may be smooth in some cases, but not others. New spatial units can be defined and generated through the systematic addition of LSOAs according to their ranking on distance between LSOA centre and the chosen LA centre. Later, we consider spatial units constructed with reference to cut-offs at integer numbers of kilometres of distance from the LA centre. Although this gives an underlying basis of a series of concentric circles, actual spatial units will be irregularly shaped. Moreover, as distance from LA centre is allowed to increase, the newly created spatial units will increasingly overlap.

Figure 1 below shows LSOAs for the case of Kingston upon Hull, for distances of 3 km (solid blue), 6 km (horizontal stripes), 12 km (solid green) and 24 km (diagonal stripes) from the LA centre. All of the LSOAs that are within 3 km of the LA centre are within the Hull LA area, but that is not true of some of the LSOAs within 6 km of the Hull LA centre, most of the LSOAs in the 6–12 km range and all of those in the 12–24 km range.

Fig. 1

Concentric banding (for 3 km, 6 km, 12 km and 24 km), for Kingston upon Hull
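The concentric construction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes that each LSOA is held as a tuple `(x_km, y_km, population)` in planar coordinates, that 'geometric distance' means straight-line (Euclidean) distance, and that an LSOA exactly on the cut-off is included. The function names `la_centre` and `concentric_unit` are hypothetical.

```python
import math

def la_centre(lsoas_in_la):
    """Population-weighted mean of the LSOA centres inside one LA.

    Each LSOA is a tuple (x_km, y_km, population); this layout is an
    assumption made for the sketch.
    """
    total = sum(pop for _, _, pop in lsoas_in_la)
    cx = sum(x * pop for x, _, pop in lsoas_in_la) / total
    cy = sum(y * pop for _, y, pop in lsoas_in_la) / total
    return cx, cy

def concentric_unit(centre, all_lsoas, radius_km):
    """All LSOAs (interior or exterior to the LA) centred within radius_km."""
    cx, cy = centre
    return [
        lsoa for lsoa in all_lsoas
        if math.hypot(lsoa[0] - cx, lsoa[1] - cy) <= radius_km
    ]

# Toy data only: two LSOAs inside a made-up LA and one outside it.
inside = [(0.0, 0.0, 1000), (3.0, 0.0, 2000)]
outside = [(10.0, 0.0, 1500)]
centre = la_centre(inside)                       # weighted towards the larger LSOA
unit_3km = concentric_unit(centre, inside + outside, 3.0)
```

Because every unit is built from the full national list of LSOAs, units for neighbouring LA centres will share LSOAs once the radius is large enough, which is the overlap noted in the text.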

This paper also investigates two further, related, approaches: they are novel and we name them Procedure 1 and Procedure 2. Both of them split each LA area into newly-defined spatial units of what we will term ‘similar population’. First, a benchmark (Y) is chosen for this ‘similar population’ (in thousands) – for example, Y = 50. If the LA area i has a population of Xi, then it is split into ki parts, where we define ki = max(1, Xi / Y) with the result rounded to the nearest integer. Once ki has been determined, the construction of sets of newly-defined spatial units can commence. As before, newly-created spatial units are formed from the aggregation of LSOAs, making use of the centre of each LSOA and the centre of each LA – in each case, using population weighted centroids. However, under Procedures 1 and 2, no given spatial unit is permitted to cross the boundary of a single LA area (so that a particular set of newly-created spatial units has no overlaps).

Under Procedure 1, LSOAs are added to a newly-defined spatial unit, in ascending order of distance of LSOA centre from the centre of the LA area, until the threshold population (Xi / ki) is first reached or breached. Once that first spatial unit has been established, a second spatial unit is constructed – again by proceeding through LSOAs in ascending order of distance from the LA centre, this time until the threshold (2Xi/ ki) is reached or breached. The process then continues, being undertaken ki times for that LA; and it is repeated for each and every LA area in England. The example shown in Fig. 2, for Kingston upon Hull (163 LSOAs, with population 261,098), splits the LA area (using Y = 50) into five parts (labelled A1-E1) on the basis of distance from the LA population weighted centroid.Footnote 2

Fig. 2

Procedure 1, with Y = 50, using Kingston upon Hull as an example
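As a hedged sketch of Procedure 1, the code below computes ki and then fills spatial units in ascending order of distance from the LA centre. Several details are assumptions made for illustration: LSOAs are tuples `(x_km, y_km, population)`, ties in rounding are rounded up, Y is passed in persons rather than thousands, and an empty final unit is dropped. The names `n_parts` and `procedure1` are hypothetical.

```python
import math

def n_parts(la_population, benchmark):
    """k_i = max(1, X_i / Y), rounded to the nearest integer (halves round up)."""
    return max(1, math.floor(la_population / benchmark + 0.5))

def procedure1(lsoas, centre, benchmark):
    """Split one LA into k_i units of similar population, ordered by distance."""
    cx, cy = centre
    total = sum(pop for _, _, pop in lsoas)
    k = n_parts(total, benchmark)
    ordered = sorted(lsoas, key=lambda l: math.hypot(l[0] - cx, l[1] - cy))
    units, current, cumulative = [], [], 0
    for lsoa in ordered:
        current.append(lsoa)
        cumulative += lsoa[2]
        # Close unit j once the running total reaches or breaches j * X_i / k_i.
        if len(units) < k - 1 and cumulative >= (len(units) + 1) * total / k:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units

# Kingston upon Hull figures from the text: 261,098 people and Y = 50 (thousand).
hull_parts = n_parts(261098, 50000)

# Toy LA: six LSOAs of 10 people each, strung out along a line from the centre.
toy = [(float(i), 0.0, 10) for i in range(1, 7)]
toy_units = procedure1(toy, (0.0, 0.0), 20)
```

With the Hull figures, the ratio is about 5.2, which rounds to the five parts shown in Fig. 2.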

Procedure 2 proceeds in a similar fashion to Procedure 1, with the one distinction being that (under Procedure 2) LSOAs are added in ascending order of the angle of the LSOA centre from the centre of the LA area. We define the angles in such a way that the 0° / 360° boundary is placed where an LSOA centre is directly due west from the centre of the LA area.Footnote 3 Another important point to note, for both Procedure 1 and Procedure 2, is that the same process can be replicated for any chosen benchmark ‘similar population’ (Y) – provided that Y is not smaller than the population size of at least the vast majority of the LSOAs. Of course, the number of newly-defined spatial units created will depend (negatively) upon the chosen value of Y. The example shown below in Fig. 3, for Kingston upon Hull, splits the LA area (using Y = 50) into five parts (labelled A2-E2) on the basis of angle of LSOA population weighted centroids from the LA population weighted centroid.Footnote 4 As Figs. 2 and 3 illustrate, Procedure 1 has the advantage of each part (e.g. the five areas marked C1 in Fig. 2) being roughly equidistant from the LA centre, but the disadvantage that they may well not be contiguous. Procedure 2’s angles-based approach much more readily achieves contiguity (see area C2 in Fig. 3) but at the cost of sacrificing equal raw distance. Thus, neither procedure obviously dominates the other. Many articles which analyse the IMD (or its components) focus on the grouping of socio-economically similar spatial units, regardless of whether they are contiguous – or, indeed, even spatially close to each other.Footnote 5 However, a key intended purpose of this research is to inform policy regarding, and perceptions of, local area economic performance – where the contiguity and proximity of spatial units (of population well above the LSOA norm) is more likely to be of practical relevance.

Fig. 3

Procedure 2, with Y = 50, using Kingston upon Hull as an example
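Procedure 2 can be sketched in the same way, replacing the distance ordering with an angular one. The text fixes the 0° / 360° boundary at due west but does not state the direction in which angles increase; the sketch below assumes anticlockwise (west, then south, east and north), and that assumption should be checked against the original footnote. LSOAs are again assumed to be tuples `(x_km, y_km, population)`, and `angle_from_west` and `procedure2` are hypothetical names.

```python
import math

def angle_from_west(centre, point):
    """Angle in degrees in [0, 360), with 0 for a point due west of the centre.

    Angles are assumed to increase anticlockwise; the source does not say.
    """
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (math.degrees(math.atan2(dy, dx)) - 180.0) % 360.0

def procedure2(lsoas, centre, k):
    """Split one LA into k units of similar population, ordered by angle."""
    total = sum(pop for _, _, pop in lsoas)
    ordered = sorted(lsoas, key=lambda l: angle_from_west(centre, l[:2]))
    units, current, cumulative = [], [], 0
    for lsoa in ordered:
        current.append(lsoa)
        cumulative += lsoa[2]
        # Same population thresholds as Procedure 1; only the ordering differs.
        if len(units) < k - 1 and cumulative >= (len(units) + 1) * total / k:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units

# Toy LA with one LSOA at each compass point, 10 people each.
west, south, east, north = (-1.0, 0.0, 10), (0.0, -1.0, 10), (1.0, 0.0, 10), (0.0, 1.0, 10)
sectors = procedure2([north, east, south, west], (0.0, 0.0), 2)
```

Each unit here is a wedge of the LA, which is why contiguity is easier to achieve than under the ring-shaped units of Procedure 1.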

English IMD2010 Employment and Education Domains

The details of the Index of Multiple Deprivation 2010 (IMD2010) for England are laid out in DCLG (2011). As stated there, the main focus of analysis of the IMD2010 – to avoid the risk of loss of information – should be on England’s 32,482 LSOAs. These were designed to have similar populations of around 1500, and they nest within LA areas. The Appendix reveals an average LSOA population (in 2008) of 1584, while 95% of LSOAs range in population from 1165 to 2281. Splitting England’s land area of around 13 million hectares, the LSOA average area (Table 16 in the Appendix) is 401 ha, but 85% of LSOAs are smaller than that (and half of those are less than 40 ha) due to substantial positive skewness. Although the overall IMD2010 is a weighted amalgamation of seven policy domains, our focus in this paper concerns just two of those domains.Footnote 6 It is perhaps worth noting that the first, employment deprivation, is given a 22.5% weighting in the overall index (equal top billing, along with income deprivation); whereas the other (education, skills and training deprivation) is offered a more limited weighting of 13.5%. For each of these two policy domains, the domain score is itself the result of the aggregation of several elements. For example, the employment domain score includes not only working-age Job Seekers Allowance (JSA) claimants (a fairly narrow definition of unemployment); but also Incapacity Benefit (IB) claimants and Severe Disablement Allowance (SDA) claimants of working age (and both of these groupings include substantial components of economic inactivity, even if they are partly argued to include elements of ‘hidden’ unemployment). There are also four other components in the employment score – all based on working-age population (18–64 for men and 18–59 for women in that era) – details of these being shown on pages 23–24 of DCLG (2011). 
In other words, the employment domain score is defined – in principle – as the total number of employment-deprived persons divided by the total working-age population.Footnote 7

The education, training and skills domain is split into a pair of sub-domains: one is focused on children and young people, while the other relates to adult skills. The first sub-domain includes six elements: the first three capture attainment at each of the Key Stages 2–4 (for pupils aged 11, 14 and 16, and importantly based on LSOA of pupil residence, rather than educational establishment), while the others relate to absenteeism from school, entry to Further Education and entry to Higher Education. The weightings of the elements were determined by factor analysis, as indicated on page 37 of DCLG (2011). The second sub-domain looks at the proportion of low-skilled prime age (25–54) adults in Census 2001 (see ONS (2001)). In stark contrast to the case of the employment domain, even some of the different elements in the first sub-domain are fractional rates that use different denominators (for example, populations for different particular age ranges). This makes a straightforward detailed interpretation of the magnitude of the education domain score problematicFootnote 8 (within the IMD2010 output, it is reported on a 0–100 scale, and we also follow that approach).

Employment Deprivation Domain

Initially (in Table 1), we list the 30 most deprived LA areas in England for the employment domain, based on the weighted average of the LSOA employment scores (proportions) – the weightings being based on LSOA working-age population. Of these LA areas, 17 would be in the corresponding league table for the overall deprivation IMD2010 index scores.

Table 1 Thirty most employment-deprived LA areas (with rank, and a weighted score (as a %))

By contrast, IMD2010 employment scale values are reported for LA areas in DCLG (2011); they are higher (ceteris paribus) for areas of greater population, because employment scale measures the raw number of employment-deprived individuals (just over three million across England, out of a working age population of a little over thirty million). Table 1 is an incomplete league table, but it also includes – for comparison – the LA areas at the approximate quartiles (82 and 245) and median (164) of the employment deprivation score distribution, and the least employment-deprived LA (326).
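The distinction between the score and the scale can be illustrated with a short sketch. It assumes that each LSOA is held as a pair `(score, working_age_population)` with the score expressed as a proportion, and it uses the 'in principle' definition of the score (deprived persons divided by working-age population) to back out an implied count. The function names and the toy numbers are hypothetical.

```python
def la_employment_score(lsoas):
    """Working-age-population-weighted average of LSOA employment scores."""
    total_pop = sum(pop for _, pop in lsoas)
    return sum(score * pop for score, pop in lsoas) / total_pop

def la_employment_scale(lsoas):
    """Implied number of employment-deprived people (score times population)."""
    return sum(score * pop for score, pop in lsoas)

# Toy LA with two LSOAs: 10% of 1,000 and 20% of 3,000 working-age residents.
toy_la = [(0.10, 1000), (0.20, 3000)]
score = la_employment_score(toy_la)   # a rate, comparable across LAs of any size
scale = la_employment_scale(toy_la)   # a head count, larger for bigger LAs
```

Doubling every LSOA population in the toy LA would leave the score unchanged and double the scale, which is the sense in which scale values favour (ceteris paribus) areas of greater population.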

Beyond a raw league table, we need to explore the extent to which some key explanatory variables can account for variation in LA area employment scores. Table 2 below shows results for a basic linear regression of the natural logarithm of LA employment scoreFootnote 9 on the natural logarithm of LA land area (in hectares), the natural logarithm of 2008 mid-year LA population and a set of eight region dummies (London is the base region). A few basic summary statistics for the LA areas can be seen at the start of Table 16 in the Appendix.

Table 2 Natural logarithm of LA employment score, regression results

Since the linear regression is a double-log specification for the land area and population regressors, it should be noted that the first two estimated slope coefficients can be interpreted as elasticities. Of England’s 326 LA areas, 33 are in London, and the sample mean of the dependent variable for those cases is 2.177. This places London less favourably (i.e., more employment deprived) than the East Midlands, East (Anglia), the South East and the South West – whereas only the South East has a negative regression estimate, and that is by a narrow margin. Of course, the regression results control for land area and population – and London LAs average six times less land area than those in any other region and higher population than any other region except West Midlands (lower land area and higher population are both associated with higher employment deprivation).

The above regression could be set up equivalently (using Y1 for LA employment score and X1 for LA land area) with the natural log of LA population density (ln(X3)) in place of the log of LA population (ln(X2)). Then we can write ln(X3) = ln(X2/X1) = ln(X2) – ln(X1), which is equivalent to ln(X2) = ln(X1) + ln(X3), and to ln(X1) = ln(X2) – ln(X3). With predicted ln(Y1) denoted as \( {\widehat{Z}}_1, \) there are three alternative valid expressions that can be written for the London region:

  A. Using Table 2, \( {\widehat{Z}}_1=0.0350-0.1119\ln \left({X}_1\right)+0.2496\ln \left({X}_2\right) \).

  B. \( {\widehat{Z}}_1=0.0350+0.1377\ln \left({X}_1\right)+0.2496\ln \left({X}_3\right) \).

  C. \( {\widehat{Z}}_1=0.0350+0.1377\ln \left({X}_2\right)+0.1119\ln \left({X}_3\right) \).

All of the estimates attached to ln(X1), ln(X2) and ln(X3) in these three equivalent expressions are statistically significant at the 1% significance level. The impact of LA land area on employment deprivation score is negative (−0.1119) having controlled for population as well as region, but positive (+0.1377) after controlling instead for population density.
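The equivalence of expressions A, B and C follows from substituting ln(X3) = ln(X2) – ln(X1), and it can be checked numerically. The sketch below uses the London-region coefficients quoted above; the land area and population values fed in are arbitrary illustrative numbers, not data from the paper, and the function names are hypothetical.

```python
import math

def predict_a(area, population):
    """Expression A: land area and population as regressors."""
    return 0.0350 - 0.1119 * math.log(area) + 0.2496 * math.log(population)

def predict_b(area, population):
    """Expression B: land area and population density as regressors."""
    density = population / area
    return 0.0350 + 0.1377 * math.log(area) + 0.2496 * math.log(density)

def predict_c(area, population):
    """Expression C: population and population density as regressors."""
    density = population / area
    return 0.0350 + 0.1377 * math.log(population) + 0.1119 * math.log(density)

# Illustrative inputs only (hectares, persons).
area, population = 7000.0, 260000.0
z_a = predict_a(area, population)
z_b = predict_b(area, population)
z_c = predict_c(area, population)
```

The three predictions agree to floating-point precision because 0.1377 = 0.2496 – 0.1119: the coefficient on land area changes sign between A and B only because the variable held fixed alongside it changes from population to density.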

More urban areas typically have lower land area and higher population. They therefore tend to have higher population density. For the LA data, the respective bivariate correlations of ln(X1), ln(X2) and ln(X3) with ln(Y1) are −0.224, 0.403 and 0.374: this confirms that LA employment deprivation scores tend to be higher in more urban areas. In turn, that raises the issue of whether LA-based league tables of employment deprivation score – ranging across both urban and rural LA areas – can be expected to naturally favour less densely populated LA areas. Going beyond that, there is the issue of the extent to which a particular LA’s employment deprivation score can be considered to be a natural consequence of the relevant set of its underlying attributes, rather than underperformance of the local economy. Benchmarking actual scores, using predicted values from an appropriate regression, might be an interesting approach. Although Table 2 uses a very basic regression which is too straightforward to be likely to be genuinely appropriate, it is interesting to consider the ‘benchmarked’ league table it generates, based on land area, population and region (see Table 17 in the Appendix). There, only 12 of the LA areas seen in Table 1 now occupy top 30 positions (for having most employment deprivation over and above their ‘benchmark level’).
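The benchmarking idea can be sketched as a ranking of regression residuals. This is an assumption about how such a 'benchmarked' league table might be formed (actual log score minus predicted log score, largest first); the paper's Table 17 may be constructed differently. The LA labels and numbers below are invented for illustration, and `benchmarked_ranking` is a hypothetical name.

```python
import math

def benchmarked_ranking(actual_scores, predicted_log_scores):
    """Order LA names by ln(actual score) minus predicted ln(score), largest first."""
    excess = {
        name: math.log(actual_scores[name]) - predicted_log_scores[name]
        for name in actual_scores
    }
    return sorted(excess, key=excess.get, reverse=True)

# Invented example: LA "A" has the highest raw score, but also the highest benchmark.
actual = {"A": 20.0, "B": 10.0, "C": 15.0}
predicted = {"A": math.log(25.0), "B": math.log(8.0), "C": math.log(15.0)}
raw_order = sorted(actual, key=actual.get, reverse=True)
benchmarked_order = benchmarked_ranking(actual, predicted)
```

In this toy case the raw ordering is A, C, B, whereas the benchmarked ordering is B, C, A, which mirrors the reshuffling of league positions described in the text.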

Some further basic spatial analysis of LA area employment deprivation scores can be undertaken by using Moran’s I, Geary’s c and the Getis and Ord statistic. Moran’s I was introduced by Moran (1950) to measure global spatial autocorrelation, and – for the case of 326 LA areas – it can be defined as followsFootnote 10:

$$ I=\frac{326}{\sum_i^{326}{\sum}_j^{326}{w}_{ij}}\frac{\sum_i^{326}{\sum}_j^{326}{w}_{ij}\left({Y}_{1i}-\overline{Y_1}\right)\left({Y}_{1j}-\overline{Y_1}\right)}{\sum_i^{326}{\left({Y}_{1i}-\overline{Y_1}\right)}^2}, $$
(1)

where the weights wij can be given by the reciprocal of the distance (in kilometres) between the centre of LAi and the centre of LAj, and those weights may then be ‘row standardised’ to sum to unity; or they can be defined in a binary format to be one for ‘near neighbour LA areas’ with a centre within a certain threshold distance and zero for other LA areas, beyond that threshold distance. The value of I itself lies between minus one and plus one. For LA area employment deprivation, Moran’s I takes the value 0.331 – that is, moderate positive spatial autocorrelation – with the weighting matrix standardised. With a binary weighting matrix, the value of Moran’s I depends on the cut-off point for 'near neighbours', but it is at least 0.245. The positive sign of Moran’s I (and statistical significance in both cases above its expected value (−1/(326–1))) indicates that, unsurprisingly, LA areas of similar employment deprivation tend to be found in spatial proximity to one another.
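Expression (1) translates directly into code. The following is a minimal sketch for a generic number of areas n (326 in the text), assuming the weights are supplied as a square list of lists with zeros on the diagonal; `morans_i` and `row_standardise` are hypothetical names, and the four-area example is a toy, not the paper's data.

```python
def row_standardise(w):
    """Scale each row of a weights matrix to sum to one (all-zero rows are kept)."""
    result = []
    for row in w:
        row_sum = sum(row)
        result.append([x / row_sum if row_sum else 0.0 for x in row])
    return result

def morans_i(values, w):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(w[i][j] for i in range(n) for j in range(n))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / weight_sum) * cross / sum(d * d for d in dev)

# Four areas in a row; each is a 'near neighbour' of the areas next to it.
binary_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
values = [1.0, 2.0, 3.0, 4.0]
i_binary = morans_i(values, binary_w)
i_standardised = morans_i(values, row_standardise(binary_w))
```

The two weighting schemes give different values for the same data (one third and 0.4 here), consistent with the observation that the reported statistic depends on how the weighting matrix is specified.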

On the other hand, Geary’s c (see Geary (1954)) has more sensitivity to local spatial autocorrelation. For the case of 326 LA areas, it is given by the following expression:

$$ c=\frac{325}{2{\sum}_i^{326}{\sum}_j^{326}{w}_{ij}}\frac{\sum_i^{326}{\sum}_j^{326}{w}_{ij}{\left({Y}_{1i}-{Y}_{1j}\right)}^2}{\sum_i^{326}{\left({Y}_{1i}-\overline{Y_1}\right)}^2}. $$
(2)

This statistic lies between zero and two, with a value of one indicating no spatial autocorrelation. For LA area employment deprivation, Geary’s c is 0.672. Since it is less than unity (and statistically significantly so), positive spatial autocorrelation is again indicated.
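A corresponding sketch of expression (2) is given below, again for a generic n and with the weights assumed to be a square list of lists. The function name `gearys_c` and the four-area example are illustrative only.

```python
def gearys_c(values, w):
    """Geary's c for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    weight_sum = sum(w[i][j] for i in range(n) for j in range(n))
    squared_diffs = sum(
        w[i][j] * (values[i] - values[j]) ** 2
        for i in range(n) for j in range(n)
    )
    total_variation = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * weight_sum)) * squared_diffs / total_variation

# Four areas in a row with a steadily rising value: neighbours are alike.
binary_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
c_smooth = gearys_c([1.0, 2.0, 3.0, 4.0], binary_w)
c_alternating = gearys_c([1.0, 4.0, 1.0, 4.0], binary_w)
```

The smooth gradient yields a value below one (positive spatial autocorrelation), while the alternating pattern yields a value above one.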

The Getis and Ord statistic (see Getis and Ord (1992, 1995)) also investigates local spatial autocorrelation. For the case of 326 LA areas, it is given by the following expression:

$$ G(d)=\frac{\sum_i^{326}{\sum}_j^{326}{w}_{ij}(d){Y}_{1i}{Y}_{1j}}{\sum_i^{326}{\sum}_j^{326}{Y}_{1i}{Y}_{1j}}, $$
(3)

where i ≠ j and the weights are in binary format. The value of G(d) depends on the threshold distance, d, for 'near neighbours'. It must be compared against its mathematical expectation, which (for 326 LA areas) is given by:

$$ E\left[G(d)\right]=\frac{\sum_i^{326}{\sum}_j^{326}{w}_{ij}(d)}{325\times 326}. $$

When we choose a large enough value of d (120 km) to ensure that even the Isles of Scilly LA has at least one 'near neighbour', the expected value of the Getis and Ord statistic is 0.305; the value for our data, G(120) = 0.280, lies significantly below it. This indicates ‘cold spot’ clustering of areas of low employment deprivation. However, when a smaller value of d (such as 40 km, still giving each LA area an average of nearly 20 'near neighbours') is used, E[G(40)] = 0.0605 and G(40) = 0.058: this offers no evidence of LA employment deprivation clustering.
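Expression (3) and its expectation can be sketched as follows, for a generic n. The sketch assumes planar coordinates in kilometres, straight-line distances, and that an area exactly at the threshold distance counts as a 'near neighbour'; the function names and the four-point example are hypothetical.

```python
import math

def binary_weights(coords, d):
    """w_ij(d) = 1 if areas i and j (i != j) are centred within d of each other."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= d else 0
         for j in range(n)]
        for i in range(n)
    ]

def getis_ord_g(values, w):
    """General G statistic: weighted cross-products over all cross-products, i != j."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    numerator = sum(w[i][j] * values[i] * values[j] for i, j in pairs)
    denominator = sum(values[i] * values[j] for i, j in pairs)
    return numerator / denominator

def expected_g(w):
    """E[G(d)]: the share of ordered pairs (i != j) that are near neighbours."""
    n = len(w)
    return sum(w[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

# Four area centres spaced 1 km apart along a line, threshold d = 1 km.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
w = binary_weights(coords, 1.0)
g = getis_ord_g([1.0, 2.0, 3.0, 4.0], w)
e_g = expected_g(w)
```

A value of G(d) above its expectation points towards clustering of high values and a value below it towards clustering of low values; judging significance additionally requires the variance, which is not reproduced here.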

LSOA-based employment deprivation scores inevitably exhibit much more variation than their LA counterparts (since England has nearly 100 times as many of these smaller spatial units, compared to the number of LA areas) – see Table 16 in the Appendix. Since LSOA scores are typically based on much smaller populations, they can also be expected to be less reliable on average than those for LA areas. However, we still undertake an LSOA-based log employment score regression corresponding to our earlier LA-based one – although without showing the coefficients on the region dummies in Table 3 (below). In addition, we test for the joint significance of a full set of (325) LA controls, in place of the eight region dummies. While LA area populations range from just over two thousand to a little beyond one million, LSOA populations are rather more similar to each other (from about five hundred and fifty to roughly twenty times that figure – see Table 16 in the AppendixFootnote 11). The most prominent difference between Table 3 and Table 2 relates to the estimated coefficient attached to log population: for the LSOA-based regression, this elasticity has a negative sign – opposite to the sign from the LA case. Such a disparity indicates that the way in which LSOAs are aggregated into LA areas is, in itself, important: more highly populated LSOAs are predicted to exhibit a lower rate of employment deprivation. This is initial evidence against our null hypothesis H02, that spatial resolution should not systematically influence regression results.

Table 3 Natural logarithm of LSOA employment score, two regression specifications

If the log of LSOA population density is inserted in place of the log of LSOA population, it has the same negative estimated elasticity attached as is reported in Table 3 for log population. With LA controls, the estimate attached to the ‘log of LSOA land area’ regressor is −0.2497 (= −0.1328 – 0.1168). The impact of LSOA land area on employment deprivation score is negative having controlled for population (−0.1168), and even more negative (−0.2497) after controlling instead for population density. The lower R² value for the LSOA regression may bear out the point above about LSOA scores having lower reliability.Footnote 12

Calculation of Moran’s I, Geary’s c and the Getis and Ord statistic for the LSOA data is less straightforward in practical terms. This is because the ‘spatgsa’ procedure for the Stata software package does not work when the number of observations is so large. Fortunately, the Mata matrix language offers an alternative approach (whilst still using the Stata software). Expressions [1] and [2] are readily implemented, but confirmation of statistical significance requires the accompanying variances – provided, for example, in the mathematical appendix of Sokal et al. (1998). The appropriate value of Moran’s I is 0.189, and that of Geary’s c is 0.829. Each is comprehensively statistically significant, and indicative of positive spatial autocorrelation. Meanwhile, expression [3] yields a value for the Getis and Ord statistic (G(90)) of 0.206. Using the variance from Getis and Ord (1992), that can be seen as statistically significant for ‘cold spot’ clustering of low employment deprivation.

Table 4 shows the 20 most deprived LA areas on the basis of the LSOAs centred within (respectively) 3 km, 6 km, 12 km and 24 km of their LA centre (as illustrated in Fig. 1 for the case of Kingston upon Hull).

It is worth remarking even on some rather obvious points. Firstly, the considerable majority of these employment-deprived LA centres can be found in northern England – with almost all the exceptions to that rule (Birmingham, Sandwell, Staffordshire Moorlands, Stoke-on-Trent, Walsall and Wolverhampton) being situated in the West Midlands. That just leaves Bolsover (East Midlands) and Great Yarmouth (East of England). Secondly, there are some LA centres for which employment deprivation persists across all these chosen radii: notably Hartlepool, Wirral and Knowsley; and, to a somewhat lesser extent, Liverpool and St. Helens. Unsurprisingly, all of the corresponding LA areas are in the top 20 most deprived, well within the bounds of Table 1. Thirdly, there is some evidence against our null hypothesis H01. Two LA centres appear only once in this table (of four rankings) – namely Bolsover and Durham County (newly created in the April 2009 boundary changes).Footnote 13 However, both of the corresponding LA areas appear in Table 1 – unlike several of the other names from Table 4. Two of the most notable cases are Wyre and Staffordshire Moorlands, since their LA centres are well within the top 20 for a 12 km radius, but their underlying LA areas lie outside the top 100 for employment deprivation. The other case worth highlighting is Bury, whose LA centre is named for the 12 km radius and the 24 km radius in Table 4, while having an underlying LA area which is only the 72nd most employment deprived. Finally, we can focus on the LA areas within Table 1 which are not to be found within Table 4: Hastings LA moves from being within the top 15 to being outside the top 25 based on radius from centre. Similarly, Salford, Kingston upon Hull, Manchester and Mansfield are each within the top 25 most employment-deprived as LA areas, but would never be higher than 30th (and usually outside the top 40) if Table 4 were to be extended.

Table 4 20 most employment-deprived LA centres at 3 km, 6 km, 12 km and 24 km radius

To give a somewhat clearer impression of the relationship between employment deprivation rankings based on LA areas, and each of the sets of rankings based on our concentric banding methodology (with cut-offs at integer numbers of kilometres from the LA centre), we can calculate Spearman’s rank correlation coefficient (rs). Values of rs are reported in Table 5 below, for each integer concentric banding radius (in km):

Table 5 Values of Spearman’s rank correlation coefficient (in each case, the LA area ranking is compared to a ranking for LA centres based on concentric banding at a particular radius)

Firstly, it should be noted that, for the 1–4 km radii, some LA centres have no LSOA centre within that distance – and thus no ranking to compare against the LA area ranking. For a given radius, we exclude such LA areas – and this may help to explain the initial increase in rs (as radius increases) – although it is also likely to be linked to the fact that more than 80% of actual LA areas in England are larger in size than a circle of radius 4 km. However, beyond a 6 km radius, rs begins to fall back – although it is still as high as 0.88 at a radius consistent with mean LA land area, and 0.778 for a radius of 24 km (the largest in Table 4). Thus, evidence against null hypothesis H01 here is rather modest.
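As an illustrative sketch only (not the authors' code), one entry of Table 5 could be computed along the following lines. The function names and the use of `None` to flag an LA centre with no LSOA centre inside the radius are assumptions made for this example; tied scores receive their average rank, as is conventional.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(la_scores, band_scores):
    """Spearman's r_s between LA-area scores and radius-based scores.

    LA areas whose radius-based score is None (no LSOA centre within the
    radius) are excluded before ranking, as described in the text.
    """
    pairs = [(a, b) for a, b in zip(la_scores, band_scores) if b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two LA areas with a radius-based score")
    ra = average_ranks([a for a, _ in pairs])
    rb = average_ranks([b for _, b in pairs])
    n = len(pairs)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5
```

Because r_s is the Pearson correlation of the two sets of ranks, it is unaffected by whether rank 1 denotes the most or the least deprived area, provided the same convention is used for both rankings.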

The plot below (Fig. 4) illustrates employment deprivation for some LA centres highlighted above:

Fig. 4 IMD2010 employment deprivation domain score for seven selected LA centres

The pronounced peak for Bolsover at a 5 km radius indicates why it appears only once in Table 4. A rather inexorable rise for Durham County leads to its entry into Table 4 at a 24 km radius.

Now using Procedure 1 instead, to split each LA area into subsets of its constituent LSOAs, we can obtain the two sets of results shown in Table 6, below (see Footnote 14) – the upper half of the rows involve specifications which use region controls (eight dummies), whilst the lower half of the rows utilise LA controls (325 dummies). The key difference between the contents of Table 2 and Table 3 was with respect to the estimates attached to the population regressor. However, whilst the estimates attached to the population regressor in Table 6’s upper rows (region controls) share the positive sign and statistical significance seen initially in Table 2 (for the regression based on LA area data), those in Table 6’s lower rows (LA controls) are statistically insignificant at the 5% level until Y drops below 20. This offers some evidence against null hypothesis H02 as resolution is altered, for construction criterion Procedure 1. It is also worth considering our measures of spatial autocorrelation for these values of Y, and their statistical significance. The information is displayed in the first half of Table 19 in the Appendix, and shows analogous results to the earlier LA and LSOA cases.

Table 6 Regression estimates across Y values, dependent variable ln(LSOA employment score)

Using Procedure 2, we can obtain the two sets of results shown in Table 7, below (with the same region controls in the upper half, and LA controls for the lower half). The pattern of the estimates attached to the population regressor, across different values of Y, shows less fluctuation than under Procedure 1. With region controls, the estimate shrinks substantially only when Y is reduced to 7.5, but the picture is less clear when LA controls are included. However, the regression results differ for Procedures 1 and 2, and this offers some further evidence against null H02.

Table 7 Regression estimates across Y values, dependent variable ln(LSOA employment score)

Procedure 1 and Procedure 2 group together LSOAs according to quite different rules (which have quite different grouping outcomes). Thus, it is interesting that the respective first halves of Tables 6 and 7 are characterised chiefly by similarities. The similarity extends to the measures of spatial autocorrelation shown in the second half of Table 19 in the Appendix.

However, although regression results do not appear to be markedly different across the total of 35 cases captured across Tables 2, 3, 6 and 7, the character of league tables may be less similar. Recall that we have effectively created 16 separate new employment deprivation league tables: this does not include Table 1 (which shows the top 10% or so LA areas for employment deprivation) or the corresponding league table for LSOA employment deprivation. Beyond this, Table 4 offers four ‘top-20 most deprived’ league tables – one for each of four chosen radii, for potentially overlapping areas, each of which is centred at the population-weighted centroid of the base LA area. There is a separate (basically unreported) league table underlying each column of Table 6 and each column of Table 7. As an illustration, in the case where Y is 60, one part of the Oldham LA area appears within the top 1% of the 858 observations for employment deprivation score under Procedure 1, whilst one part of each of the Walsall and Bolton LA areas appears around the top 2%. For Procedure 2, those LA names are not evident, but Sefton (top 1%) and Bradford (top 2%) are. None of these LA areas appear in Table 1 (showing the top 10% or so LA areas for employment deprivation), and Bolton and Bradford do not appear in Table 4 either. This is further exemplar evidence against null H01, and we might conclude that those considering the employment deprivation (or otherwise) of Bolton and Bradford would be well advised to go beyond an examination of a basic LA area league table and beyond straightforward league tables based on concentric distance from LA centre. Instead, league tables from our two new procedures should be considered.

Education, Skills and Training Deprivation Domain

We turn now to a parallel analysis for the education domain. Initially (in Table 8), we list the 30 most deprived LA areas based on weighted average of the LSOA education scores – the weightings in this case being based on LSOA overall population. However, this choice is not clear-cut – especially given the fact that the two component education sub-domains are based on two different groups (respectively, those who have not reached adulthood, and those who have). Table 8 also names the LA areas that occupy the upper and lower quartiles of the education deprivation distribution, as well as the median, and the least education-deprived LA area.

Table 8 Thirty most education-deprived LA areas (with rank, and a weighted score)

Only 14 LA areas are common to the top 30 in Table 1 and Table 8. Although strong in principle, the link between LA area employment deprivation and education deprivation is limited in reality. This fact may be relevant background when our null hypothesis H03 is being considered later.

Results shown below in Table 9 are for a basic linear regression of the natural logarithm of LA education score on the natural logarithm of LA land area (in hectares), the natural logarithm of 2008 mid-year LA population and a set of eight region dummies. In this case, a simple ordering of the regional sample mean LA log education scores places London most favourably (i.e., with the least education deprivation, at 2.532) – in keeping with all other regions having positive regression estimates. Using Y2 as LA education score, and with predicted ln(Y2) denoted as \( {\widehat{Z}}_2 \) (with X1, X2 and X3 again defined (respectively) as LA land area, population and population density), there are three equivalents for the London region:

A. From Table 9, \( {\widehat{Z}}_2=0.2249-0.1308\ln \left({X}_1\right)+0.2757\ln \left({X}_2\right) \).

B. Equivalently, \( {\widehat{Z}}_2=0.2249+0.1449\ln \left({X}_1\right)+0.2757\ln \left({X}_3\right). \)

C. Alternatively, \( {\widehat{Z}}_2=0.2249+0.1449\ln \left({X}_2\right)+0.1308\ln \left({X}_3\right) \).
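The equivalence of A, B and C can be checked directly. Assuming (as is standard) that population density is population divided by land area, so that \( {X}_3={X}_2/{X}_1 \) and \( \ln \left({X}_3\right)=\ln \left({X}_2\right)-\ln \left({X}_1\right) \), substituting for \( \ln \left({X}_2\right) \) in A gives B, and substituting for \( \ln \left({X}_1\right) \) in A gives C:

```latex
\begin{aligned}
\widehat{Z}_2 &= 0.2249 - 0.1308\ln(X_1) + 0.2757\left[\ln(X_1) + \ln(X_3)\right] \\
              &= 0.2249 + (0.2757 - 0.1308)\ln(X_1) + 0.2757\ln(X_3) \\
              &= 0.2249 + 0.1449\ln(X_1) + 0.2757\ln(X_3), \\[6pt]
\widehat{Z}_2 &= 0.2249 - 0.1308\left[\ln(X_2) - \ln(X_3)\right] + 0.2757\ln(X_2) \\
              &= 0.2249 + (0.2757 - 0.1308)\ln(X_2) + 0.1308\ln(X_3) \\
              &= 0.2249 + 0.1449\ln(X_2) + 0.1308\ln(X_3).
\end{aligned}
```

The three forms therefore describe the same fitted relationship; only the variable being held constant changes.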

Table 9 Natural logarithm of LA education score, regression results

The estimates attached to ln(X1), ln(X2) and ln(X3) in the above three expressions are all statistically significant at the 1% significance level. The impact of LA land area on education deprivation score is negative (−0.1308) when population is controlled for, but positive (+0.1449) after controlling instead for population density. We can conclude that LA education deprivation scores also tend to be higher in more urban areas (which usually have lower land area, higher population and higher population density). For the LA data, the respective bivariate correlations of ln(X1), ln(X2) and ln(X3) with the log of education deprivation score are −0.082, 0.257 and 0.184: this suggests less education deprivation in the less urban LA areas. If so, LA-based league tables on education deprivation score might be expected to favour less densely populated LA areas, just like their counterparts for employment deprivation.

Using, as ‘benchmarks’, predicted values (for education deprivation score) generated from the basic regression that produced Table 9, a ‘benchmarked’ league table can be constructed (see Table 18 in the Appendix). This time, 16 of the LA areas seen in Table 8 occupy top 30 positions (for having most education deprivation over and above their ‘benchmark level’).
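A minimal sketch of how such a 'benchmarked' league table might be assembled is given below. It assumes that the ranking is by the simple difference between each LA area's actual score and its regression-based benchmark, with both measured on the same scale; the text does not state whether a difference or a ratio was used, and the function and argument names are purely illustrative.

```python
def benchmarked_league_table(names, actual, predicted, top=30):
    """Rank LA areas by how far their actual score exceeds the benchmark.

    Assumption: 'excess' is actual minus predicted, with both series on the
    same scale (both in levels, or both in logs).
    """
    excess = [(a - p, name) for name, a, p in zip(names, actual, predicted)]
    # Largest positive excess first: most deprivation beyond the benchmark.
    excess.sort(key=lambda item: item[0], reverse=True)
    return [(rank, name, gap)
            for rank, (gap, name) in enumerate(excess[:top], start=1)]
```

For example, with hypothetical areas A, B and C having actual scores 20, 30 and 25 against benchmarks of 15, 32 and 24, the table is headed by A (excess 5), followed by C (excess 1) and B (excess of minus 2), even though B has the highest raw score.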

Basic spatial analysis of LA area education deprivation scores, analogous to that undertaken previously for the employment deprivation case, again indicates evidence of positive spatial autocorrelation. Moran’s I takes the values 0.236 (standardised weighting matrix) or 0.193 (binary), and the combination of a positive sign and strong statistical significance indicates positive global spatial autocorrelation. Geary’s c is 0.776, and being statistically significantly below unity, strongly indicates positive local spatial autocorrelation. For the Getis and Ord statistic, G(120) = 0.284 and G(40) = 0.051: given that these values lie significantly below their respective mathematical expectations (0.305 and 0.0605), the indication is that there is ‘cold spot’ clustering of areas of low education deprivation. In this instance, the result does not appear to depend critically on the threshold distance for 'near neighbours'.
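For readers less familiar with these statistics, the following sketch implements the textbook definitions of Moran's I and Geary's c for a given spatial weights matrix. It is not the authors' code: the weights matrix, the function names and the toy data are assumptions for illustration, and no significance testing is included.

```python
def row_standardise(w):
    """Scale each row of a weights matrix to sum to one (rows of zeros stay zero)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def morans_i(x, w):
    """Moran's I: values above its expectation, -1/(n-1), suggest positive autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def gearys_c(x, w):
    """Geary's c: values below one suggest positive autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return ((n - 1) / (2 * s0)) * num / den


# Toy example: four areas along a line, each adjacent only to its neighbours.
binary_w = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
scores = [1.0, 2.0, 3.0, 4.0]
```

With these toy scores rising steadily along the line, Moran's I is positive and Geary's c is below one under either the binary or the row-standardised weights, which is the same qualitative pattern as reported above.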

An LSOA-based regression for log education deprivation scores corresponds to the LA-based regression with results in Table 9 above. In parallel to Table 3, Table 10’s LSOA-based results are shown initially for eight region controls, and then for a set of 325 LA controls.

Table 10 Natural logarithm of LSOA education score, regression results

Comparison of LA and LSOA results (Tables 9 and 10) shows no change of sign for the estimate on the log of population (although it loses its significance for Table 10’s region controls specification), so there is no evidence here against null H02. A re-specification of the regression with log of LSOA population density in place of the log of LSOA population would yield a negative estimate attached to the separate ‘log of LSOA land area’ regressor for either set of controls. For LA controls, −0.0777 = −0.2144 – (−0.1367). The impact of LSOA land area on education deprivation score is negative: it remains so after controlling for population density, although such control shrinks the estimate, rather than expanding it – as was the case for employment deprivation. Thus, we have some initial evidence against null hypothesis H03. In this instance, Moran’s I is 0.138, and Geary’s c is 0.876. Once again, each is statistically significant, and demonstrates positive spatial autocorrelation. Meanwhile, the Getis and Ord statistic (G(90)) is 0.202, so education deprivation also displays statistically significant ‘cold spot’ clustering.

Taken together, Tables 1, 4 and 8 plus Table 11 (below) demonstrate that there is a rather different spatial mix of education deprivation, compared with employment deprivation. From the south, the London Borough of Barking and Dagenham appears in Table 8, whilst Basildon – which is less than 35 km from the edge of Greater London – can be seen in the first column of Table 11, along with Sedgemoor and Swale (contrast just Hastings in Table 1). There is also a change in the distribution of LA centres on the northern side of the north-south divide – with significantly more representation from the Yorkshire and The Humber region, and also from the East Midlands (whereas Bolsover was alone in Table 4).

Table 11 Top 20 most education-deprived LA areas at 3 km, 6 km, 12 km and 24 km radius

Based on the LSOAs centred within (respectively) 3 km, 6 km, 12 km and 24 km of their LA centre, Table 11 shows the 20 most deprived LA areas, for the education deprivation domain. It is also very striking that only two LA centres appear within the top 20 for education deprivation for all four chosen radii, and they are Barnsley (Yorkshire and The Humber) and Walsall (West Midlands). Those that can be seen for three out of the four are Kingston upon Hull, King’s Lynn and West Norfolk, Ashfield, Dudley, Sandwell and South Staffordshire. Of these eight named LA areas, five appear within the top 15 of Table 8. However, the other three are some way outside the top 30 – specifically King’s Lynn and West Norfolk (58th), Dudley (60th) and South Staffordshire (214th; see Footnote 15). This is further evidence against null H01. Since some parallel cases were found previously for employment deprivation, this might also be at least soft evidence that is broadly consistent with null H03.
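The concentric banding calculation that underlies each column of such a table can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that LSOA centres and the LA centre are given as planar coordinates in kilometres, that distance is Euclidean, and that the band score is the population-weighted mean of LSOA scores (mirroring the weighting described for LA areas). The function name and data layout are hypothetical.

```python
import math


def band_score(centre, lsoas, radius_km):
    """Population-weighted mean score of LSOAs centred within radius_km of centre.

    centre: (x, y) of the LA centre, in km on a planar grid (assumption).
    lsoas:  iterable of (x, y, population, score) tuples, one per LSOA.
    Returns None when no LSOA centre lies within the radius.
    """
    cx, cy = centre
    pop_total = 0.0
    weighted_sum = 0.0
    for x, y, population, score in lsoas:
        if math.hypot(x - cx, y - cy) <= radius_km:
            pop_total += population
            weighted_sum += population * score
    return weighted_sum / pop_total if pop_total > 0 else None


# Hypothetical LSOAs: (x km, y km, population, deprivation score).
example_lsoas = [(0.0, 0.0, 1000, 0.30),
                 (3.0, 4.0, 3000, 0.10),
                 (30.0, 0.0, 500, 0.90)]
```

Ranking the resulting band scores across all LA centres, for a given radius, would then give one radius-based league table; an LA centre for which the function returns `None` has no ranking at that radius.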

As we did for employment deprivation, we now examine the relationship between education deprivation rankings based on LA areas, and each of the sets of rankings based on our concentric banding methodology (with cut-offs at integer numbers of kilometres from the LA centre), using Spearman’s rank correlation coefficient (rs). This is shown in Table 12 below.

Table 12 Values of Spearman’s rank correlation coefficient (in each case, the LA area ranking is compared to a ranking for LA centres based on concentric banding at a particular radius)

Again, the initial increase in rs (as radius increases) is probably linked to the fact that more than 80% of actual LA areas in England are larger in size than a circle of radius 4 km. Just as was true for employment deprivation, rs begins to fall back beyond a 6 km radius. The values of rs at a given radius are always lower for education deprivation – although 0.695 for a radius of 24 km is still a substantial rank correlation. As with Table 5, the evidence here against null hypothesis H01 is rather modest. There is also at least an indication of evidence in line with null H03.

Figure 5 below illustrates education deprivation for some LA centres highlighted in the earlier discussion of Table 11.

Fig. 5 IMD2010 education deprivation domain score for eight selected LA centres

Note that all these LA centres appear in the top 20 for education deprivation in at least three of our four ranking lists (see Table 11): it would be interesting to get a wider perspective by viewing some of these cases benchmarked against LA centres with much less education deprivation.

Using Procedure 1, we can obtain regression results for education deprivation. With LA controls (lower half of Table 13), the estimates on population are rather different from either the employment deprivation results, or Tables 9 and 10 for education deprivation – being negative for higher values of Y, and statistically significant at the 10% level for Y = 50 and Y = 60.

Table 13 Regression estimates across Y values, dependent variable ln(LSOA education score)

Thus, there is some evidence here against two null hypotheses – H02 (regarding spatial resolution) and H03. However, our measures of spatial autocorrelation – in the first half of Table 20 in the Appendix – yield similar conclusions to their counterparts from the employment deprivation case.

Using Procedure 2, we can obtain the set of results shown in Table 14, below.

Table 14 Regression estimates across Y values, dependent variable ln(LSOA education score)

With LA controls (lower half of Table 14), the pattern of the estimates attached to the population regressor, across different values of Y, this time exhibits greater volatility than (the trend) under Procedure 1. It is interesting that the relationship between the results in Tables 13 and 14 appears somewhat different to that seen previously for employment deprivation (in Tables 6 and 7). Thus, there is further evidence against null hypothesis H02 for spatial resolution and spatial unit construction criteria, and some evidence against null H03 across spatial unit construction criteria. The second half of Table 20 (in the Appendix) is, however, broadly similar to the second half of Table 19.

As with Tables 6 and 7, there is a separate (basically unreported) league table underlying each column of Table 13 and each column of Table 14. However, as before, the Y = 60 case (for example) offers evidence against null hypothesis H01, with some concentration of education deprivation in LA areas not shown in the raw LA-based league table (Table 8). In this instance, part of Southampton lies well within the top 1% of the 858 observations for Procedure 1 – while a part of Birmingham is in the top 2%, and a part of Bristol is just outside that range. For Procedure 2, a part of Leeds and a part of Bristol are well within the top 1%, with a part of Sheffield just outside and two parts of Birmingham within the top 2%. Of these, only Southampton and Bristol are also missing from radius-based Table 11. The impacts of changes in spatial resolution, and of changes in spatial unit construction criteria, are quite similar to those for employment deprivation – evidence in line with null H03.

The Two Domains Compared

Another issue to consider is the strength of the correlation between employment deprivation ranking and education deprivation ranking. On the basis of England’s 326 LA areas themselves, Spearman’s rank correlation coefficient for this relationship is 0.8086. This compares with a corresponding value of 0.7898 through the 32,482 observations at LSOA level – and seems to be in keeping with the findings of Gehlke and Biehl (1934), who reported a tendency for the (Pearson) correlation coefficient to increase when (US) census tract data were grouped together in contiguous groupings (see Footnote 16).

Table 15 and Fig. 6 report and illustrate the rank correlations for employment deprivation and education deprivation based instead upon our concentric banding methodology.

Table 15 Values of Spearman’s rank correlation coefficient (in each case, the employment deprivation ranking is compared to education deprivation for LA centres based on concentric banding at a particular radius)

Fig. 6 Spearman’s rank correlation coefficient, comparing employment deprivation and education deprivation rankings for LA centres, using concentric banding for a range of radii

Although, on the face of it, this remains a strong positive rank correlation throughout – as we might expect – it should be noted that rs exhibits several turning points within the first three columns (up to 30 km radius). This indicates that, for England, the relationship between the location of local employment deprivation and that of local education deprivation is not entirely straightforward. That much was apparent from observation of the different regional spreads for league tables of local deprivation shown above in Tables 1, 4, 8 and 11 – which revealed indications that null hypothesis H01 does not really hold, although there is little clear evidence to contradict null hypothesis H03.

Conclusions

In addition to comparing and contrasting regression estimates for administrative units such as LA areas and LSOAs, and for systematically constructed collections of LSOAs, it is worthwhile to produce and compare league tables (given their popularity for communication with the general public). We have demonstrated some rather large league table differences between which of England’s LA areas exhibit employment deprivation and which demonstrate deprivation for the education, skills and training domain. Concentrations of the former type of deprivation are particularly evident in northern England (although not much in Yorkshire and The Humber) and the West Midlands. On the other hand, education deprivation has a considerable focus in Yorkshire and The Humber, as well as the West Midlands; while there is also some spread into the East Midlands, the East of England and even Greater London.

Rather unsurprisingly, we find evidence against our first null hypothesis. League table positions, based on either of our measures of deprivation, can depend crucially on the size category of spatial unit being considered, and the criteria by which those spatial units are constructed. Traditional simple league table construction based solely on defined LA areas is unlikely to give a sufficiently clear picture. Precisely how far to investigate beyond the initial LA-based league table depends upon the circumstances. We have given examples, for employment deprivation in Bolton and Bradford, and for education deprivation in Bristol and Southampton, where a concentration of deprivation across an area with a population of the order of 60,000 people becomes evident for league tables based on our two procedures. With regard to league tables, this evidence is also in line with our final null hypothesis that the impact of spatial resolution and spatial unit construction criteria on different deprivation domains may be similar. From a practical perspective, fuller investigation offers an opportunity for a deeper understanding of the concentration of deprivation – and thus to target local economic assistance measures more efficiently where they are most urgently needed. Simultaneously, the identification of an area as being less deprived than it initially appears might alter perception favourably and change locational decisions regarding the siting of organisations or inward mobility for work or residence.

As basic regression analysis demonstrates, both employment and education deprivation scores are likely to be higher (more adverse) for more urban LA areas with higher population totals, higher population density and lower land area – but physically larger LA areas are more deprived, once population density has been controlled for. However, the picture is less clear when LSOA area data are subjected to similar analysis – with larger LSOA areas tending to be less deprived (and, in the case of employment deprivation, more so after controlling for population density). Additional regression analysis, performed after having introduced a couple of distinct procedures for splitting up LA areas into similarly populated subsets of their constituent LSOAs, yields somewhat mixed results. Overall, regression results are mildly dependent on area unit definitions, with some evidence of different outcomes from concentric and angular approaches, and also across our two deprivation measures. This constitutes at least limited evidence against our second and third null hypotheses. Meanwhile, our measures of spatial autocorrelation are broadly stable across both areal specification and the two deprivation domains.

Future work should consider other data, concerning local economic performance, that are available at similar levels of spatial disaggregation. Such analysis could be augmented by comparisons with results generated for such functional areas as Travel To Work Areas (and also Primary Urban Areas, if the precise definitions of these become publicly accessible – see DCLG (2010), and also Smith et al. (2010) for some discussion). There is also scope for further investigation of the links between rankings on education deprivation and employment deprivation, as well as other deprivation domains, and various measures of local economic performance. Ideally, similar analysis should also be attempted for other countries, since there is no guarantee that identical findings will be obtained in every nation’s case: it is of interest to discover whether variations are systematic and, if so, in what ways. As indicated by He et al. (2018), the exploration of changes in deprivation over time would need to keep in mind the impact of changes in the definition of administrative boundaries – in our case, some LSOAs were redefined in the 2011 Census.