1 Introduction

Cities are complex systems embedded in physical space that process information, evolve and adapt to their environment [1]. To understand how complex systems – and cities more specifically – operate, it is thus important to quantify how information is processed in terms of integration and segregation. To this aim, on the one hand, many relevant network descriptors have been introduced, based on topological features, dynamical features, or both. On the other hand, integration has been captured either by more complex topological models that account for information flow across multiple simultaneously co-existing relationships [2–5], namely multilayer systems [6, 7], or by causal effects observed in the time course of the system's units [8–17].

Concerning the topological analysis of classical single-layer networks, a clear definition of integrated and segregated information flow is still debated to date, and many proxies are used across a broad spectrum of disciplines, ranging from neuroscience to social and urban sciences [18–33], often using the same name for very different concepts.

The recent availability of large amounts of human-generated data enables the analysis of urban systems from perspectives that could not even be considered until a few years ago [34]. Consequently, models and analytical tools inspired by complexity science are proliferating, and more and more examples provide convincing evidence of their fruitful application to real cities [35–40]. Applications range from human mobility [41–44] and traffic congestion [45–49] to energy consumption [50], air quality [51, 52] and climate [53], health and well-being [54–57], and the related topic of accessibility to important facilities such as hospitals [58]. Indeed, the city can be seen as a growing complex system [59, 60] whose spatial organisation [61, 62] dynamically undergoes a transition from monocentric to polycentric [63, 64].

The relative ease of accessing large and detailed data sources that describe both the structure and the function of urban systems makes them a paradigmatic example on which to identify the right methodologies for understanding the behaviour of spatially embedded complex systems. A particularly relevant perspective is offered by activity-aware information [65], such as that provided by users of Foursquare – a leading location intelligence platform – which allows one to investigate human flows at different scales, and thus to reconstruct the functional network of cities with a great level of detail [66] and to classify existing activities into a few representative macro-categories (see Methods for details).

In this work, we stratify these human activities to build the functional networks describing human movements across the urban space of 10 different metropolitan systems spread over three continents. To gain novel insights into the functional organisation of the underlying urban ecosystem, we build a multilayer network [4, 7], where the flows encode how users move between venues of the same macro-category (e.g., from one pub to another) and between venues of different macro-categories (e.g., from a pub to a cinema). In the following, we use intra-layer flow to indicate movements of the first type, and inter-layer flow to indicate movements of the second type.

Our main goal is to better characterise the functional organisation of a city through the lens of network science. To this aim, we measure to what extent different areas of the city facilitate human flows – i.e., functional integration – and to what extent there are separate clusters of areas characterised by within-cluster flows larger than between-cluster flows – i.e., functional segregation – (see Methods for details) [67]. By considering these measures simultaneously, it is possible to characterise how well human flows mix through the city given the existing distribution of venues and the way residents use them. In fact, the dichotomy between integration and segregation – often improperly treated as antonyms – is relevant for improving our understanding of the interplay between urban structure, social relationships and human behaviour.

At the same time, to investigate the coupling between the structure of a city and the dynamics of its inhabitants, we also study the integration and segregation of the structural networks of these cities reconstructed from Open Street Map [68]. See Fig. 1 and Methods for more details on the definition of the structural and functional networks.

Figure 1

Modelling Structure and Function of Urban Systems. Left: Urban structural backbone of the 10 megacities considered here, as described by their street networks (data obtained from Open Street Map [68]). Middle: Urban functional networks described by the Foursquare data. The nodes are obtained by dividing the area analysed into cells of 500 m × 500 m. The edges are consecutive check-ins, which can connect activities of the same type (intra-links: e.g. Food-Food, Tourism-Tourism) or of different types (inter-links: e.g. Food-Tourism, Food-Sport). The collection of layers and inter-layer flows defines a multilayer network [4, 6, 7], i.e., a multidimensional functional representation of the urban areas. Right: The mobility flows between areas are captured as the edges’ weights. In the example, describing New York City, we can observe the different spatial distributions of flows within and across different activity layers (see also Fig. 5(a)).

2 Results

2.1 Overview of the data sets

The Foursquare data made available for the Future Cities Challenge [69] describe 24 months of check-ins collected between April 2017 and March 2019 (inclusive). The use of this dataset faces multiple limitations, discussed in detail in the Methods section.

The 10 world megacities included in the challenge are Chicago, Istanbul, Jakarta, London, Los Angeles, Tokyo, Paris, Seoul, Singapore and New York City (shown as an example in Fig. 1, right). The extensive characteristics of the datasets are shown in Table 1. The flows between different areas are derived from consecutive anonymised check-ins to Foursquare’s location-based services and coarse-grained at a 500 m × 500 m granularity (see Fig. 1, middle, and Methods). In the data provided, check-ins are already aggregated by pair of venues (origin and destination) and by month.

Table 1 Foursquare data set extensive characteristics. The figures here are aggregated over all layers and all 24 months. The linear size L is estimated here as the square root of the total area covered by the data after aggregation into squares of 500 m × 500 m. Note that the population value for Paris corresponds to the Grand Paris Metropolitan area, which is the territory roughly covered by the data. The other population values correspond to the municipal area (or to the national area in the case of Singapore)

The Open Street Map data have been obtained using the OSMnx Python library [68] (see Fig. 1, left). The urban area selected has been set to match the cells covered by the Foursquare venues. The structural network has been reduced to a lattice-like form with the same granularity as the urban flow network, so that every node in the structural network has a counterpart in the functional network. Unlike the functional network, the structural network is purely topological: an undirected link between two cells exists if at least one street connects the two areas.
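The rule that turns street geometry into the lattice-like structural network can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline used in this work: street segments are given as pairs of endpoints in a projected coordinate system in metres, only the two endpoints of each segment are mapped to cells (a long segment crossing intermediate cells would need to be split first), and the helper names are hypothetical.

```python
import math

CELL_SIZE_M = 500.0  # side of the square cells used for coarse-graining


def cell_of(x, y, size=CELL_SIZE_M):
    """Integer (column, row) index of the square cell containing a projected point."""
    return (math.floor(x / size), math.floor(y / size))


def coarse_grain_streets(segments, size=CELL_SIZE_M):
    """Return undirected, unweighted links between distinct cells.

    Assumption: each segment is ((x1, y1), (x2, y2)) in metres, and two cells are
    linked if at least one segment has one endpoint in each of them.
    """
    links = set()
    for (x1, y1), (x2, y2) in segments:
        a, b = cell_of(x1, y1, size), cell_of(x2, y2, size)
        if a != b:  # segments entirely inside one cell add no link
            links.add(frozenset((a, b)))
    return links


# Toy input: three street segments (coordinates in metres, purely illustrative).
segments = [
    ((10.0, 10.0), (600.0, 20.0)),     # cell (0, 0) -> cell (1, 0)
    ((600.0, 20.0), (620.0, 700.0)),   # cell (1, 0) -> cell (1, 1)
    ((100.0, 100.0), (200.0, 200.0)),  # both ends in cell (0, 0): no link
]
links = coarse_grain_streets(segments)
print(sorted(tuple(sorted(link)) for link in links))
# [((0, 0), (1, 0)), ((1, 0), (1, 1))]
```

In practice, the street graph of an area could be obtained with OSMnx (for example `osmnx.graph_from_place`) and reprojected with `osmnx.project_graph` before its edges are fed to a step like this one.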

2.2 Quantifying integration and segregation

As previously mentioned, we characterise the organisation of the city through measures of integration and segregation. To avoid confusion, it is worth remarking that our measures of integration and segregation are those established in the field of network neuroscience [28], rather than the traditional social concepts; they are thus not related to population or cultural mixing [70], but only to how cities are lived by their users. Integration quantifies, in terms of information exchange efficiency, the ability of a city to favour the flow of people across its areas, and is measured by means of the global communication efficiency GCE, specifically normalised to correctly compare the efficiencies of weighted and unweighted networks [71]. Segregation, on the other hand, evaluates the strength of segregated communities, i.e., areas of the city with strong flows within them and weak flows between them, and is estimated as the maximal modularity \(Q^{\ast }\) [72] of the network (see Methods for further details).
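To make the two quantities concrete, the sketch below computes simplified versions of them on a toy graph. It rests on simplifying assumptions: it uses the plain unweighted global efficiency (the average inverse shortest-path length over node pairs) rather than the normalised, weighted GCE of [71], and it evaluates the modularity Q of one given partition, whereas \(Q^{\ast}\) is the maximum of Q over all partitions and is in practice approximated with a heuristic. The function names are illustrative.

```python
from collections import deque


def global_efficiency(adj):
    """Unweighted global efficiency: average of 1/d_ij over all ordered pairs
    i != j, where d_ij is the shortest-path length (unreachable pairs add 0)."""
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search gives hop distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def modularity(adj, partition):
    """Newman modularity Q of a partition of an undirected, unweighted graph
    without self-loops: sum over communities of L_c / m - (d_c / 2m)^2."""
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    q = 0.0
    for community in partition:
        members = set(community)
        internal = sum(1 for u in members for v in adj[u] if v in members) / 2
        degree_sum = sum(len(adj[u]) for u in members)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two triangles joined by a single bridge (2-3): clearly modular.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(round(global_efficiency(adj), 3))                   # 0.689
print(round(modularity(adj, [{0, 1, 2}, {3, 4, 5}]), 3))  # 0.357
```

For larger networks, NetworkX offers `global_efficiency` (unweighted) together with `community.louvain_communities` and `community.modularity`, which could replace these hand-written helpers.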

2.3 Structural vs functional networks

Having identified two measures suitable for comparing different cities and types of networks, we begin our analysis by mapping the relationship between integration and segregation both in the structural road networks and in the single-layer flow networks, which describe the functional use of the city by individuals and are obtained, for each city, by aggregating inter-layer and intra-layer flows over the whole time span of the dataset.

The results, displayed in panels (a) and (b) of Fig. 2, suggest that, in general, higher values of segregation are associated with lower values of integration, as common sense would suggest. However, we also observe clear deviations from this trend, the largest being the functional network of Los Angeles, which appears much more integrated than its relatively high level of segregation would suggest.

Figure 2

Structural vs Functional organisation of cities measured by means of Segregation and Integration. (a) Structural Integration vs Segregation. Analysing the measures of segregation (\(Q^{\ast }\)) and integration (GCE) for the topological undirected network describing the road structure of cities, we observe a very strong anti-correlation (Pearson \(r=-0.92\)). (b) Functional Integration vs Segregation. The same measures for the weighted network describing the mobility flows display clear deviations from the anti-correlation of integration and segregation, in particular for the city of Los Angeles. (c) Structural vs Functional Segregation. The measures of segregation for the two types of networks are strongly correlated (Pearson \(r=0.91\)) but differ in value. (d) Structural vs Functional Integration. The measures of integration for the two types of networks deviate from perfect correlation (again because of Los Angeles) but are very similar in value. In all panels, the size of each circle is proportional to the size of the area considered

Of particular interest is the comparison of the structural and functional properties of the same systems (panels (c) and (d) of Fig. 2). Segregation, estimated through the lens of modularity, deviates systematically, with the functional flow network being less segregated than the structural network even though the values for the different cities are highly correlated. Integration, instead, measured with an indicator specifically developed to allow this type of comparison [71], takes numerically similar values for the very different structural and functional networks, and this close correspondence highlights the divergence between the structural and functional properties of Los Angeles.

2.4 What determines integration and segregation

To understand what lies behind the anti-correlation between integration and segregation observed in Fig. 2, we generate spatially embedded networks that attempt to reproduce the key features of urban functional networks, using two widely used null models: (i) Watts-Strogatz (WS) small-world networks, obtained by rewiring a regular lattice; (ii) Random Geometric Networks (RGN), obtained by linking two randomly placed points if their distance falls below a fixed threshold r (see Methods). We also applied random rewiring to the RGNs and, in both cases, the rewiring probability is denoted by p.
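A minimal sketch of the RGN null model with rewiring follows, under assumptions that are not spelled out in this section: points are uniform in the unit square, two points are linked when their Euclidean distance is below r, and rewiring follows the Watts-Strogatz recipe of moving one endpoint of each edge, with probability p, to a uniformly chosen node while avoiding self-loops and duplicate links. The same hypothetical `rewire` helper could be applied to the edges of a regular lattice to obtain WS networks.

```python
import math
import random


def random_geometric_network(n, r, seed=0):
    """Place n points uniformly in the unit square; link pairs closer than r."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < r
    }
    return points, edges


def rewire(n, edges, p, seed=0):
    """With probability p, move the second endpoint of each edge to a random node.

    Self-loops and duplicate edges are avoided, so the number of edges (and hence
    the edge density) is preserved. Edges are stored as (min, max) tuples.
    """
    rng = random.Random(seed)
    result = set(edges)
    for i, j in sorted(edges):
        if rng.random() < p:
            candidates = [
                k for k in range(n)
                if k != i and (min(i, k), max(i, k)) not in result
            ]
            if candidates:
                k = rng.choice(candidates)
                result.remove((i, j))
                result.add((min(i, k), max(i, k)))
    return result


points, edges = random_geometric_network(n=200, r=0.1, seed=1)
rewired = rewire(200, edges, p=0.2, seed=2)
print(len(edges), len(rewired))  # equal: rewiring keeps the edge count
```

Because rewiring preserves the number of edges, it changes where links go without changing edge density, which is what lets the effect of p be separated from the effect of r.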

In Fig. 3 we observe that both null models reproduce the anti-correlation pattern observed for real networks, but also that rewiring strongly reduces segregation and increases integration in a way that breaks the linear relationship between the two quantities. Moreover, since we generate the WS and RGN networks ourselves and thus control all their features, we can isolate the leading factors behind this pattern. For WS, integration grows and segregation drops as the network dimensionality grows. The same happens for RGN as the radius r grows. Indeed, both a higher dimensionality and a larger r lead to networks with a higher edge density, which points to the important role played by edge density in dictating the integration and segregation of spatial networks. For topological (i.e. unweighted) networks, the Global Communication Efficiency used to estimate integration grows with edge density. This is indeed what we observe in Additional file 1, Fig. 1, while a looser correlation is observed for segregation in Additional file 1, Fig. 2.

Figure 3

Simulating the functional organisation of synthetic urban models. Top Left: Small-world networks according to the Watts-Strogatz model (see Methods) with different rewiring probabilities (encoded by size) and dimensions (from 1D to 3D, encoded by color). Top Right: Random Geometric Networks (see Methods) with different characteristic spatial scales (encoded by color) and different rewiring probabilities (encoded by size). Here the points fall above those observed for the WS model. Bottom: The functional organisation of real cities, observed through the lens of the topological networks derived from the Foursquare flows (see Methods), follows the same trend as that of WS networks. In all panels, the dashed line represents the linear regression relating integration and segregation for the WS model, whereas the solid line is \(y=1-x\), shown as a reference

However, the values observed in Fig. 2(b) deviate appreciably from those describing the networks we generated in Fig. 3. This is because the urban functional networks are weighted, while our null models do not describe weights. Indeed, if we reduce the urban functional networks to purely topological undirected networks, we see in Fig. 3 (bottom) that the numerical values of the topological urban functional networks correspond to those described by the WS model (dashed line).
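The reduction of a weighted functional network to a purely topological one, together with the edge density singled out by the null-model analysis, can be sketched as follows. The input format (a dictionary mapping ordered cell pairs to check-in counts) is a hypothetical choice, and dropping self-loops in the topological version is an assumption of this sketch.

```python
def to_topological(flows):
    """Keep one undirected, unweighted link per pair of distinct cells that
    exchange a positive flow in either direction; self-loops are dropped."""
    links = set()
    for (a, b), weight in flows.items():
        if weight > 0 and a != b:
            links.add(frozenset((a, b)))
    return links


def edge_density(n_nodes, links):
    """Fraction of the n(n-1)/2 possible undirected links that are present."""
    return 2 * len(links) / (n_nodes * (n_nodes - 1))


flows = {("A", "B"): 12, ("B", "A"): 3, ("B", "C"): 1, ("C", "C"): 7, ("A", "C"): 0}
links = to_topological(flows)
print(len(links), round(edge_density(3, links), 3))  # 2 0.667
```

Note that a single check-in (B to C) produces the same link as twelve (A to B): all information on flow intensity, which distinguishes the weighted networks of Fig. 2(b), is discarded.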

To isolate the factors driving a city's integration and segregation, we have to move beyond the idealised world of synthetic models and look instead to the methods commonly adopted to investigate the physics of cities. Many properties of cities are known to be power-law functions of population size [59]. Here, we are not in a position to derive with precision the population of the area covered by the Foursquare data, so we instead measure city size by the square root of the area covered (\(L= \sqrt{A}\)), which is also a proxy for the average length of a trip in a city [63]. We therefore plot in Fig. 4(a), (b), (c) the values of Functional Segregation and of Structural and Functional Integration against L (see Additional file 1, Fig. 3 for the scaling of other network indicators). In our case, the sizes of the cities considered are not diverse enough for a meaningful discussion of the exponents observed (which are reported in panels (a) and (b) only to support future studies on the matter). We focus instead on the fact that a power-law scaling explains most of the variance observed for Functional Segregation (\(R^{2}=0.67\)) and Structural Integration (\(R^{2}=0.71\)) but completely fails to predict the values of Functional Integration (\(R^{2}=0.05\)). In other words, size matters. In particular, it matters for functional segregation, which is also linked to the total flow circulating over the network (Additional file 1, Fig. 2(c)): in fact, as observed in [73], it can be expected to grow proportionally with population. However, something else strongly influences functional integration and makes it deviate from structural integration (as seen in Fig. 2(d)). This extra factor is determined by how flows are distributed in the network. To show this, in Fig. 4(d) we compute how much the weighted functional networks deviate from the values estimated from the structural network as \((GCE_{funct}-GCE_{struct})/GCE_{struct}\), and plot this deviation against the flow hierarchy estimated for the same city from another dataset (numerical values obtained from [74]). A low flow hierarchy indicates that a larger fraction of movements is expected to be between strong mobility hubs and less active areas. This means that, in general, an excess of integration is expected when marginal areas are more strongly connected. This appears similar to what is observed in hierarchical modular brain networks, which are locally segregated, while global neuronal operation integrates segregated functions [75].
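The scaling analysis amounts to an ordinary least-squares fit in log-log space. The sketch below assumes that the fit is performed on log-transformed values and that \(R^{2}\) is computed in log space (neither is specified in this section); it also shows how \(L=\sqrt{A}\) follows from the number of occupied 500 m × 500 m cells, and the relative deviation plotted in Fig. 4(d). The toy data are synthetic and are not the values of the ten cities.

```python
import math


def linear_size_km(n_cells, cell_km=0.5):
    """L = sqrt(A), where A is the total area of the occupied square cells."""
    return math.sqrt(n_cells * cell_km ** 2)


def fit_power_law(sizes, values):
    """Fit y = c * L**beta by least squares on log y vs log L.

    Returns (c, beta, r2), with r2 the coefficient of determination in log space.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    ss_res = sum((y - intercept - beta * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(intercept), beta, 1 - ss_res / ss_tot


def relative_deviation(gce_funct, gce_struct):
    """(GCE_funct - GCE_struct) / GCE_struct, as plotted in Fig. 4(d)."""
    return (gce_funct - gce_struct) / gce_struct


# Synthetic check: values generated exactly as 2 * L**0.5 are recovered.
sizes = [linear_size_km(n) for n in (400, 1600, 6400)]  # L = 10, 20, 40 km
values = [2 * L ** 0.5 for L in sizes]
c, beta, r2 = fit_power_law(sizes, values)
print(round(c, 6), round(beta, 6), round(r2, 6))  # 2.0 0.5 1.0
```

With only ten cities spanning a narrow range of L, the fitted exponent is fragile, which is consistent with the choice above to report \(R^{2}\) rather than to interpret the exponents.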

Figure 4

Understanding Functional Segregation and Integration. While functional segregation (a) and structural integration (b) show a clear dependency on city size, functional integration (c) is not simply determined by how big a city is. In (d), we plot the deviation between functional and structural integration, computed as \((GCE_{funct}-GCE_{struct})/GCE_{struct}\), vs the values of flow hierarchy for the same cities computed in [74] from another dataset

Lastly, using the RGN model we also measured the importance of the spatial extension of the network. Fixing the radius below which nodes are connected, we find (see Additional file 1, Fig. 4) that the larger the area (\(A= L^{2}\)) covered by a square RGN, the more segregated and, at the same time, the less integrated the network is. Here again, integration and segregation appear very strongly anti-correlated, and increasing the radius has a similar effect to reducing the spatial extension.

2.5 Cities within a city

Having understood the behaviour of integration and segregation of cities at an aggregated level, it is worth checking whether this pattern is an intrinsic feature of urban systems or is specific to some activity layers. Indeed, the metadata of the venues include a category field which describes the type of venue in great detail (e.g.: Knitting Stores, Mini Golf Courses, Rock Clubs, …). We defined a set of macro-categories, which we used to aggregate categories into a limited number of layers (see Methods and Fig. 1, middle). Statistical information about the number of nodes and links in the different layers is provided in Additional file 1, Table I.

In Fig. 5(a) we can visually inspect some examples of activity-aware layers. Remarkably, for all the cities considered in this study, the intra-layer connectivity of the transport layer provides a natural link between our functional analysis and the underlying structure of the city. In the data, however, this link is much more clearly visible in cities where public transport is well developed and widely used, such as Tokyo or Seoul, than in cities where private transportation is dominant, such as Los Angeles and Istanbul.

Figure 5

Disentangling functional flows. (a) We illustrate the strikingly distinct views of the functional organisation of a city extracted by isolating intra- or inter-layer flows. These maps outline the different “cities within the city”, which we isolate by decoupling the urban flows into activity-aware multilayer networks. (b) We define the multilayer networks of human flows for each city (encoded by color) by stratifying flows according to the different macro-categories used in this work (see Methods). Each point corresponds to integration and segregation measured after removing a specific layer of activities. The letter ‘T’ marks values associated with the removal of the transport layer, which strongly influences urban functional connectivity (see Fig. 6). (c) Average functional integration for different activity categories. We observe a relationship between the average distance D covered by movements inside one layer and the value of integration (see Additional file 1, Fig. 7 for segregation). The regression excludes the outlier corresponding to the unclassified venues (“unknown”), whose removal appears not to influence a city’s functional integration

By disentangling the mobility flows into a multilayer network structure (see Methods and Fig. 1, right), we are able to quantify the differences in the functional organisation of human flows between different types of activities or different months (see Additional file 1, Fig. 5), enabling the identification of different “cities within the city”, which indeed show clear dissimilarities in terms of both functional integration and segregation.

To this aim, we perform targeted attacks on each layer of the corresponding multilayer network and measure the response of the system in terms of changes in segregation and integration. In Fig. 5(b) we observe how removing the flows coming from a specific activity type significantly changes urban functional segregation and integration. This is especially true when the activity is Transport, whose removal yields the rightmost outliers in the figure. An even stronger variation is observed in the integration and segregation restricted to movements between similar layers (see Additional file 1, Fig. 6).
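A targeted attack on a layer can be sketched as filtering the multilayer flow list and collapsing what remains into a single-layer weighted network, on which integration and segregation are then recomputed. This is a hypothetical implementation: the record layout is invented for illustration, and we assume that removing a layer drops every intra- and inter-layer flow with at least one end in it. The second helper returns the share of total flow touching the layer, which is one possible reading of the fraction of flow carried by a layer.

```python
from collections import defaultdict

# Each record: (origin cell, origin layer, destination cell, destination layer, count).
# This layout is hypothetical and chosen only for illustration.


def aggregate(flows, removed_layer=None):
    """Collapse multilayer flows into an undirected, weighted cell network,
    skipping every flow with at least one end in removed_layer."""
    weights = defaultdict(int)
    for o_cell, o_layer, d_cell, d_layer, count in flows:
        if removed_layer is not None and removed_layer in (o_layer, d_layer):
            continue
        weights[tuple(sorted((o_cell, d_cell)))] += count
    return dict(weights)


def layer_flow_fraction(flows, layer):
    """Share of the total flow with at least one end in the given layer."""
    total = sum(record[4] for record in flows)
    touching = sum(record[4] for record in flows if layer in (record[1], record[3]))
    return touching / total


flows = [
    ("c1", "food", "c2", "food", 5),
    ("c1", "transport", "c3", "transport", 10),
    ("c2", "food", "c3", "transport", 3),
    ("c3", "sport", "c3", "food", 2),
]
full = aggregate(flows)
without_transport = aggregate(flows, removed_layer="transport")
print(without_transport)                        # {('c1', 'c2'): 5, ('c3', 'c3'): 2}
print(layer_flow_fraction(flows, "transport"))  # 0.65
```

Comparing a measure computed on `full` with the same measure computed on `without_transport` gives the relative change \((GCE_{removed} - GCE_{full})/GCE_{full}\) used for the transport layer.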

To better understand these differences, in Fig. 5(c) we relate the average values of integration measured for flows between the same categories across all cities to the corresponding weighted average of geographical distances between nodes. We observe a bulk of correlated points and two outliers: one is the transport layer, which naturally provides long-range links; the other is the set of locations not associated with a macro-category and left as “unknown” (see Methods). Excluding “unknown”, which does not seem to influence integration at all, we observe a clear effect: removing the transport layer strongly disrupts integration, while removing short-range layers actually improves it. Conversely, Additional file 1, Fig. 7 shows that, again with the notable exception of the Transport layer, the segregation of cities remains relatively unchanged after single-layer removal. The results of this analysis point out that it is possible to close restaurants, leisure and commercial activities while keeping a city functional and, possibly, even more integrated. This perspective provides new insight into the effects of restriction policies adopted during emergencies, by quantifying the hidden, systemic social costs and benefits associated with the closure of different kinds of activities in times of a pandemic emergency.
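The weighted average distance D of a layer can be computed as a flow-weighted mean of the distances between the cells it links. The sketch below assumes integer cell indices on the 500 m grid, distances measured between cell centres, and a hypothetical record layout (origin cell, origin layer, destination cell, destination layer, count); only intra-layer flows are used, and self-loops contribute a distance of zero.

```python
import math

CELL_KM = 0.5  # side of the square cells, in km


def cell_centre_km(cell):
    """Centre of grid cell (i, j) in km (hypothetical integer cell indexing)."""
    i, j = cell
    return ((i + 0.5) * CELL_KM, (j + 0.5) * CELL_KM)


def mean_flow_distance(flows, layer):
    """Flow-weighted mean distance D between cells linked by intra-layer flows
    of the given layer; NaN if the layer carries no intra-layer flow."""
    weighted_sum = total_flow = 0.0
    for o_cell, o_layer, d_cell, d_layer, count in flows:
        if o_layer == d_layer == layer:
            weighted_sum += count * math.dist(cell_centre_km(o_cell), cell_centre_km(d_cell))
            total_flow += count
    return weighted_sum / total_flow if total_flow else float("nan")


flows = [
    ((0, 0), "transport", (0, 8), "transport", 3),  # 4.0 km apart
    ((0, 0), "transport", (6, 0), "transport", 1),  # 3.0 km apart
    ((0, 0), "food", (1, 0), "food", 4),            # neighbouring cells, 0.5 km
]
print(mean_flow_distance(flows, "transport"))  # 3.75
print(mean_flow_distance(flows, "food"))       # 0.5
```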

It is natural that the transport layer represents the backbone of a city’s organisation, but for some cities this effect is stronger than for others. To understand these differences, in Fig. 6 we explore in more depth the changes in segregation and integration that follow the removal of the transport layer. The effect is clear for segregation (panels (a) and (c)): the increase in segregation following the layer removal is proportional to how much flow passes through that layer. Things are, again, more complicated for integration: for some cities, integration drops by \(\approx 50\%\) without the transport layer, while for others (notably Singapore, Jakarta and Istanbul) integration is unchanged, or even slightly increased, by the layer removal (panel (b)). These three cities also have the transport layers with the longest average link distance (panel (d)); while for the other seven cities one might be tempted to see a trend, similar to that of Fig. 5(c), linking larger drops in integration to longer connections, the presence of these three outliers suggests, once again, that microscopic details in the distribution of flows of a functional network can play a major role in determining its robustness and, more generally, its organisation.

Figure 6

Illustrating the role of transport in building integration and reducing segregation. As observed in Fig. 5(b), the removal of the transport layer significantly modifies a city’s segregation and integration. (a) Segregation always increases after removing the transport layer. (b) Integration drops after removing the transport layer for some cities (down to about half of its initial value) but remains similar or even rises for others. (c) The rise in segregation grows linearly with the fraction of total flow carried by the transport layer. (d) The relative change in integration \((GCE_{removed} - GCE_{full})/ GCE_{full}\) is not simply linked to the length of the connections cut: while for seven cities it seems to follow a trend similar to that pointed out in Fig. 5(c), the three cities in which the average connection length of the transport layer is very large deviate strongly from this trend

3 Discussion

Understanding how cities process information, here encoded by human flows, is of paramount importance for designing more efficient and smart urban systems and communities. By characterising at the same time the structural and the functional organisation of 10 large urban systems in terms of well-defined and normalised measures of network integration and segregation, we have shown how network-based analysis can support and further expand ongoing discussions about, and the novel understanding provided by, ICT-data-driven quantitative urbanism [38].

From a modelling perspective, going beyond the antonymic dichotomy between integration and segregation by studying Segregation/Integration diagrams allowed us to expand our understanding of the interplay between urban structure, social relationships and human behaviour. This can be exemplified by three clear results. First, the identification of the dominant factor driving this negative correlation (the edge density, which is in turn a function of city size) and of the factor forcing deviations from it (the hierarchical structure of flows). Second, the correspondence of the empirical results with those of small-world networks shows that, to model urban systems, one necessarily has to go beyond “first neighbour” transmission, as long-range interactions are extremely relevant to reproduce many salient features measured from empirical data. Third, using this approach we were able to isolate the essential role played by the transportation layer, which is pivotal for both integration (thanks to its long-distance connectivity) and segregation (thanks to its large flows).

Under this lens, many features of complex megacities can therefore be understood from simple mechanisms related to geometric constraints and a city’s characteristic size, with larger cities tending to be more segregated and less integrated. In more detail, growing cities are expected to undergo a transition from a monocentric to a polycentric organisation, characterised by a sub-linear growth of the number of hotspots with population [63]. Similarly, for both urban structural and functional networks, we provide evidence that large polycentric cities, which are characterised by a larger number of hotspots (although, since the growth is sub-linear, they have a smaller fraction of hotspots, as shown in Additional file 1, Fig. 3(d)), appear to be more segregated and less integrated than smaller, monocentric cities. We have highlighted, however, that a city can be much more integrated than expected from its size if it displays a low flow hierarchy [74] and thus has more direct connections between central and marginal areas. Moreover, the interplay between heterogeneities in the distribution of flows, spatial constraints, and the layered structure of flows might be responsible for the emergence of peculiar integrated/segregated structures that could be reflected in the functional organisation of the city. Future research in this direction, including a wider spectrum of urban and non-urban systems, is required to gain more insight into this matter.

Finally, from a more methodological point of view, our analysis highlights the importance of data sources for analysing the interplay between the city and its main users, i.e., the citizens. Thanks to the unique dataset of anonymised movements provided by Foursquare and the easy access to street data [68], we have been able to gain novel insights into urban and human behaviour in terms of the interaction between the structural and functional organisation of the system. The availability of activity-aware information, in particular, allowed the analysis of attacks targeted towards specific types of activities, which unravelled the fundamental importance of transport as the integrator of an urban system. This result is especially relevant for policy- and decision-making in times of crisis, as it provides new quantitative tools that allow one to identify a limited set of activities (commercial, restaurants, leisure) which can be prioritised or temporarily limited to achieve a desired level of integration of human flows across the city.

4 Methods

4.1 Limitations of this study

Our study is based on a large collection of user-generated access data to public venues. Like all sources of automatically collected social data, it is affected by a series of biases that might influence our observations [76].

  • Representativeness. The Foursquare user base naturally does not cover the totality of a city’s population. Some public figures are available online [77], from which we can obtain indirect estimates: about 13% [78] of adult social media users in the USA used Foursquare in 2018. Since about 79% of adults in the United States used social media in 2019 [79], our samples for Chicago, Los Angeles and New York City would cover \(\approx 10\%\) of the total adult population. Naturally, not all users use it regularly (see Inhomogeneity of users’ behaviour), and representativeness will certainly vary from country to country. To estimate how representativeness may translate to other cities, we can use as a proxy the check-ins per capita in each city (see Table 1), which range from fairly homogeneous values of 0.8–0.9 for Asian cities to the higher values of American cities (2.5 in Los Angeles and 3.7 in Chicago). Using these proportions, we estimate that the user base may be of the order of 2% of the population in Asian cities.

  • Demographic bias. The Foursquare user base is mostly centred around ages 18–34, and male users are almost twice as many as female users. Foursquare penetration is also greater among users with higher income [77].

  • Inhomogeneity of users’ behaviour. Of course, not all users are active daily on Foursquare. An empirical analysis [80] of a dataset of Foursquare check-ins collected via Twitter over 4 months in 2010, with no spatial boundaries set, suggests an inhomogeneous, though somewhat limited, number of check-ins per user.

  • Subsampling and missing stops. As shown again in [80], the distribution of inter-event times between check-ins is long-tailed. This can strongly bias the observed displacements [81]. Flows in this analysis will often not correspond to real movements and have to be taken for what they are: consecutive check-ins. For this reason, we chose not to focus on the temporal disaggregation of flows that Foursquare provided by hour of the day and month of the arrival check-in. We split the functional use of a city by month of the year only to test what happens when the flow network is sub-sampled.

  • Inhomogeneity of venues. Venues are not homogeneously distributed across the city, with higher densities in city centres. Moreover, venues display great inhomogeneity in the number of check-ins they capture (see Additional file 1, Fig. 8).

  • Definition of city. It is known that many urban measures may strongly depend on how the city itself is defined [82]. In the dataset provided, the administrative area of each city was already selected (with the exception of Paris, for which the “Grand Paris” area was selected). In Additional file 1, Fig. 9, we test the robustness of our metrics to the boundary definition by radially reducing the city area.
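The coverage figures in the Representativeness item follow from simple arithmetic, sketched below. How the 2% figure is derived from check-ins per capita is not spelled out, so the snippet assumes (our assumption) that coverage scales linearly with check-ins per capita and takes Chicago (3.7) as the reference; with these choices, 0.8–0.9 check-ins per capita give roughly 2.2–2.5%. Taking Los Angeles (2.5) as the reference instead would give somewhat larger values.

```python
# Back-of-envelope coverage estimate using the figures quoted in the text.
# Assumption (not stated in the text): coverage scales linearly with
# check-ins per capita, taking Chicago (3.7) as the reference city.

FOURSQUARE_SHARE_OF_SOCIAL_USERS = 0.13  # US adult social media users, 2018 [78]
SOCIAL_MEDIA_SHARE_OF_ADULTS = 0.79      # US adults, 2019 [79]

# Fraction of US adults using Foursquare: 0.13 * 0.79, i.e. about 10%.
us_coverage = FOURSQUARE_SHARE_OF_SOCIAL_USERS * SOCIAL_MEDIA_SHARE_OF_ADULTS


def scaled_coverage(checkins_per_capita: float,
                    reference_checkins_per_capita: float = 3.7,
                    reference_coverage: float = us_coverage) -> float:
    """Rescale the reference coverage by the ratio of check-ins per capita."""
    return reference_coverage * checkins_per_capita / reference_checkins_per_capita


if __name__ == "__main__":
    print(f"US adult coverage: {us_coverage:.1%}")
    for cpc in (0.8, 0.9):
        print(f"{cpc} check-ins per capita: {scaled_coverage(cpc):.1%}")
```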

4.2 Geographic coarse-graining

We reconstruct the flow networks by aggregating data over areal units of 500 m × 500 m in all 10 cities considered. Flows are reconstructed from subsequent anonymised check-ins into Foursquare venues, ignoring their order (undirected network). Flows inside the same areal unit have been integrated into a self-loop only if the check-ins were at two different locations; subsequent check-ins in the same location have been excluded from the analysis. We reconstruct the structural networks using OSMnx [68], a Python library which provides a network object where nodes are street intersections and links are the stretches of road between two subsequent intersections. We coarse-grained these street networks to match the granularity of the flow networks. Because the street segments provided by OSMnx are short-range, the resulting coarse-grained structural maps are mostly lattice-like.
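A minimal sketch of this coarse-graining, under assumptions of our own: coordinates are already projected to metres, each user's check-ins arrive as a chronologically ordered list of (venue_id, x, y) tuples, and street segments lying entirely inside one cell are simply discarded. The function names and input formats are hypothetical and do not correspond to any Foursquare or OSMnx API.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

CELL_SIZE_M = 500.0  # side of the square areal units, in metres
Cell = Tuple[int, int]


def cell_of(x: float, y: float, size: float = CELL_SIZE_M) -> Cell:
    """Map projected coordinates (in metres) to the index of a grid cell."""
    return (int(x // size), int(y // size))


def flow_network(trajectories: Iterable[List[Tuple[Hashable, float, float]]]) -> Counter:
    """Aggregate consecutive check-ins into undirected cell-to-cell flows.

    Each trajectory is one user's chronologically ordered list of
    (venue_id, x, y) check-ins (hypothetical input format). Returns a
    Counter mapping an unordered cell pair to its flow; (c, c) is a self-loop.
    """
    flows: Counter = Counter()
    for checkins in trajectories:
        for (v1, x1, y1), (v2, x2, y2) in zip(checkins, checkins[1:]):
            if v1 == v2:
                continue  # repeated check-ins at the same venue are dropped
            c1, c2 = cell_of(x1, y1), cell_of(x2, y2)
            # Sorting the pair discards the order of the two check-ins.
            flows[tuple(sorted((c1, c2)))] += 1
    return flows


def coarse_grain_streets(nodes: Dict[Hashable, Tuple[float, float]],
                         edges: Iterable[Tuple[Hashable, Hashable]]) -> set:
    """Project a street graph (intersections + road segments) onto the grid.

    Two cells are linked if at least one road segment joins intersections
    lying in them; segments inside a single cell are discarded (assumption).
    """
    links = set()
    for u, v in edges:
        cu, cv = cell_of(*nodes[u]), cell_of(*nodes[v])
        if cu != cv:
            links.add(tuple(sorted((cu, cv))))
    return links
```

Sorting each cell pair is a simple way to make the network undirected; a check-in pair at two different venues in the same cell becomes a self-loop, as described above.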

4.3 Activity stratification

We use Foursquare’s rich system of categories and manually associate them with a reduced number of macro-categories (food, lodging, tourism, work, religion, services, education, health, sport, transport, entertainment, leisure, public, housing and commercial). We do not use the Foursquare Venue Category Hierarchy [83], except for the venue icons in Fig. 1. The few categories that did not fit any macro-category have been labelled as ‘unknown’. These categories allow us to build “activity-aware multilayer networks”, where activities of different types are associated with different layers of our model. Flows between activities of the same macro-category are encoded by intra-layer links, while flows between different macro-categories are encoded by inter-layer links.
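The layer assignment can be sketched as follows. The category-to-macro-category dictionary is a hypothetical excerpt for illustration (the paper's full manual mapping is not reproduced), each multilayer node is taken to be a (cell, macro-category) pair, and ‘unknown’ is treated as a layer of its own, which is our assumption.

```python
from collections import Counter
from typing import Dict, Tuple

Cell = Tuple[int, int]

# Hypothetical excerpt of a manual category -> macro-category mapping.
MACRO: Dict[str, str] = {
    "Pizza Place": "food",
    "Coffee Shop": "food",
    "Hotel": "lodging",
    "Museum": "tourism",
    "Office": "work",
    "Gym": "sport",
    "Metro Station": "transport",
}


def macro_of(category: str) -> str:
    """Return the macro-category, or 'unknown' for unmapped categories."""
    return MACRO.get(category, "unknown")


def multilayer_flows(moves):
    """Split flows into intra- and inter-layer links.

    Each move is (cell_from, category_from, cell_to, category_to).
    Nodes are (cell, macro-category) pairs; links are undirected.
    """
    intra: Counter = Counter()
    inter: Counter = Counter()
    for c1, cat1, c2, cat2 in moves:
        n1, n2 = (c1, macro_of(cat1)), (c2, macro_of(cat2))
        key = tuple(sorted((n1, n2)))
        # Same macro-category: intra-layer link; otherwise inter-layer.
        target = intra if n1[1] == n2[1] else inter
        target[key] += 1
    return intra, inter
```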

4.4 Measuring functional integration

We measure the extent to which a network is integrated in terms of communication, i.e., how efficiently nodes exchange information on average, using an indicator based on the concept of shortest path. Given two areal units i and j, we can reasonably assume that the efficiency \(\epsilon _{ij}\) of their communication is inversely proportional to their distance \(d_{ij}\). If \(d_{ij}\) is a topological distance, counting the number of links in a shortest path from i to j, our assumption means that the longer the path a piece of information has to travel, the less efficient the communication, since the probability that the message is corrupted along the way increases. A global descriptor of the topological communication efficiency [18] of a city is then the average pairwise efficiency of its nodes, i.e., the average of the inverse shortest-path lengths in the network:

$$ E = \frac{1}{N(N-1)} \sum_{i\neq j} \frac{1}{d_{ij}}. $$
(1)
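Eq. (1) can be computed with one breadth-first search (BFS) per node, since BFS yields exactly the hop-count shortest paths of an unweighted network. The sketch below assumes a plain adjacency-list dictionary (our choice of representation) and, following the usual convention, lets unreachable pairs contribute zero to the sum.

```python
from collections import deque
from typing import Dict, Hashable, List

Adjacency = Dict[Hashable, List[Hashable]]


def bfs_distances(adj: Adjacency, source: Hashable) -> Dict[Hashable, int]:
    """Hop-count distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj: Adjacency) -> float:
    """E = 1/(N(N-1)) * sum_{i != j} 1/d_ij; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if j != i:
                total += 1.0 / d
    return total / (n * (n - 1))
```

For a three-node path a–b–c, the ordered pairs contribute 1, 1, 1, 1, 1/2 and 1/2, so E = 5/6; a complete graph gives E = 1.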

4.5 Normalising functional integration of flow networks

For flow networks, like those analysed in this paper, the additional information on the strength of connections can make distances very different. If the flow between two nodes is large, their distance should intuitively be small. For this reason, the distances to be averaged are those of weighted shortest paths, i.e., the paths minimising the sum of edge costs between each pair of nodes. In a flow network whose edge weights represent the intensity of the connections, the cost of each edge is the inverse of its weight.

Unfortunately, Eq. (1) cannot be straightforwardly generalised to weighted networks, since it depends on the scale of the weights. Latora and Marchiori proposed a weighted efficiency descriptor in [84], rescaling the value of efficiency to \([0, 1]\) by considering an idealised proxy of G, \(G_{\text{ideal}}\), having maximum efficiency. However, finding the ideal proxy \(G_{\text{ideal}}\) of a network G for the normalisation of the weighted \(E(G)\) is often ambiguous.

A universally valid solution for the normalisation of the global efficiency, capturing information on both link existence and link weights, has been proposed in [71], enabling the comparison of the communication efficiency of disparate systems. The idea is that each (weighted) shortest path in the network has a length, which is the sum of the link costs along the path, and a total flow, which is the sum of the link weights along it. These path flows \(\phi _{ij}\) are strictly positive for each pair of nodes \((i, j)\) in a connected network and can be added to the original network as an artificial direct flow between i and j. In other words, artificial links representing all missing shortcuts between pairs of nodes are added to the network G; each of them delivers the total flux of a shortest path from origin to destination in a single topological step.

A correct normalisation of E is then possible using the network \(G_{\text{ideal}}\) that results from this physically grounded enrichment procedure, which is independent of the scale of flows and of any metadata or the lack thereof. The normalised Global Communication Efficiency can then be computed as:

$$ GCE = E(G)/E(G_{\text{ideal}}). $$
(2)
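The normalisation can be sketched as follows. This is our own reading of the procedure of [71], not code from that work: edge costs are inverse weights; the path flow \(\phi_{ij}\) is the sum of weights along the shortest path found by Dijkstra's algorithm (ties between equal-cost paths are broken arbitrarily); shortcuts are added only between pairs not already linked; and \(E(G_{\text{ideal}})\) is evaluated with shortest paths on the enriched graph.

```python
import heapq
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbour: weight}


def dijkstra(g: Graph, source: Hashable):
    """Shortest paths with edge cost 1/weight.

    Returns (dist, flow): dist[j] is the summed cost of the shortest path
    from source to j, flow[j] the summed weight along that same path.
    """
    dist = {source: 0.0}
    flow = {source: 0.0}
    heap = [(0.0, 0, source)]
    counter = 1  # tie-breaker so that nodes themselves are never compared
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in g[u].items():
            nd = d + 1.0 / w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                flow[v] = flow[u] + w
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return dist, flow


def efficiency(g: Graph) -> float:
    """Weighted E: mean of 1/d_ij over ordered pairs (unreachable pairs add 0)."""
    n = len(g)
    if n < 2:
        return 0.0
    total = 0.0
    for i in g:
        dist, _ = dijkstra(g, i)
        total += sum(1.0 / d for j, d in dist.items() if j != i)
    return total / (n * (n - 1))


def ideal_graph(g: Graph) -> Graph:
    """Add a direct link i-j carrying the path flow phi_ij for each missing pair."""
    ideal = {u: dict(nbrs) for u, nbrs in g.items()}
    for i in g:
        _, flow = dijkstra(g, i)
        for j, phi in flow.items():
            if j != i and j not in ideal[i]:
                ideal[i][j] = ideal[j][i] = phi
    return ideal


def gce(g: Graph) -> float:
    """Global Communication Efficiency, Eq. (2)."""
    return efficiency(g) / efficiency(ideal_graph(g))
```

For a path a–b–c with unit weights, E(G) = 5/6, the shortcut a–c carries \(\phi = 2\) (cost 1/2), E(G_ideal) = 4/3, and GCE = 5/8. Since shortcuts can only shorten distances, the ratio never exceeds 1; and multiplying all weights by a constant rescales both efficiencies by the same factor, leaving GCE unchanged.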

4.6 Measuring functional segregation

A common measure of network segregation, quantifying how strongly the units are organised into M non-overlapping blocks, is the modularity [72]

$$ Q = \sum_{u \in M} \biggl[e_{uu} - \biggl( \sum_{v \in M} e_{uv} \biggr)^{2} \biggr], $$
(3)

where \(e_{uu}\) is the proportion of links inside module u, while \(e_{uv}\) accounts for the connectivity between two distinct modules u and v. More specifically, our measure of segregation is the maximum value \(Q^{\ast }\) of the modularity, which we find using the Louvain algorithm [85]. We also verify that the observed modularity is significant by comparing it with the values of \(Q^{\ast }\) computed over an ensemble of configuration models obtained by reshuffling the network (see Additional file 1, Tables II and III). Finally, note that here we use the weights defined by flows. Values of \(Q^{\ast }\) for weighted and unweighted networks are indeed comparable, as opposed to what was discussed above for E, and using weights allowed us to better discern the characteristics of different layers.
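For a given partition, Eq. (3) can be evaluated directly from the weighted link list. The sketch below assumes the usual convention (not spelled out in the text) that \(e_{uv}\) for \(u \neq v\) is half the fraction of total weight between u and v, so that \(\sum_v e_{uv}\) equals the fraction of link endpoints (node strength) in u. It only scores a partition; the maximisation yielding \(Q^{\ast}\), done with the Louvain algorithm [85] in the text, is not reproduced.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable, float]],
               module: Dict[Hashable, Hashable]) -> float:
    """Q = sum_u [e_uu - a_u^2] for a given partition of the nodes.

    edges: undirected weighted links (i, j, w); self-loops allowed.
    module: node -> module label.
    e_uu is the fraction of total weight inside module u, and
    a_u = sum_v e_uv is the fraction of total strength held by u.
    """
    edges = list(edges)
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)    # weight of links with both ends in u
    strength = defaultdict(float)  # summed node strength in u
    for i, j, w in edges:
        if module[i] == module[j]:
            inside[module[i]] += w
        strength[module[i]] += w
        strength[module[j]] += w   # a self-loop adds 2w to its node
    return sum(inside[u] / m - (strength[u] / (2 * m)) ** 2
               for u in set(module.values()))
```

For two triangles joined by a single unit-weight bridge, split into the two triangles, each module has \(e_{uu} = 3/7\) and \(a_u = 1/2\), giving Q = 5/14; putting all nodes in one module gives Q = 0.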

4.7 Synthetic network models

We use two standard spatial network models for our analysis.

We first consider a class of networks characterised by a small average geodesic distance: the Watts-Strogatz (WS) model. Starting from a regular graph, e.g., a two-dimensional lattice, each link is rewired with probability p, i.e., it is removed and placed randomly in the network. If p is large, the resulting WS network looks more like an Erdős–Rényi (ER) random graph than like the original lattice. WS networks are also highly clustered, i.e., nodes tend to form closed triangles. Networks generated by the WS model are usually referred to as small-world networks.
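A minimal sketch of the WS construction on a two-dimensional lattice, using the standard library only. The following choices are our assumptions, not taken from the text: open (non-periodic) boundaries, nearest-neighbour links only, and a rewiring step that keeps one endpoint of each selected link while moving the other to a uniformly random node, rejecting self-loops and duplicate links.

```python
import random
from typing import Set, Tuple

Edge = Tuple[int, int]  # stored as (smaller index, larger index)


def lattice_2d(side: int) -> Set[Edge]:
    """Edges of a side x side square lattice (open boundaries),
    with nodes numbered 0 .. side*side - 1 row by row."""
    edges = set()
    for r in range(side):
        for c in range(side):
            n = r * side + c
            if c + 1 < side:
                edges.add((n, n + 1))     # right neighbour
            if r + 1 < side:
                edges.add((n, n + side))  # neighbour below
    return edges


def rewire(edges: Set[Edge], n_nodes: int, p: float,
           rng: random.Random) -> Set[Edge]:
    """With probability p, move each link's second endpoint to a random node,
    avoiding self-loops and links that already exist."""
    result = set(edges)
    for u, v in sorted(edges):
        if rng.random() >= p:
            continue
        candidates = [w for w in range(n_nodes)
                      if w != u and (min(u, w), max(u, w)) not in result]
        if candidates:
            w = rng.choice(candidates)
            result.remove((u, v))
            result.add((min(u, w), max(u, w)))
    return result
```

The number of links is preserved, so p = 0 returns the lattice and p = 1 gives a random graph with the same number of links.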

As an alternative to WS, we also study the simplest network model that actively involves the spatial dimension: the random geometric network (RGN), where nodes randomly distributed in space are connected if they are closer than a fixed threshold distance. RGNs share many important properties with regular lattices; in particular, they are not “small world”. For this reason, as in the WS case, we also rewire the links of the RGN with probability α.
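A minimal sketch of an RGN with rewiring probability α, using the standard library only. The unit-square domain, the uniform placement of nodes and the rewiring rule (keep one endpoint, reattach the other to a random node, reject self-loops and duplicates) are our assumptions; the text does not specify them.

```python
import math
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]
Point = Tuple[float, float]


def random_geometric(n: int, radius: float,
                     rng: random.Random) -> Tuple[List[Point], Set[Edge]]:
    """Scatter n points uniformly in the unit square and link every pair
    whose Euclidean distance is below the threshold radius."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) < radius}
    return pts, edges


def rewire(edges: Set[Edge], n_nodes: int, alpha: float,
           rng: random.Random) -> Set[Edge]:
    """With probability alpha, reattach one end of each link to a random
    node, skipping self-loops and links that already exist."""
    result = set(edges)
    for u, v in sorted(edges):
        if rng.random() >= alpha:
            continue
        candidates = [w for w in range(n_nodes)
                      if w != u and (min(u, w), max(u, w)) not in result]
        if candidates:
            w = rng.choice(candidates)
            result.remove((u, v))
            result.add((min(u, w), max(u, w)))
    return result
```

Increasing α replaces short spatial links with random long-range ones, which is how the sketch moves an RGN towards the small-world regime.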