Introduction

In areas where corruption among public figures is prevalent, new entrants into the political discussion network may be willing to accept a certain degree of financial compensation in exchange for the control input required to move them around the opinion space. If such nodes can choose their connections, the perception of corruption gives them an incentive to choose their position in the network carefully, so as to maximize their probability of being selected by corrupting parties to drive the network to a new state and to receive payment in exchange for their efforts. Most citizens regard this type of “quid-pro-quo” controllability of the political network as an undesirable form of corruption, since it allows the financial concerns of representatives to take precedence over the needs of constituents.

Political discussion, like most forms of discourse, takes place in a social network (Huckfeldt and Sprague 1987; Huckfeldt et al. 2014). When one node influences another, this influence may be manifested by actions such as bill cosponsorships (Briatte 2016; Fowler 2006). Only very recently have researchers begun to examine the potential benefits of modelling the formation of these legislative networks as endogenous processes rather than as exogenously determined structures (Battaglini et al. 2019).

Our model argues that outside parties wishing to control the states of individuals in the network can request that a node shift its opinion or platform on some topic in exchange for monetary payment, and that this opinion shift spills over into the opinions of other nodes through a social learning process. Further, we hypothesize that anticipation of this payment leads to rent seeking, in which individuals manipulate their position in the network to make themselves more important to control. If this is the case, the controllability of these networks could then, under certain conditions (Footnote 1), be used as an instrumental proxy for corruption. This matters because the effects of corruption are notoriously difficult to measure (Olken and Pande 2012). It also means that “corruption” must be defined carefully so that it is not confused with “controllability,” a term to which we argue it is closely related. For the purposes of this analysis, corruption is defined as the acceptance of control input signals in exchange for monetary payment. When a request is sent to a node to change its position on a certain topic, it is coupled with a financial reward that increases with the distance between the node’s current state and the desired state.

Key terms

Complex networks are ubiquitous and crucial to the description of human dynamics and complex adaptive systems, and many tools have been proposed to analyze their structure. Here, the network of interest is a social network in which nodes represent human beings and links between them describe the direction and flow of opinions and influence. In this way, the social network describes a linear dynamical system in which the state (opinion) of each node (person) evolves together with the states of the other individuals to whom it is linked. A key feature of linear dynamical systems, including networks, is that they can often be driven to an arbitrary state by applying input signals to only a limited subset of nodes (Liu et al. 2011; Jia and Barabási 2013).

What this means is that if an interested party were able to “shift” the opinions of enough individuals, they could drive the network, and hence the consensus opinion, to any state they desire. When a group of individuals collectively has enough influence to drive the network to any arbitrary state, its members are known as driver nodes and the group is known as a driver set. If the network can be steered to any state by controlling the states of the nodes in some driver set, the network system is said to be controllable. If a driver set is of the minimum possible size, so that there is no smaller set of nodes from which the network is controllable, it is known as a minimum driver set. The size of the minimum driver set, denoted \(N_{D}\), can be computed from the eigenvalues of the adjacency matrix (an n×n real-valued matrix describing the connections between nodes); specifically, it is the largest geometric multiplicity among those eigenvalues (Yuan et al. 2013). There may, however, be many driver sets that contain this number of nodes. Control capacity describes how critical an individual node is to these minimum driver sets: if minimum driver sets are chosen uniformly at random, a node’s control capacity gives the probability that the randomly selected driver set contains this node (Jia and Barabási 2013).
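For small networks these definitions can be checked by brute force. The sketch below is a minimal illustration, not the procedure used in this paper: it relies on the maximum-matching characterization of structural controllability (Liu et al. 2011), under which a set D of \(N_{D}\) nodes is a minimum driver set when every node outside D can be assigned a distinct incoming link. Enumerating all such sets is exponential in n, so it only suits toy graphs; the helper names (`matched_heads`, `minimum_driver_sets`, `control_capacity`) and the four-node example are hypothetical.

```python
from itertools import combinations

# Sketch only: brute-force enumeration for tiny graphs; helper names are hypothetical.


def matched_heads(edges, heads):
    """Size of a maximum matching that pairs each node's outgoing side with at
    most one distinct incoming side in `heads` (Kuhn's augmenting paths)."""
    succ = {}
    for u, v in edges:
        if v in heads:
            succ.setdefault(u, []).append(v)
    match = {}  # head v -> tail u whose link u -> v is currently matched

    def augment(u, seen):
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in succ:
        augment(u, set())
    return len(match)


def minimum_driver_sets(nodes, edges):
    """N_D = max(n - |M*|, 1) and every node set of that size whose complement
    can be fully matched, i.e. every minimum driver set."""
    nodes = list(nodes)
    n_d = max(len(nodes) - matched_heads(edges, set(nodes)), 1)
    found = []
    for D in combinations(nodes, n_d):
        rest = set(nodes) - set(D)
        if matched_heads(edges, rest) == len(rest):
            found.append(set(D))
    return n_d, found


def control_capacity(nodes, edges):
    """Probability that a uniformly chosen minimum driver set contains each node."""
    _, sets = minimum_driver_sets(nodes, edges)
    return {v: sum(v in D for D in sets) / len(sets) for v in nodes}


# Hypothetical network: node 0 influences nodes 1 and 3; node 1 influences node 2.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 3)]
n_d, sets = minimum_driver_sets(nodes, edges)
cap = control_capacity(nodes, edges)
```

Here node 0 has no incoming links and belongs to every minimum driver set, while nodes 1 and 3, both fed only by node 0, each appear in one of the two sets; the capacities sum to \(N_{D}=2\).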

Relevant literature

There has been some prior work which has examined the network structure of corruption (Cartier-Bresson 1997; Warburton 2013; Ribeiro et al. 2018), and the controllability of these networks as it relates to changing attitudes toward illegal activity (da Cunha and Gonçalves 2018). These papers have primarily focused on the network between corrupt actors, and have been remarkably successful in the prediction of future scandals (Ribeiro et al. 2018). A comprehensive overview of other network and complexity science approaches used for social good and the betterment of society – through means including the reduction of crime and slowing the spread of disease – can be found in Helbing et al. (2015).

In related work by Podobnik et al. (2015), an environment is studied in which two networks with different goals compete to control each other. They find that such control is not possible without a reduction in the resilience of the attacking network. This finding could help to motivate our later assumption that the corrupting party will choose to control a minimum driver set in the legislative network, since controlling excess nodes could come at a steep cost in terms of the resilience of its own social structure, a cost that could easily exceed the benefits gained through reduced energy costs (Footnote 2). The tradeoff between control energy cost and the number of controlled nodes is characterized by Li et al. (2015).

Bac (1996) presents a game theoretic model of corruption within an organizational monitoring hierarchy, and finds the surplus from corruption to the corrupting agent to be higher when corruption is implemented at the top of the hierarchy. These results were generalized to policy recommendations by Rosenblatt (2012), who attributes organizational corruption to social dominance hierarchies between groups. Notably, these efforts focus on organizational hierarchies and the effects of exogenously imposed managerial structures, rather than on social networks and endogenous connections between individuals of similar social standing. A recent paper by Luna-Pla and Nicolás-Carlock (2020) uses a network of shell corporations and their personnel to examine empirical evidence from a corruption scandal in Mexico. They use these data to find structural indicators and characteristics of management and ownership structure that might signal corruption within companies, and conclude that the layer of these networks which provides the most information about underlying corruption is the shareholder level.

In a departure from existing work on the network science of corruption, this paper focuses instead on corruption as the exertion of influence within the entire social network among politicians, and should thus provide new insights into both the study of political corruption and that of social network formation in general. This type of analysis has been loosely proposed by Granovetter (2007), who calls for further theoretical analysis of the phenomenon of corruption and its effects within a larger social structure. Indeed, Granovetter (2007) emphasizes the idea that “the ability to effectively corrupt the administration of some substantial activity requires corruption entrepreneurs to be masters of social network manipulation” and discusses the “reluctance of lower-level employees [in the bureaucratic hierarchy] to follow clearly illegal orders” from their superiors in this hierarchical network.

This reluctance is modeled as the removal of incoming influence, which generates our primary theoretical and empirical result. Most notably, the hierarchical congestion result of this paper should be robust enough to be applied to all kinds of social networks in which nodes determine incoming links and there are potentially time-varying monetary incentives to accept control input.

The probability that a node is selected as part of a randomly selected minimum driver set, the smallest set of individuals through which the network is controllable, is known as its control capacity (Jia and Barabási 2013). Jia and Barabási (2013) show that the control capacity of a node in a directed network is closely related to its structural in-degree. In political networks, discussion takes place largely in open forums, so nodes do not choose whom to influence; rather, they choose who influences them and to what degree. Essentially, we argue that for nodes who expect to receive monetary value from control input, the opportunity cost of forming additional incoming information flows is inflated by the loss of expected revenue from control input. This is consistent with the observation, documented by Liu et al. (2011), that driver nodes tend not to be high-degree hubs.

In addition to the decrease in tendency for information-providing incoming links, preference for control input also affects who a node will wish to connect with. If a node’s neighbor is a hub, for example, with many outgoing connections, then there is a high likelihood that there will be a structurally controllable “matching path” (Liu et al. 2011) which runs through the hub, but a low probability that this path includes any of the individual neighbors of the hub, since only one of the hub’s outgoing links may be included in a maximum bipartite matching. Likewise, the corrupt node has incentive to avoid becoming part of a cycle, since cycles in a graph can be easily controlled through other nodes as part of a “cactus” style matching path (Liu et al. 2011).
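The hub and cycle arguments can be illustrated with a small brute-force enumeration of minimum driver sets under the maximum-matching characterization of Liu et al. (2011). This is a hypothetical sketch for toy graphs, with hypothetical helper names. In a star where node 0 influences four others, only one of the hub’s outgoing links can enter a maximum matching, so each out-neighbor of the hub lies in three of the four minimum driver sets. On the directed path with links 0→1, 1→2, 2→3, node 0 is the unique driver; once node 0 accepts the incoming link 3→0 and closes a cycle, any single node suffices and node 0’s control capacity falls to 1/4.

```python
from itertools import combinations

# Sketch only: brute-force enumeration for tiny graphs; helper names are hypothetical.


def matched_heads(edges, heads):
    """Maximum number of nodes in `heads` that can each receive a distinct
    matched incoming link (Kuhn's augmenting-path algorithm)."""
    succ = {}
    for u, v in edges:
        if v in heads:
            succ.setdefault(u, []).append(v)
    match = {}  # head v -> tail u of the matched link u -> v

    def augment(u, seen):
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in succ:
        augment(u, set())
    return len(match)


def minimum_driver_sets(nodes, edges):
    """N_D = max(n - |M*|, 1) and all node sets of that size whose complement is fully matchable."""
    nodes = list(nodes)
    n_d = max(len(nodes) - matched_heads(edges, set(nodes)), 1)
    found = []
    for D in combinations(nodes, n_d):
        rest = set(nodes) - set(D)
        if matched_heads(edges, rest) == len(rest):
            found.append(set(D))
    return n_d, found


def control_capacity(nodes, edges):
    """Share of minimum driver sets that contain each node."""
    _, sets = minimum_driver_sets(nodes, edges)
    return {v: sum(v in D for D in sets) / len(sets) for v in nodes}


# Hub: node 0 influences nodes 1 to 4.
star_cap = control_capacity(range(5), [(0, v) for v in range(1, 5)])

# Path 0 -> 1 -> 2 -> 3, then the same path closed into a cycle by the link 3 -> 0.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_cap = control_capacity(range(4), path_edges)
cycle_cap = control_capacity(range(4), path_edges + [(3, 0)])
```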

Perturbations in linking and formation can have large-scale effects on the controllability of the network (Wang et al. 2012). Specifically, if nodes are strategically reducing their in-degree, preferentially attaching to hubs, and avoiding cycles and triadic closure, these concerns can cause the network to be more difficult to control as measured by the fraction of nodes required to control the entire system. This number is denoted \(n_{D}\) and is also defined as the relative size of the minimum driver set, or the fraction of unmatched nodes in a maximum bipartite matching (Liu et al. 2011). This measure of the “pinning controllability” of a network was analyzed by Liu et al. (2012) and has far-reaching applications, which range from the analysis of systemic risk in economic trade and lending networks (Delpini et al. 2013) to the identification of biomarkers for the diagnosis of neurodegenerative diseases such as Parkinson’s and Alzheimer’s (Tahmassebi et al. 2019).
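\(n_{D}\) can be computed with any maximum bipartite matching routine. The sketch below uses Kuhn’s augmenting-path algorithm in plain Python (an illustrative choice; Hopcroft–Karp is faster for large networks) and the \(\max(n-|M^{*}|,1)\) convention of Liu et al. (2011) for networks with a perfect matching. The random network and the strategy of ten nodes dropping all incoming links are hypothetical. Because deleting links can never enlarge a maximum matching, \(n_{D}\) cannot fall when nodes shed incoming influence, and every node left without incoming links must itself be a driver.

```python
import random

# Sketch only: plain-Python Kuhn matching; the network and strategy are hypothetical.


def structural_nD(n, edges):
    """n_D = max(n - |M*|, 1) / n, with M* a maximum matching of the bipartite
    graph pairing each node's outgoing side with another node's incoming side."""
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
    match = {}  # head v -> tail u of the matched link u -> v

    def augment(u, seen):
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, set())
    return max(n - len(match), 1) / n


rng = random.Random(42)
n = 40
edges = {(u, v) for u in range(n) for v in range(n) if u != v and rng.random() < 0.04}
before = structural_nD(n, edges)

# Hypothetical strategy: nodes 0-9 refuse all incoming influence.
droppers = set(range(10))
after = structural_nD(n, {(u, v) for u, v in edges if v not in droppers})
```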

Data and methods

We will use network data from twenty European parliaments, collected by Briatte (2016), along with the World Bank’s Worldwide Governance Indicators of corruption and governance effectiveness (Kaufmann et al. 2011), to construct a panel that we analyze using fixed effects regression. This provides evidence for the theory while controlling for country-level variables, such as legislative frequency and societal or ethnic factors, that may affect the baseline levels of controllability in these networks. Figure 1 highlights fixed differences between the legislatures of different countries, such as network size, that can be controlled for using fixed-effects regression techniques (Wooldridge 2002).
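With a single regressor, the country fixed-effects (within) estimator amounts to demeaning both variables by country and running OLS through the origin. The sketch below illustrates only that mechanism; the two-country panel is hypothetical, constructed so that controllability equals 0.3 times a corruption score plus a country-specific intercept, and it is not the Briatte (2016) or World Bank data. A regression with country dummies would yield the same slope.

```python
from collections import defaultdict

# Sketch only: one-regressor within estimator on a hypothetical panel.


def within_estimator(country, x, y):
    """Slope of y on x after removing country-specific means (fixed effects)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # country -> [sum of x, sum of y, count]
    for c, xi, yi in zip(country, x, y):
        s = sums[c]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for c, xi, yi in zip(country, x, y):
        sx, sy, count = sums[c]
        dx, dy = xi - sx / count, yi - sy / count  # deviations from country means
        num += dx * dy
        den += dx * dx
    return num / den


# Hypothetical panel: y = 0.3 * x + country intercept (5.0 for "A", -2.0 for "B").
country = ["A", "A", "A", "B", "B"]
corruption = [1.0, 2.0, 3.0, 4.0, 6.0]
controllability = [0.3 * xi + (5.0 if c == "A" else -2.0) for c, xi in zip(country, corruption)]
beta = within_estimator(country, corruption, controllability)

# Treating all rows as one group gives pooled OLS with a common intercept.
pooled = within_estimator(["all"] * len(country), corruption, controllability)
```

On these numbers the within estimate recovers the slope of 0.3, while the pooled slope is roughly -1.4, because the country with higher scores also has the lower baseline.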

Fig. 1

Differences in Legislative Structure Between Countries. A side-by-side comparison of the legislative networks in (a) Austria and (b) Sweden, both from the same year (2002). This figure highlights the baseline structural differences between the networks which necessitate controlling for fixed effects at the country level

The network data provided by Briatte (2016) consist of temporal observations of co-sponsorship networks in twenty European parliaments. These networks are broken up into multi-year time periods based on when there are structural changes in the network, such as elections. First we will observe the effect that temporal changes in the perception of corruption have on the controllability of the legislative networks. Then, we will explore the use of controllability as an instrument for corruption in regressions on governance effectiveness. Additionally, we will use simulations to show the mechanisms by which nodes in a simple structural model of network formation can use simple heuristics to alter their control capacity, and thus the controllability of the entire network system.

Theoretical model

A common model used to describe social learning and influence in networks is called the DeGroot model (Golub and Jackson 2010; Huckfeldt et al. 2014; DeGroot 1974). This model simply states that a node’s opinions over a certain topic will evolve as a weighted average of their neighbors’ opinions, so that the dynamics of the opinion vector x take the form:

$$ \dot{x}=Ax $$
(1)

Where \(\dot {x}\) refers to the time derivative of the opinion vector x, and A is an n×n weighted adjacency matrix with \(A_{ij}>0\) if node j influences node i, and \(A_{ij}=0\) otherwise. A key feature of this type of system is that it can be steered by an outside party: when external control input is applied, the dynamics of the system become:

$$ \dot{x}=Ax + Bu $$
(2)

Here, the n×m matrix B represents a control schematic with m external controllers, so that \(B_{ij}=1\) if node i receives control input from controller j, and \(B_{ij}=0\) otherwise. Once the schematic is known, a vector u of control inputs is sent to the nodes. This is where the key distinction between controllability and corruption is made. In a corrupt network, nodes receive a monetary transfer in exchange for altering their stated position or opinion on a certain topic and thereby influencing their neighbors in the network. Let \(J_{i}\) denote the value of control input to node i; it is a function of the expected control input received by node i and of the “corruptibility” parameter \(\eta_{i}\). Alternatively, \(J_{i}\) can be seen as the price that the node will charge in order to allow control input signals to alter its state. Thus, to a node in the network, corruptibility \(\eta_{i}\) and control capacity \(B_{i}=\sum _{k=1}^{m} \mathbb {E}[B_{ik}]\) are complements. Specifically, \(J_{i}\) is given by:

$$ J_{i} = \eta_{i} \sum_{k=1}^{m} B_{ik} f(u_{k}) $$
(3)

Where \(f(u_{k}):\mathbb {R}\longrightarrow \mathbb {R}^{+}\) is some increasing function. Nodes either do not know the value of \(u_{k}\) or hold uniform priors over it, since they are not aware ahead of time of the state to which the outside party will wish to drive them. In other words, to maximize their income from corruption, nodes can only alter their position in the network to change the probability \(B_{i}\) that they are selected as a driver node; their incentive to do so increases with their corruptibility \(\eta_{i}\).

Finally, it is assumed that corrupting parties who wish to control the network will select a minimum driver set, so that \(m=N_{D}\), and that the agents’ prior expectations over which minimum driver set will be selected by an external corrupting party are uniform. This means that the set of corrupt actors chosen to control the network will be of the minimum possible size. It also guarantees that the probability \(B_{i} = \sum _{k=1}^{N_{D}} \mathbb {E}[B_{ik}]\) that a node is selected as a driver node is equal to the control capacity (Jia and Barabási 2013) of that node.

Since exactly one of the \(B_{ik}\) takes the value 1 if the node is selected as a driver (with probability \(B_{i}\)) and all of them take the value 0 otherwise, the sum of the nodes’ control capacities always equals \(N_{D}\), the size of the minimum driver set. Thus, as nodes change their linking patterns to raise their own control capacity, \(N_{D}\) increases, all else constant. Therefore, this assumption on preferences (specifically, that nodes can only alter their utility from control input through the probability that they are selected as a driver) guarantees that a rise in the perception of corruption in a community should be coupled with a decline in the controllability of the social networks formed within it. This follows from the finding of Jia and Barabási (2013) that control capacity is determined by a node’s incoming links and is independent of its out-degree, and from the subsequent reasoning that, by removing incoming influence, an individual can increase its control capacity without affecting its neighbors, effectively raising the size of the minimum driver set.

A sufficient, but not necessary, condition for this result is that the control input values be drawn from a random distribution, such as the uniform distribution. The only truly necessary condition is that the magnitudes of realized control input values be conditionally independent of a node’s incoming linkage patterns, so that individuals can only change their expected utility from incoming input by changing their control capacity.

To summarize, control capacity helps to uncover the hierarchy buried in the complexity of any network topology: it identifies the nodes that must be controlled if one wishes to control the network. Naturally, if one node’s control capacity increases, ceteris paribus, then the cardinality of the minimum node set required to control the network must have increased, making the network system as a whole more difficult to control. It follows that when multiple nodes compete to increase their control capacity, the network becomes more difficult to control. This paper terms this novel effect “hierarchical congestion”.
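A minimal sketch of hierarchical congestion on a hypothetical three-node network, enumerating minimum driver sets by brute force under the maximum-matching characterization of Liu et al. (2011); the helper names are hypothetical. On the path 0→1→2, node 1 is never a driver. If node 1 drops its only incoming link, its control capacity rises from 0 to 1, the capacities of nodes 0 and 2 are unchanged, and \(N_{D}\) grows from 1 to 2.

```python
from itertools import combinations

# Sketch only: brute-force enumeration for tiny graphs; helper names are hypothetical.


def matched_heads(edges, heads):
    """Maximum number of nodes in `heads` that can each receive a distinct
    matched incoming link (Kuhn's augmenting-path algorithm)."""
    succ = {}
    for u, v in edges:
        if v in heads:
            succ.setdefault(u, []).append(v)
    match = {}  # head v -> tail u of the matched link u -> v

    def augment(u, seen):
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in succ:
        augment(u, set())
    return len(match)


def minimum_driver_sets(nodes, edges):
    """N_D = max(n - |M*|, 1) and all node sets of that size whose complement is fully matchable."""
    nodes = list(nodes)
    n_d = max(len(nodes) - matched_heads(edges, set(nodes)), 1)
    found = []
    for D in combinations(nodes, n_d):
        rest = set(nodes) - set(D)
        if matched_heads(edges, rest) == len(rest):
            found.append(set(D))
    return n_d, found


def control_capacity(nodes, edges):
    """Share of minimum driver sets that contain each node."""
    _, sets = minimum_driver_sets(nodes, edges)
    return {v: sum(v in D for D in sets) / len(sets) for v in nodes}


# Path 0 -> 1 -> 2; node 1 then drops its only incoming link (0 -> 1).
nodes = [0, 1, 2]
before_nd, _ = minimum_driver_sets(nodes, [(0, 1), (1, 2)])
before_cap = control_capacity(nodes, [(0, 1), (1, 2)])
after_nd, _ = minimum_driver_sets(nodes, [(1, 2)])
after_cap = control_capacity(nodes, [(1, 2)])
```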

The assumption that corrupting parties choose uniformly at random among minimum driver sets (or, equivalently, that nodes hold uniform prior expectations over which minimum driver set will be chosen) means that a node is equally likely to be assigned to each of the m controllers indexed by k, so that its probability of being assigned to any particular controller is its control capacity divided by m. Mathematically, \(\forall k,l \in [1,m]\subset \mathbb {Z},\; \mathbb {E}[B_{ik}] = \mathbb {E}[B_{il}] = \frac {1}{m} B_{i}\), where \(B_{i}\) is the control capacity of node i.
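Combining Eq. (3) with this assumption gives a node’s expected value of control input. The step below additionally assumes, in line with the conditional-independence condition stated above, that the selection indicators \(B_{ik}\) are independent of the realized inputs \(u_{k}\), and that the \(u_{k}\) are identically distributed, so that \(\bar{f}=\mathbb{E}[f(u_{k})]\) does not depend on k:

$$
\begin{aligned}
\mathbb{E}[J_{i}] &= \eta_{i} \sum_{k=1}^{m} \mathbb{E}\left[B_{ik} f(u_{k})\right]
= \eta_{i} \sum_{k=1}^{m} \mathbb{E}[B_{ik}]\, \mathbb{E}[f(u_{k})] \\
&= \eta_{i} \sum_{k=1}^{m} \frac{1}{m} B_{i}\, \bar{f}
= \eta_{i} B_{i} \bar{f}
\end{aligned}
$$

Under these assumptions a node can raise its expected payment only through its control capacity \(B_{i}\), and the payoff from doing so scales with its corruptibility \(\eta_{i}\).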

Thus the above assumptions are sufficient to guarantee that increases in the frequency of corruption should be coupled with increases in the relative size of \(N_{D}\), making the network more difficult to control. That is, as the network becomes more corrupt (nodes expect a higher demand for control of the network and thus a larger payment in exchange for control), it actually becomes harder to control, as measured by the size of the minimum driver set, because of rent seeking at the individual level. This is the key result tested in this paper.

The size of the minimum driver set required for exact controllability of the network can be found using the following method derived in Yuan et al. (2013). First, in order for the network to be fully controllable under the control scheme B, it must be the case that

$$ \text{rank}(cI_{n}-A,B)=n $$
(4)

For every constant c, real or complex. This condition, called the Popov–Belevitch–Hautus (PBH) rank condition, is equivalent to the classic Kalman condition, which states that the controllability matrix \(C=[B, AB, A^{2} B,\dots, A^{n-1}B]\) must be of full rank. Intuitively, these conditions hold when signals input to the nodes in B can reach every node in the network and steer the nodes independently of one another. Using the PBH formulation (4), the minimum number of nodes that must be controlled in order to fully control the network can be found from the maximum geometric multiplicity \(\mu(\lambda_{i})\) over the eigenvalues \(\lambda_{i}\) of the adjacency matrix A:

$$ N_{D} = \max_{i} {\mu (\lambda_{i})} $$
(5)

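This is proven by Yuan et al. (2013) and follows from the observation that the first block in (4), \(cI_{n}-A\), is the matrix whose determinant forms the characteristic polynomial of A. Its rank is n unless c is an eigenvalue \(\lambda_{i}\), in which case its rank drops to \(n-\mu(\lambda_{i})\), so B must supply at least \(\mu(\lambda_{i})\) independent columns to restore full rank. The binding case is therefore \(c=\lambda_{i^{*}}\) with \(i^{*}=\arg\max_{i}\mu(\lambda_{i})\).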
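Eq. (5) and the Kalman test can be checked numerically on small matrices. The sketch below is an illustration under stated assumptions, not the implementation of Yuan et al. (2013): it computes \(\mu(\lambda_{i}) = n - \text{rank}(\lambda_{i} I_{n} - A)\) with NumPy’s SVD-based `matrix_rank`, where the tolerance `tol` is a hypothetical choice that would need care for large or ill-conditioned networks, and it builds B from a chosen driver set. The five-node star (node 0 influencing nodes 1 to 4) is a toy example, and the function names are hypothetical.

```python
import numpy as np

# Sketch only: rank tolerance and function names are hypothetical choices.


def exact_driver_count(A, tol=1e-8):
    """Eq. (5): N_D = max over eigenvalues of mu(lambda) = n - rank(lambda * I - A)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return max(int(n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol=tol))
               for lam in np.linalg.eigvals(A))


def control_matrix(n, drivers):
    """n x m schematic B with B[i, k] = 1 if controller k drives node i."""
    B = np.zeros((n, len(drivers)))
    for k, i in enumerate(drivers):
        B[i, k] = 1.0
    return B


def kalman_controllable(A, B, tol=1e-8):
    """Kalman test: rank [B, AB, ..., A^(n-1) B] == n."""
    A = np.asarray(A, dtype=float)
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return int(np.linalg.matrix_rank(np.hstack(blocks), tol=tol)) == A.shape[0]


# Star: node 0 influences nodes 1-4, so A[i, 0] = 1 (A_ij > 0 if j influences i).
A = np.zeros((5, 5))
A[1:, 0] = 1.0
n_d = exact_driver_count(A)
hub_only = kalman_controllable(A, control_matrix(5, [0]))
hub_plus_three = kalman_controllable(A, control_matrix(5, [0, 1, 2, 3]))
```

For the star every eigenvalue is 0 and \(\text{rank}(A)=1\), so \(N_{D}=4\): controlling the hub alone fails the Kalman test, while the hub plus three of its out-neighbors passes.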

We normalize \(N_{D}\) by the size of the network in order to obtain a regressor that is independent of network size. The result is the normalized controllability indicator \(n_{D}=\frac {N_{D}}{n}\). A further issue arises when this metric is used to compare political networks: some countries have multicameral legislatures. Since political networks within a country should be expected to have similar controllability features, we propose the adjusted controllability measure:

$$ \bar{n}_{D} = \frac{\sum_{c} N_{D}^{c}}{\sum_{c} n_{c}} $$
(6)

This is simply the size-weighted average of the normalized controllability \(n_{D}^{c}=N_{D}^{c}/n_{c}\) across the chambers c of the legislature, each of size \(n_{c}\). In other words, this adjusted controllability indicator gives the proportion of individuals in these networks who must be controlled in order to have full control over the entire legislature.
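Eq. (6) pools driver nodes and members across chambers, which is the same as weighting each chamber’s \(n_{D}^{c}\) by its share of members. The chamber figures below are hypothetical, and the function names are illustrative.

```python
# Sketch only: hypothetical chamber figures and function names.


def adjusted_nD(chambers):
    """Eq. (6): total minimum driver nodes over total members, across chambers."""
    return sum(nd for nd, _ in chambers) / sum(size for _, size in chambers)


def seat_weighted_nD(chambers):
    """Average of per-chamber n_D = N_D / n_c, weighted by each chamber's share of members."""
    total = sum(size for _, size in chambers)
    return sum((size / total) * (nd / size) for nd, size in chambers)


# Hypothetical bicameral legislature: (N_D, n_c) for each chamber.
chambers = [(20, 200), (3, 60)]
pooled = adjusted_nD(chambers)
weighted = seat_weighted_nD(chambers)
simple = sum(nd / size for nd, size in chambers) / len(chambers)  # unweighted mean, for contrast
```

The pooled and seat-weighted values coincide (23/260), whereas the unweighted mean of the two chamber ratios (0.075) gives the smaller chamber too much influence.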

Formation

The goal of this paper is to show that the concern of individual nodes for their control capacity can have an emergent effect on the structure and controllability of the entire network. In this section, we examine the nodes’ ability to alter controllability using the simple heuristics of lowering their in-degree and avoiding triadic closure. The benefit of the analytical strategy proposed in this paper is that it does not require specifying the nodes’ individual linking and formation strategies, focusing instead on emergent indicators. It is important, however, to show that the individual preferences of nodes can, at some level, have aggregate effects on the controllability of a network.

In order to accomplish this, we will introduce a simple model of social network formation games which can be microfounded and tied to node-level preferences using the method of Mele (2017), and see how it allows for nodes to heuristically manipulate their connectivity in order to increase their control capacity. In this model, nodes have utility:

$$\begin{array}{*{20}l} U_{i}(A;\theta) &= \sum_{j=1}^{n} A_{ij}l(\theta_{l}) + \sum_{j=1}^{n} A_{ij}A_{ji}r(\theta_{r}) \\&+ \sum_{j=1}^{n} A_{ij}\sum_{k=1\atop k\neq i,j}^{n} A_{jk}v(\theta_{v}) + \sum_{j=1}^{n} A_{ij}\sum_{k=1 \atop k\neq i,j}^{n} A_{ki}w(\theta_{w}) \end{array} $$
(7)

Where l, r, v, and w are bounded, real-valued functions of their parameters, representing direct linking benefits, reciprocity, indirect link benefits, and popularity effects, respectively. Note that we have simplified the original model by assuming that nodes are homogeneous in attributes, a decision motivated both by a desire for simplicity in the simulated environment and by the observation of Pósfai et al. (2013) that community structure has little to no effect on the controllability of the network. Under the assumptions laid out in Mele (2017) (Footnote 3), the game is a potential game (Monderer and Shapley 1996) and there exists a potential function:

$$ Q(A; \theta)=\sum_{i=1}^{n} \sum_{j=1}^{n} A_{i j} l_{i j}\left(\theta_{l}\right)+\sum_{i=1}^{n} \sum_{j>i}^{n} A_{i j} A_{j i} r_{i j}\left(\theta_{r} \right)+\sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1 \atop k \neq i,j}^{n} A_{i j} A_{j k} v_{i k}\left(\theta_{v}\right) $$
(8)

With the property that a unilateral deviation in strategy has the same effect on an individual node’s utility as it does on this potential function. Briefly, the required assumptions involve the symmetry of the reciprocity term r across players and the symmetry of v with w; most of the strength of these assumptions is absorbed by our assumption of homogeneity in nodal attributes. Given this potential function, and under the further assumptions of a random meeting process that is uncorrelated with link existence and of bounded rationality, modeled as an idiosyncratic Gumbel-distributed shock to preferences, the formation game evolves as a Markov chain that converges to a unique stationary distribution

$$ \pi(A ; \theta)=\frac{\exp [Q(A; \theta)]}{\sum_{\alpha \in \mathcal{A}} \exp [Q(\alpha ; \theta)]} $$
(9)

If the potential function Q is linear in parameters (and nodes are homogeneous in attributes), this distribution can be written as

$$ \pi(A; \theta)=\frac{\exp [\theta' t(A)]}{\sum_{\alpha \in \mathcal{A}} \exp [\theta' t(\alpha)]} $$
(10)

This distribution belongs to the exponential family of random graph distributions (ERGM) (Desmarais and Cranmer 2012) and can be estimated or simulated using the ergm package in R (Hunter et al. 2008; R Core Team 2019). Under linearity in parameters, the three terms in the potential function give link frequency, the number of mutual links, and triadic closure, respectively; the corresponding ERGM parameters are \(\theta_{l}\), \(\theta_{r}\), and \(\theta_{v}\). Issues with the asymptotics of the model for large networks, and Markov chain Monte Carlo methods that sidestep the intractable normalizing constant, are covered in Mele (2017). For each simulated network, structural controllability is computed using the method of Liu et al. (2011). This method is used, rather than the exact controllability of Yuan et al. (2013), because the networks drawn from this simple version of the ERGM are unweighted, which could make them harder to control than real-valued weighted networks. In the networks analyzed in the next section, entries are real-valued, so their exact controllability should equal their structural controllability. Indeed, Liu et al. (2011) show that the set of networks for which structural controllability differs from exact controllability has measure zero and consists of networks that are not sufficiently asymmetric or non-normal (Footnote 4). Structural controllability is calculated by computing a maximum bipartite matching of the network and counting the unmatched nodes. For the purposes of the simulation, this is done using Octave (Eaton et al. 2017).
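Eq. (10) can be sampled without computing its normalizing constant. The sketch below is a hand-written Metropolis edge-toggle sampler in plain Python, not the R ergm package relied on here, with hypothetical step counts and parameter values. It takes the statistics literally from Eq. (8) under homogeneous attributes: the number of links, mutual dyads, and directed two-paths i→j→k with k≠i, the last standing in for the triadic-closure term; the exact ergm terms used for the reported simulations may differ. Each sampled network is scored with the structural \(n_{D}\) of Liu et al. (2011).

```python
import math
import random

# Sketch only: hypothetical sampler, parameters, and step counts (not the R ergm package).


def change_stats(adj, out_deg, in_deg, i, j):
    """Size of the change in t(A) = (links, mutual dyads, two-paths) when the
    link i -> j is toggled; positive for adding it, negated for removing it."""
    mutual = adj[j][i]
    # two-paths i -> j -> k (k != i) plus k -> i -> j (k != j) that use link i -> j
    two_paths = out_deg[j] + in_deg[i] - 2 * adj[j][i]
    return (1, mutual, two_paths)


def sample_ergm(n, theta, steps, seed=0):
    """Metropolis edge-toggle chain targeting pi(A) proportional to exp(theta . t(A))."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    out_deg, in_deg = [0] * n, [0] * n
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # uniformly chosen ordered pair, i != j
        sign = -1 if adj[i][j] else 1   # remove the link if present, else add it
        dq = sign * sum(t * d for t, d in zip(theta, change_stats(adj, out_deg, in_deg, i, j)))
        if dq >= 0 or rng.random() < math.exp(dq):
            adj[i][j] = 1 - adj[i][j]
            out_deg[i] += sign
            in_deg[j] += sign
    return adj


def stats(adj):
    """t(A) computed from scratch, for checking the change statistics."""
    n = len(adj)
    links = sum(map(sum, adj))
    mutual = sum(adj[i][j] * adj[j][i] for i in range(n) for j in range(i + 1, n))
    two_paths = sum(adj[i][j] * adj[j][k] for i in range(n) for j in range(n)
                    for k in range(n) if len({i, j, k}) == 3)
    return links, mutual, two_paths


def structural_nD(adj):
    """max(n - |M*|, 1) / n via Kuhn's augmenting paths (Liu et al. 2011)."""
    n = len(adj)
    match = {}  # head v -> tail u of the matched link u -> v

    def augment(u, seen):
        for v in range(n):
            if adj[u][v] and v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, set())
    return max(n - len(match), 1) / n


# Hypothetical (theta_l, theta_r, theta_v): costly links versus free links.
sparse = sample_ergm(30, (-4.0, 0.0, 0.0), steps=20000, seed=1)
dense = sample_ergm(30, (0.0, 0.0, 0.0), steps=20000, seed=1)
```

With \(\theta_{l}=-4\) links are rare and most nodes must be drivers, whereas with \(\theta_{l}=0\) each link is present with probability about one half and a single driver typically suffices.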

Simulations

Up to this point, we have considered individuals manipulating their incoming linking pattern to become more crucial to control as somewhat of a “black box”. That is, we have simply assumed that individuals can somehow take action to raise their control capacity, and that this subsequently raises the size of the minimum driver set, without providing a specific method by which this is accomplished. In this section, we will use simulations to show how heuristics which place differential biases on certain linking patterns could be a potential mechanism by which the actions of individuals to raise their control capacity could be taken and which would thus affect the controllability of the resulting networks.

Simulations are focused on edgewise attributes which should affect controllability of the network as a whole. Similar simulations, and a discussion of how in- and out-degree affect a node’s individual control capacity are performed in Jia and Barabási (2013), and the effects of individual properties of the network on its aggregate controllability have been simulated by Pósfai et al. (2013). We use the results of these prior simulations to inform our simulation strategy by selecting the parameters which should likely affect these network properties and thus likely affect its controllability. These simulations differ from prior ones, however, in that they tie the network properties to node-level preferences in a social network formation model, and with multiple degrees of freedom.

First, network parameters are drawn randomly from a given interval. Then, a network of size n=100 is drawn from the distribution with these parameters, and its structural controllability is computed. To simplify the model further for simulation purposes, the coefficient on mutual links is fixed at \(\theta_{r}=0\) (Footnote 5), since the coefficients of greatest interest to this paper are those on individual links and loops (as heuristically measured by transitive triadic closure).

As an initial sanity check, a sample of one thousand networks with n=50 was drawn from ERGM distributions in both “high controllability” and “low controllability” conditions. In the high controllability condition the parameters were fixed at \(\theta_{l}=-3\) and \(\theta_{v}=-0.3\), while in the low controllability condition they were lowered to \(\theta_{l}=-6\) and \(\theta_{v}=-0.6\). The controllability of each network was computed, and a non-parametric two-tailed Mann-Whitney U test rejected the null hypothesis that the two samples of networks have identical median controllability (p < 0.00001), indicating with a high level of confidence that the high-controllability parameters do, in fact, generate more controllable networks. This is not surprising, given the well-documented significance of edge frequency for the controllability of a network (Pósfai et al. 2013; Jia et al. 2013; Jia and Barabási 2013). Figure 2 presents a network drawn from each of these conditions and further highlights the importance of edge frequency.
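The comparison above can be sketched from first principles. The block below computes the Mann-Whitney U statistic with average ranks for ties and a two-sided p-value from the large-sample normal approximation, without tie or continuity corrections, so its p-values will differ somewhat from library routines such as `scipy.stats.mannwhitneyu`. The two short samples of \(n_{D}\) values and the function names are hypothetical.

```python
import math

# Sketch only: normal approximation without tie/continuity corrections; data are hypothetical.


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical n_D samples from a "high" and a "low" controllability condition.
high = [0.10, 0.12, 0.08, 0.15, 0.11]
low = [0.40, 0.35, 0.52, 0.47, 0.38]
u, p = mann_whitney_u(high, low)
```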

Fig. 2

Controllability of Random Networks. These visualizations highlight how small changes in ERGM coefficients can have large effects on the controllability of a network, driven primarily by changes in link frequency. In each network (a) and (b), a randomly selected minimum driver set is highlighted in red

These results encouraged a full simulation, in which the linking and triadic closure coefficients are sampled uniformly from the real intervals \(\theta_{l}\in[-100,0]\) (consistent with a positive linking cost) and \(\theta_{v}\in[0,100]\) (indicating that nodes place positive value on the indirect linking spillovers and popularity that lend themselves to triadic closure). Since the structural controllability \(n_{D}^{s}\) is bounded between 0 and 1 (its smallest attainable value, \(1/n\), occurs when a single node can control the entire network, and its largest, 1, when every node must be controlled), we fit a tobit censoring model (Amemiya 1973) to the data with a lower limit of \(n_{D}^{s}=0\), an upper limit of \(n_{D}^{s}=1\), and the following specification:

$$ n_{D}^{s} = \gamma_{1} \theta_{l} + \gamma_{2} \theta_{v} + \epsilon $$
(11)
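Eq. (11) is estimated by maximum likelihood. The block below is a minimal sketch of the two-limit tobit log-likelihood, assuming each row of `X` holds \((\theta_{l}, \theta_{v})\) for one simulated network and, as in Eq. (11), no intercept. The function names are hypothetical; in practice the negative of this function would be passed to a numerical optimizer (for example `scipy.optimize.minimize`) to obtain estimates of \(\gamma_{1}\), \(\gamma_{2}\), and the error scale.

```python
import math

# Sketch only: hypothetical function names; no intercept, matching Eq. (11).


def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))


def tobit_loglik(gamma, log_sigma, X, y, lower=0.0, upper=1.0):
    """Two-limit tobit log-likelihood for y* = X gamma + eps, eps ~ N(0, sigma^2),
    observed as y = min(max(y*, lower), upper)."""
    sigma = math.exp(log_sigma)  # log scale keeps sigma positive during optimization
    ll = 0.0
    for xi, yi in zip(X, y):
        mu = sum(g * x for g, x in zip(gamma, xi))
        if yi <= lower:    # censored from below: P(y* <= lower)
            ll += math.log(max(norm_cdf((lower - mu) / sigma), 1e-300))
        elif yi >= upper:  # censored from above: P(y* >= upper)
            ll += math.log(max(norm_cdf((mu - upper) / sigma), 1e-300))
        else:              # interior observation: normal density of the residual
            ll += norm_logpdf((yi - mu) / sigma) - log_sigma
    return ll
```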

Initial results appeared to show, with a high degree of statistical significance, that the controllability of the network is increasing in both parameters of interest (or, equivalently, that the fraction \(n^{s}_{D}\) of driver nodes required to structurally control the entire system is decreasing in both link frequency and triadic closure). Results of this simulation can be seen in Fig. 3.

Fig. 3

Simulation Results. In a simple formation model, edge costs are the most important determining factor of network controllability, and triadic closure appears to matter on the margin. This figure highlights the stark impact of θl on controllability, driven by the phase transition from dense to sparse networks which occurs due to rising edge costs

On inspecting the graphical results, however, it becomes clear that there may be more going on than the tobit regression results indicated. When the triadic closure coefficient is greater than the negative of the edge cost, the controllability of the network becomes noisy, but apart from this noise there does not appear to be a significant trend toward more controllable networks. To verify this, a new simulation was performed, this time with parameters \(\theta_{l}\in[-15,0]\) and \(\theta_{v}\in[0,1500]\). Further, in an effort to smooth the surface, the parameters were taken from a regular grid over the parameter space rather than drawn uniformly at random. For each pair of parameters on the grid, five networks were drawn and their average controllability was computed. These simulations made clear that the estimated effect of the triadic closure coefficient dropped off significantly as the parameter was drawn from a larger interval, indicating that triadic closure matters on the margin but is asymptotically insignificant in determining the controllability of a network.

A tobit model is used, rather than a more flexible nonlinear specification, so that the coefficient estimates can be readily interpreted. Table 1 shows the results of the tobit regression for this second round of simulations. The edge coefficient has a strongly significant negative association with \(n_{D}^{s}\) or, equivalently, a positive association with the controllability of the network. The apparent bimodality of the indicator reflects the strong tendency of these ERGMs toward degeneracy, meaning that they usually generate either completely dense or completely sparse networks.

Table 1 Tobit regression analysis of simulated network controllability

The purpose of these simulations was to show that members of a network formation game can use simple heuristics to alter their control capacity, and that these changes in control capacity produce an emergent effect on the controllability of the network they form. The simulations suggest that this is possible even in a very simple model of network formation, through a heuristic that places upward pressure on direct link costs. There is no known closed-form solution for the exact magnitude of this cost change. However, the analytical approach used in this paper has the benefit of not requiring any particular structural utility function or specific heuristic in order to yield results. It requires only that nodes place a positive value on control capacity that increases with their corruptibility, and that they are able to heuristically alter this measure when required.
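This section does not spell out how the driver-node fraction \(n_{D}^{s}\) is computed. One standard route, assumed here, is the maximum-matching result of Liu, Slotine and Barabási (2011): \(N_D = \max(N - |M^{*}|, 1)\), where \(|M^{*}|\) is the size of a maximum matching of the directed network. The sketch below uses a plain augmenting-path (Kuhn) search, with the hypothetical helper names `maximum_matching_size` and `driver_node_fraction`. It assumes the convention that \(A_{ij} \neq 0\) denotes a link from j to i. The toy chain illustrates the in-degree heuristic: deleting a node's only incoming link raises the number of driver nodes.

```python
def maximum_matching_size(A):
    """Size of a maximum matching in the directed network with adjacency A.

    Assumed convention: A[i][j] != 0 means node j influences node i (edge j -> i).
    """
    n = len(A)
    # successors[j] lists every node i that j points to
    successors = [[i for i in range(n) if A[i][j] != 0] for j in range(n)]
    matched_by = [-1] * n  # matched_by[i] = node whose outgoing edge is matched to i

    def augment(j, seen):
        # Kuhn's augmenting-path search starting from the outgoing copy of node j
        for i in successors[j]:
            if not seen[i]:
                seen[i] = True
                if matched_by[i] == -1 or augment(matched_by[i], seen):
                    matched_by[i] = j
                    return True
        return False

    return sum(augment(j, [False] * n) for j in range(n))


def driver_node_fraction(A):
    """n_D = N_D / N with N_D = max(N - |M*|, 1) (Liu, Slotine and Barabasi 2011)."""
    n = len(A)
    return max(n - maximum_matching_size(A), 1) / n


# Toy networks (hypothetical): a directed chain 0 -> 1 -> 2 -> 3 and an out-star
chain = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
chain_cut = [row[:] for row in chain]
chain_cut[1][0] = 0  # node 1 drops its only incoming link
star = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
```

Under this convention the smallest attainable value is \(1/N\) rather than exactly 0, which is immaterial for the censoring limits used in (11) when N is large.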

The simulations suggest another network-level covariate that could be related to corruption. These results, along with those of Jia and Barabási (2013), suggest that nodes could use the simple heuristic of lowering their structural in-degree to raise their control capacity, and thus the size of the minimum driver set. Equivalently, this can be viewed as heuristically inflating the cost of forming incoming links. If this is the case, and in-degree is the primary instrument nodes use to alter their control capacity, we may expect a strong negative correlation between the link density of the network and the corruption indices. The structural density of the network is taken as the number of nonzero edges it contains:

$$ R_{s} = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathbbm{1}_{\{A_{ij} > 0\}} $$
(12)

where \(\mathbbm {1}_{\{A_{ij} > 0\}}\) is an indicator that takes the value 1 if there is a link from j to i, and 0 otherwise. To normalize structural density relative to the network size n, it is divided by the total number of possible links in the network, which for a directed network with n nodes is n(n−1), so that \(\rho _{s} = \frac {R_{s}}{n(n-1)}\).

Finally, to adjust for bicameral legislatures, with \(n_{c}\) nodes in chamber c, we define the adjusted relative structural density \(\bar {\rho }_{s}\) to be the proportion of all possible links, across all chambers, that exist. Let \(R_{s}^{c}\) be the structural density (12) of chamber c of the legislature. Then

$$ \bar{\rho}_{s} = \frac{\sum_{c} R_{s}^{c}}{\sum_{c} n_{c}(n_{c}-1)} $$
(13)

gives the adjusted relative structural density of the legislature. Intuitively, this is the fraction of all possible links in both chambers that exist with nonzero weights.
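Equations (12) and (13) reduce to counting nonzero entries. The following is a minimal sketch. It assumes each chamber is supplied as a square weighted adjacency matrix, and the function names are hypothetical. The diagonal is ignored, on the assumption that self-links are absent, which is consistent with the n(n−1) normalization.

```python
import numpy as np


def structural_link_count(A):
    """R_s in Eq. (12): number of nonzero off-diagonal entries of A."""
    A = np.asarray(A, dtype=float)
    off_diag = ~np.eye(A.shape[0], dtype=bool)
    return int(np.count_nonzero(A[off_diag]))


def relative_density(A):
    """rho_s = R_s / (n (n - 1)) for a directed network with n nodes."""
    n = len(A)
    return structural_link_count(A) / (n * (n - 1))


def adjusted_relative_density(chambers):
    """Eq. (13): links pooled over chambers divided by possible links pooled over chambers."""
    links = sum(structural_link_count(A) for A in chambers)
    possible = sum(len(A) * (len(A) - 1) for A in chambers)
    return links / possible


# Hypothetical two-chamber legislature
lower_house = [[0, 0.5, 0], [0, 0, 0], [1.0, 0, 0]]
upper_house = [[0, 1], [2, 0]]
```

Pooling numerators and denominators weights each chamber by its number of possible links, so a large lower house dominates \(\bar{\rho}_{s}\). Averaging the two chamber densities would give each chamber equal weight.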

Data analysis

Cross-country analysis

For the purposes of the analysis, network formation and network dynamics must be treated as separate processes. Indeed, one benefit of this approach is that it does not require a strict structural form for preferences; it requires only that agents' utility treats corruptibility \(\eta_{i}\) and control input \(u_{ik}\) as complements. Separating formation from dynamics is not ideal for capturing the evolution of a network. It does, however, allow us to capture the shocks to network membership that occur when new members (with potentially new attitudes toward corruption) enter the network, existing nodes leave, and the network re-forms under new norms. The regressions focus on the following models, which aim to uncover the temporal relationships between controllability \(\bar {n}_{D}\) and attitudes toward corruption within a society.

Network data are provided by Briatte (2016). The measure used to define a link weight is the “weighted propensity to cosponsor” measure first proposed by Fowler (2006). This measure provides a connectedness weight which is given by:

$$ w_{ij} = \sum_{l} \frac{a_{ij}^{l}}{c_{l}} $$
(14)

where \(w_{ij}\) represents the strength of the link from legislator j to legislator i, \(a_{ij}^{l}\) is a binary indicator of whether legislator j had a cosponsorship relation with legislator i on bill l, and \(c_{l}\) is the total number of cosponsors on the bill.
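Equation (14) can be accumulated bill by bill. The sketch below assumes a hypothetical data layout in which each bill is a (sponsor, cosponsors) pair. It reads \(a_{ij}^{l} = 1\) as "legislator i cosponsors bill l sponsored by legislator j". This is one reading of the definition above and is not necessarily the exact coding used by Briatte (2016).

```python
import numpy as np


def cosponsorship_weights(bills, n):
    """Eq. (14): W[i, j] = sum over bills l of a_ij^l / c_l.

    Assumed layout: bills is a list of (sponsor, cosponsors) pairs, with legislators
    indexed 0..n-1 and each cosponsor listed once per bill.
    """
    W = np.zeros((n, n))
    for sponsor, cosponsors in bills:
        c_l = len(cosponsors)
        if c_l == 0:
            continue  # a bill without cosponsors contributes no links
        for i in cosponsors:
            W[i, sponsor] += 1.0 / c_l  # each cosponsorship is down-weighted by bill size
    return W


# Hypothetical example: bill A by legislator 0 (cosponsors 1 and 2), bill B by 1 (cosponsor 2)
W = cosponsorship_weights([(0, [1, 2]), (1, [2])], n=3)
```

The \(1/c_{l}\) factor means that each bill contributes a total weight of one, spread evenly over its cosponsors. Joining a widely cosponsored bill therefore signals a weaker tie than joining a narrowly cosponsored one.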

World Bank Worldwide Governance Indicators of Control of Corruption (CCE) were merged with the calculated controllability of the networks to create a panel covering 20 European countries (also including Israel and Iceland) over the period 1996 to 2015. This panel was used to estimate the coefficients of regressions of the following form:

$$ y_{it} = \beta_{1} * Corruption_{it} + FE_{i} + \epsilon_{it} $$
(15)

where \(y_{it}\) denotes the outcome variable (either adjusted controllability \(\bar {n}_{D}\) or adjusted relative structural density \(\bar {\rho }_{s}\)), \(FE_{i}\) denotes country fixed effects, the subscript t indexes the time period, and Corruption is the negative of CCE, so that the variable increases with corruption. Summary statistics for the variables of interest are given in Table 2. The results, shown in Table 3, reject the null hypothesis of no correlation between attitudes toward corruption in an area and the controllability of the political social networks that form there, with a high degree of statistical significance.
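With country fixed effects, the slope in (15) can be obtained from the within transformation: demean the outcome and the regressor within each country, then run OLS without an intercept. The sketch below is illustrative only. It uses a synthetic panel (22 countries, 20 years, made-up coefficient 0.3) in which country effects are correlated with corruption, and helper names that are hypothetical. It returns point estimates only.

```python
import numpy as np


def demean_by_group(v, groups):
    """Within transformation: subtract each group's mean from its observations.

    Assumes groups are integer codes 0..G-1.
    """
    v = np.asarray(v, dtype=float)
    counts = np.bincount(groups)
    means = np.bincount(groups, weights=v) / counts
    return v - means[groups]


def within_estimator(y, x, groups):
    """Slope of y_it = beta * x_it + FE_i + e_it, with FE_i absorbed by demeaning."""
    y_w = demean_by_group(y, groups)
    x_w = demean_by_group(x, groups)
    return float(x_w @ y_w / (x_w @ x_w))


# Synthetic panel; values are illustrative and not bounded like the real n_D
rng = np.random.default_rng(2)
G, T = 22, 20
country = np.repeat(np.arange(G), T)
alpha = rng.normal(size=G)                        # country effects
cce = rng.normal(size=G * T) + alpha[country]     # CCE correlated with country effects
corruption = -cce                                 # sign flip so the variable increases with corruption
n_D_bar = 0.3 * corruption + 2.0 * alpha[country] + rng.normal(scale=0.1, size=G * T)
beta_hat = within_estimator(n_D_bar, corruption, country)
```

In this synthetic panel, pooled OLS without fixed effects would be badly biased, because the country effect enters both the regressor and the outcome. The within transformation removes it.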

Table 2 Summary statistics
Table 3 Estimates using the aggregate WGI control of corruption indicator

These first-pass results support the hypothesis that political corruption is an important factor in the evolution of controllability in the social network between politicians. They do not, however, reveal any significant correlation between changes in the structural density of the network and changes in corruption over time. This could indicate that the mechanism by which politicians change their linking strategies is more complex than a simple reduction of their in-degree.

The primary motivation for this analysis is to establish that there is a correlation between the incentives for control input and the controllability of social networks. A potential benefit of this conclusion is that the controllability of a regional social network could be used as an instrumental variable for corruption when estimating corruption's effect on other aspects of the local economy. This, of course, requires that the outcome of interest be uncorrelated with the controllability of the network except through changing views toward quid-pro-quo corruption (Wooldridge 2002). As an example of such an analysis, the parameters of the following system of regression equations were estimated using two-stage least squares:

$$\begin{array}{*{20}l} &GEE_{it} = \beta_{2} * \widehat{Corruption}_{it} + \beta_{3} * EU_{it} + FE_{i} + \epsilon_{it} \end{array} $$
(16)
$$\begin{array}{*{20}l} &\widehat{Corruption}_{it} = \beta_{1} * \bar{n}_{Dit} + FE_{i} \end{array} $$
(17)

where \(\widehat {Corruption}\) denotes the fitted values of the negative of the CCE control of corruption indicator from regression (17), which uses the controllability of the network \(\bar {n}_{D}\) as an instrumental variable. Ordinarily, regression (16) would be invalid because of potential endogeneity between the World Bank indicators. By using the controllability of the political social network as an instrument for regional attitudes toward corruption, however, we can in principle overcome this endogeneity problem and estimate the effects of other variables on the effectiveness of governance while controlling for corruption. The time frame of interest includes the accession of several new members to the European Union, a multinational organization dedicated to, among other things, “[promoting] sustainable development based on balanced economic growth” (Footnote 6). There is therefore a distinct possibility that changes in EU membership over time could affect the effectiveness of governance in member countries. For this reason, regression (16) estimates the effect of EU membership on governance effectiveness while controlling for corruption through the proposed instrument. When corruption is instrumented by the controllability of the social networks within the legislature, the estimated effect of EU membership on governance effectiveness was found to be positive and highly significant.
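The following is a minimal sketch of (16) and (17), with country fixed effects absorbed by demeaning. The instruments are the controllability measure and the exogenous EU indicator. Textbook two-stage least squares includes every exogenous regressor (here EU) in the first stage, which (17) as written omits. The sketch follows the textbook version, so it is an assumption rather than a replication of the paper's estimation. The data are synthetic, with an unobserved confounder that enters both corruption and governance effectiveness. Only point estimates are computed, because standard errors taken naively from the second-stage OLS would be incorrect.

```python
import numpy as np


def demean_by_group(v, groups):
    """Within transformation; assumes groups are integer codes 0..G-1."""
    v = np.asarray(v, dtype=float)
    counts = np.bincount(groups)
    means = np.bincount(groups, weights=v) / counts
    return v - means[groups]


def fe_2sls(y, endog, exog, instrument, groups):
    """Fixed-effects 2SLS point estimates [beta_endog, beta_exog].

    Regressors X = [endog, exog]; instruments Z = [instrument, exog].
    """
    y_w = demean_by_group(y, groups)
    X = np.column_stack([demean_by_group(endog, groups), demean_by_group(exog, groups)])
    Z = np.column_stack([demean_by_group(instrument, groups), demean_by_group(exog, groups)])
    # First stage: project the regressors onto the instruments
    first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coef
    # Second stage: regress the outcome on the fitted regressors
    beta, *_ = np.linalg.lstsq(X_hat, y_w, rcond=None)
    return beta


# Synthetic panel with made-up coefficients: beta2 = -0.5 (corruption), beta3 = 0.2 (EU)
rng = np.random.default_rng(1)
N, T = 200, 20
country = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)[country]                       # country effects
n_D = rng.normal(size=N * T) + 0.5 * alpha                # instrument (controllability)
eu = (rng.uniform(size=N * T) < 0.5).astype(float)        # EU membership indicator
u = rng.normal(size=N * T)                                # unobserved confounder
corruption = n_D + 0.3 * eu + u + alpha
gee = -0.5 * corruption + 0.2 * eu + 0.8 * u + alpha + rng.normal(scale=0.5, size=N * T)
beta = fe_2sls(gee, corruption, eu, n_D, country)
```

Because the confounder u raises both corruption and governance effectiveness in this example, within-country OLS would understate the negative effect of corruption. The instrument recovers it only because, by construction, n_D affects the outcome solely through corruption. This is the exclusion restriction discussed above.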

Robustness

The World Bank Worldwide Governance Indicators (and aggregated survey-based indicators in general) have been heavily criticized for potential endogeneity between the indicators. They have also been criticized for difficulties in cross-country comparisons caused by missing data for some countries and time periods (Kaufmann et al. 2011; Knack 2007). Fortunately, the World Bank makes some data on the individual survey sources publicly available. We conduct the same analysis on these lower-level data to establish the robustness of the results to potential problems with the aggregate indicators.

Specifically, the same regressions as above were run on data from The Economist Intelligence Unit (EIU), one of the individual data sources used to construct the World Bank Worldwide Governance Indicators (WGI) used above. This stood out as the best individual source for robustness checks for several reasons. In particular, it covers almost the entire time period of interest (Footnote 7), and it provides data on both corruption and governance effectiveness for all of the countries studied in this paper.

The EIU data on governance consist of expert responses to survey questions and are subject to peer review for consistency. For governance effectiveness, the questions ask about both institutional quality and excessive bureaucracy. For corruption, respondents rate their perception of the presence of corruption among public officials on a scale from zero to four. Responses are then rescaled to lie between zero and one.

The results of these new regressions are displayed in Table 4. They establish that the correlation over time between corruption perception and the controllability of the political network is robust to the choice of data source. With these data, we find a much stronger correlation between the density of the network and the EIU measure of corruption, in the direction implied by our simulation results: as corruption increases over time, the density of the political network decreases as individual nodes remove incoming influence. In the instrumental variable regression, the effect of corruption when instrumented by controllability has a similar magnitude to the earlier estimate but is not statistically significant (p-value of 0.179). The primary results of this paper hold. However, this robustness check calls into question the validity of controllability as an instrumental variable for corruption with this specific outcome measure.

Table 4 Estimates using the EIU indicator of corruption among public officials

It is encouraging that the estimates from the two regressions have the same sign. However, their lack of significance could indicate that \(\bar {n}_{D}\) is correlated with the effectiveness of governance beyond its correlation with corruption. This would make it, at best, a questionable instrument for corruption when the outcome is governance effectiveness (as measured by institutional quality and excessive bureaucracy). This is consistent with the model of Battaglini et al. (2019), which treats the effectiveness of individual legislators as an outcome of a network with endogenous structure. The lack of significance in this particular estimation does not invalidate controllability as an instrumental variable in other estimations where this potential endogeneity issue is not present. Indeed, this robustness check strongly confirms the primary result of this paper: attitudes toward corruption over time are highly correlated with changes in the controllability of social networks between politicians.

As a final robustness check, we repeat regression (15) while controlling for time fixed effects and using heteroskedasticity-robust standard errors clustered at the country level. The new regression models are:

$$\begin{array}{*{20}l} \bar{n}_{Dit} = \beta * Corruption_{it} + FE_{i} + FE_{t} + \epsilon_{it} \end{array} $$
(18)
$$\begin{array}{*{20}l} \bar{\rho}_{sit} = \beta * Corruption_{it} + FE_{i} + FE_{t} + \epsilon_{it} \end{array} $$
(19)

In doing so, we allow for year fixed effects that could affect the controllability and density of networks in all countries in a given year. We also account for possible correlation of the residuals within each country over time when calculating standard errors. The results of these regressions on the EIU data, shown in the second column of Table 5, again support the theoretical results with a high degree of statistical significance: both controllability and network density show a highly significant correlation with the EIU indicator of corruption among public officials.
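Regressions (18) and (19) add year fixed effects and cluster standard errors by country. The following sketch absorbs country effects by demeaning and enters year effects as dummies. By the Frisch-Waugh-Lovell theorem, this gives the exact two-way estimate even in an unbalanced panel, such as one with several years missing for a single country. The finite-sample factor is one common (CR1-style) convention; statistical packages differ in the exact degrees-of-freedom correction, especially in how absorbed fixed effects are counted. Names and data are hypothetical.

```python
import numpy as np


def demean_by_group(v, groups):
    """Within transformation; assumes groups are integer codes 0..G-1."""
    v = np.asarray(v, dtype=float)
    counts = np.bincount(groups)
    means = np.bincount(groups, weights=v) / counts
    return v - means[groups]


def twoway_fe_clustered(y, x, country, year):
    """Slope and country-clustered SE for y_it = beta * x_it + FE_i + FE_t + e_it."""
    years = np.unique(year)
    # Year dummies (first year omitted); country effects are absorbed by demeaning
    D = (year[:, None] == years[None, 1:]).astype(float)
    X = np.column_stack([x, D])
    X_w = np.column_stack([demean_by_group(col, country) for col in X.T])
    y_w = demean_by_group(y, country)
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    resid = y_w - X_w @ coef
    # Sandwich estimator with one score vector per country cluster
    bread = np.linalg.inv(X_w.T @ X_w)
    meat = np.zeros_like(bread)
    for g in np.unique(country):
        score = X_w[country == g].T @ resid[country == g]
        meat += np.outer(score, score)
    n, k = X_w.shape
    G = len(np.unique(country))
    scale = G / (G - 1) * (n - 1) / (n - k)  # CR1-style small-sample factor
    V = scale * bread @ meat @ bread
    return float(coef[0]), float(np.sqrt(V[0, 0]))


# Synthetic unbalanced panel: one country is missing its first seven years
rng = np.random.default_rng(3)
G, T = 22, 20
country = np.repeat(np.arange(G), T)
year = np.tile(np.arange(1996, 1996 + T), G)
keep = ~((country == 0) & (year < 2003))
country, year = country[keep], year[keep]
alpha = rng.normal(size=G)[country]             # country effects
tau = rng.normal(size=T)[year - 1996]           # year effects common to all countries
corruption = rng.normal(size=country.size) + alpha + 0.5 * tau
y = 0.3 * corruption + alpha + tau + rng.normal(scale=0.1, size=country.size)
beta_hat, se = twoway_fe_clustered(y, corruption, country, year)
```

Clustering by country allows residuals to be arbitrarily correlated within a country over time, at the cost of relying on a moderate number of clusters (about twenty here). With so few clusters, the standard errors should themselves be read with some caution.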

Table 5 Time fixed-effects and clustered standard errors

In this estimation the World Bank indicator shows less statistical significance in its association with controllability, although the estimate is still significant at the conventional ten-percent level. This drop in statistical association could well be due to the fact that the WGI are composed of many different surveys, and that the set of individual surveys used to generate these indicators is not consistent over time (Kaufmann et al. 2011). The EIU corruption measure, by contrast, captures only perceptions of “corruption among public figures,” and does so from a consistent sample.

When the seven years of observations from Iceland, which are the only data missing from the EIU, are dropped from the WGI data, the estimate again becomes significant at conventional levels, with a p-value of 0.006. The estimates of the effect of corruption on density do not become significant, but the magnitude of the estimated effect is substantially larger relative to its standard error than in the earlier estimation. The results of these estimations are reported in the fifth and sixth columns of Table 5. This appears to indicate that the controllability of political networks is strongly correlated specifically with corruption among public figures. It is less strongly correlated with the other types of corruption measured by the WGI Control of Corruption estimate, which also includes measures of “petty” corruption and household bribery. This conclusion is further supported by the fact that the EIU data are one of the few individual sources within the WGI aggregation that cover corruption specifically among public figures, across all the countries and over approximately the same time frame as our data on network controllability. It follows that the absence of EIU data from certain observations of the WGI estimate would affect that estimate's correlation with the controllability of the political network.

Discussion

We have found a strong and robust negative correlation between corruption among public figures in a region and the controllability of the political networks that form there. In doing so, we shed some light on the ways in which incentives may direct the formation and subsequent structure of social networks. We have proposed a basic theoretical model of member preferences that could explain this correlation. The model is based on the idea that, as corruption becomes more acceptable, nodes can rent-seek by manipulating their position in the social network (primarily through their number of incoming links) so that they become more important to any control scheme. This is consistent with the intuition that, as corruption becomes more common or acceptable, individuals will want a “piece of the action.” We call this manipulation of one's own position to become more critical for the controllability of the network “hierarchical congestion.” The term reflects the fact that, as more nodes act to raise their position in the hierarchy of controllability, the network system as a whole becomes more difficult to control.

Simulation results suggest that decreasing structural in-degree could be a mechanism by which nodes increase their own control capacity without affecting that of their neighbors, and thus increase the size of the minimum driver set. This suggests that politicians may improve their position in the controllability hierarchy through a heuristic that places additional cost on forming incoming links. If this were true, corruption should be expected to be negatively correlated with network density. Empirical evidence on this phenomenon is mixed. While the estimates all point in the same direction, the effect is not statistically distinguishable from zero when estimated using the WGI aggregate indicator. In the estimates based on the individual EIU data source, however, it is highly statistically significant at conventional levels.

We have also discussed the use of political network controllability as an instrumental variable for political corruption in a region. It is crucial to ensure that the dependent variable in such an instrumental variables regression is uncorrelated with the controllability of the political network other than through corruption. For example, instrumenting corruption in a regression of governance effectiveness may not be possible if the measure used to define the “effectiveness” of the legislature is otherwise correlated with the controllability of the network. Instrumentation using network controllability may well be possible for other dependent variables, such as educational outcomes or infrastructure quality, provided that it is accompanied by a suitable argument that there is no direct correlation between controllability and the outcome variable.

This paper is the first to examine the controllability of a network system in an environment where nodes value control input signals and network formation is endogenous. We have found evidence of hierarchical congestion in political influence networks, which are networks in which nodes can choose their incoming directed links. Because control capacity is primarily determined by a node's in-degree, however, control preferences may have wholly different effects in networks where nodes choose outgoing links, or in undirected networks. A key conclusion of this paper is therefore that the controllability of a social network should not be considered in isolation from the incentive structure of the nodes within it, particularly when the network is growing, forming, or changing over time.