Assessing the aggregate risk emerging in complex systems is of paramount importance in disparate fields, such as economics, finance, epidemiology, infrastructure engineering, etc. A large body of recent literature has explored, both theoretically and empirically, how risk propagates [1] and how to assess aggregate risk when the risk of each individual entity is known [2], as well as the topology of the network of interaction among them. Although both aspects have been shown to be important, their mutual relation is relatively less explored. In theoretical studies, one typically assumes independence between idiosyncratic risk and topology, while in empirical studies the correlation is the one present in the investigated dataset.

But what is the relation (if any) between the idiosyncratic risk of a node and its local topological properties (e.g. degree, centrality, community, etc)? In this paper we answer this question by studying a specific system where the assessment of aggregate risk is particularly important, namely the network of interaction between firms. Assessing the risk of firms is one of the fundamental activities of the credit system. Banks spend a significant amount of resources to scrutinise the balance sheet of firms in order to obtain accurate estimations of their riskiness, the internal rating, and provide credit conditions reflecting both the capability of the firm to repay the loans and its probability of default. The riskiness of a firm depends on many idiosyncratic factors (e.g. balance sheet, structure of management, etc.) as well as the industrial sector or its geographical location [3,4,5]. However, corporate firms do not live in isolation, but interact with each other on a daily basis. The interactions can be of different kinds, including those due to the supply chain, payments, business partnerships, financial contracts, and mutual ownership. The structure of interactions is complex and multifaceted, but its knowledge is critical both for macroeconomists and for the credit and banking industry to understand the dynamics of the economy, the business cycle, the structure of corporate control, and, of course, the risk of firms (in isolation or in aggregation).

Here we study the interplay between the risk of firms and the interlinkages connecting them. The network is built from a large proprietary dataset provided by a major European bank. The dataset contains the payments collected at daily granularity between more than two million Italian firms, together with information on the internal risk rating for a large fraction of them. We want to understand whether and to what extent a firm’s role in the network is informative of its riskiness. This is important for two reasons. First, even if the risk of a firm is not known to all its counterparts, it may affect its ability to interact with other firms. For example, a poor rating (i.e. high riskiness) may prevent access to credit and, as a result, cause a reduction or delay in payments toward suppliers. If the supplier has high risk, the missing or delayed payment can prevent its own payments, increasing the likelihood of a cascade of missing payments and a propagation of financial distress. The second reason is that, in certain cases, knowledge of the riskiness of a firm or of a group of firms is lacking or imprecise. In these cases, the existence of a correlation between network properties and risk can enable or improve the assessment of risk. Indeed, in the last part of the paper we show how the network properties of a node can be used to predict the risk of the corresponding firm.

Previous works on networks of firms focussed mainly on ownership relations [6,7,8,9,10], or dealt with the theoretical modelling of other types of relation [11]. Exceptions are the empirical studies on the Japanese economic firm-to-firm network [12, 13], where links represent buyer-supplier relationship. In other cases, as in the seminal paper [14], even if the theoretical framework applies to single firms, the empirical part focuses on the aggregate, sector network, due to lack of more granular data. The use of payments as a proxy of interactions between economic entities is not new and has been investigated mainly for banks [15,16,17,18,19] in the context of systemic risk studies, where, however, other choices to characterise interactions are possible [20,21,22,23]. Apparently much less is known about the payment network between firms, mostly because of lack of data. Concerning rating prediction, there is a vast literature mainly considering the problem as a classification task [24]. The idea of employing machine learning techniques in credit rating scoring has been explored before [25, 26], but in these cases the predictors for the rating are all derived from balance sheets, so the results are not comparable with ours. Other works use more heterogeneous information to predict the rating [27,28,29,30,31].

This paper contributes to these streams of literature in several respects. First, we investigate the topological properties of payment networks by considering standard network metrics, such as degree and strength distributions and component decomposition. We find that the large payment networks investigated in this paper share the properties observed in other complex networks: they are sparse but almost entirely made of a single component, and they are scale free and small world. Then, we look into the distribution of risk of firms in the network of payments in order to quantify the dependence between the network property of a node or a group of nodes and the risk of the firm represented by the node(s). The main and most innovative contribution of this paper is to document the existence of such correlations. We find a homophily of risk, i.e. the tendency of a firm to interact with firms with similar risk. This is a two-node property, but a similar behaviour is observed, even more clearly, at larger aggregation scales. Communities of firms, detected by using different methods, often display a statistically significant abundance of firms of a specific risk class, indicating the tendency of firms with similar rating to be linked together through payments. Risk is therefore not spread uniformly on the network, but rather is concentrated in specific areas. This implies that an idiosyncratic shock on a single firm can propagate more or less quickly depending on the local network structure and the community the node belongs to. The last contribution is to exploit this correlation between the risk of a firm and the network characteristics of the corresponding node to predict the risk rating of the firm using network properties alone. To this end, we employ machine learning techniques to build classifiers for risk rating whose inputs are only network properties (e.g. degree, community, etc.). We show that our classification method performs well in terms of both accuracy and recall, and that it significantly outperforms random assignment.

1 The network of payments

1.1 The dataset

The investigated dataset contains information on payments between more than two million Italian firms and is built from transactional data of the payment platform of a major European bank (Footnote 1). Transactions are registered with daily granularity for the year 2014, for a total of 47M records, each of which includes the two counterparts involved, date, type, amount, and number of transactions in the same day. Transactions are originally identified by account, but in the case of customers and former customers, multiple accounts associated with the same firm are aggregated into a single entity (Footnote 2). This results in a total of 2.4M entities (which will be referred to as firms, for brevity) operating through the platform during the whole investigated period. The firms can be of different types: customers, who have an account in the bank, non-customers, and former customers. There is also a small residual class of NA, which we aggregated with the non-customer class. More information on the frequencies of the different classes is available in Appendix 1.

In principle, any firm or public body can make use of the platform, but in practice in most cases at least one of the two counterparts is a customer of the bank. Similar considerations hold for the total amount exchanged: in each month more than 50% of the volume is transferred between customers, and this share rises above 95% when considering transactions with at least one customer involved. More details on the dataset and some descriptive statistics are presented in Appendix 1. Finally, for a large fraction of customers, the dataset contains information on the economic sector and on the internal rating of the firm on a three-value scale: Low (L), Medium (M), and High (H) risk.

1.2 Networks definition and basic metrics

A network, or graph, is identified by two sets: V, the set of nodes with cardinality \(\lvert V \rvert =n\), and E, the set of links or edges, with cardinality \(\lvert E \rvert =m\). The latter is the collection of ordered pairs of connected nodes. In our case, we also take into account the strength of interactions, so a weight \(w_{ij}\) is associated with each link. Starting from transaction data, payment networks are constructed as follows: given a time window, each node represents a firm active in that period; if there is a payment between two firms, a link from the source to the recipient is added, with weight equal to the payment amount. If multiple transactions occur between the same (ordered) pair of nodes, the weight of the link is the sum of the amounts of the payments. Therefore, for each time period we construct a directed and weighted network. The time window of analysis may vary depending on the type of information one wants to extract from the dataset. In the following, the focus will be on monthly networks, for which results are quite stable, at the cost of dealing with fewer and larger graphs. For the period covered by the dataset, each monthly network consists on average of \(n=1\)M nodes and \(m=3.2\)M links, with the lowest activity in August and the highest in July (see Appendix A.1). The density \(\rho =\frac{m}{n(n-1)}\) is thus small, resulting in a so-called sparse network. Nevertheless, this low density does not imply a disaggregated system. Indeed, for all the monthly networks the diameter is very small compared to the size: on average across the months, starting from a node one has to traverse at most 19 links to reach any other node in the weakly connected component (see Table 1). Thus the networks have the so-called small-world property.
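The construction just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout `(payer, payee, amount)`, the function names, and the toy transactions are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def build_payment_network(transactions):
    """Aggregate (payer, payee, amount) records into a directed weighted network.

    Returns the set of active nodes and a dict mapping each ordered pair
    (payer, payee) to the total amount paid in the time window.
    """
    weights = defaultdict(float)
    nodes = set()
    for payer, payee, amount in transactions:
        weights[(payer, payee)] += amount  # multiple payments are summed
        nodes.update((payer, payee))
    return nodes, dict(weights)


def density(n, m):
    """Density of a directed network with n nodes and m links."""
    return m / (n * (n - 1))


# Hypothetical toy transactions for one time window.
toy_transactions = [
    ("A", "B", 100.0),
    ("A", "B", 50.0),   # second payment on the same ordered pair
    ("B", "C", 30.0),
    ("C", "A", 20.0),
    ("D", "A", 10.0),
]
nodes, weights = build_payment_network(toy_transactions)
n, m = len(nodes), len(weights)
rho = density(n, m)
```

With the average monthly values quoted above (n = 1M nodes and m = 3.2M links), the same formula gives a density of roughly 3.2e-6.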

Table 1 Basic metrics of the network of payments

1.3 Networks topology

When considering a small number of firms, one would expect simple topologies: one firm is the supplier of intermediate products for another firm, resulting in a line (the simplest supply chain), or one firm is a supplier or a buyer for many other firms, resulting in a star network. Instead, what is observed is a much more complex organisation, with a non-negligible presence of cycles.

At a very coarse level, it is possible to identify two large classes of firms. The first constitutes the core of the network, which includes approximately 20% of the nodes and more than half of the links. This core has a density an order of magnitude larger than that of the whole network and is characterised by the fact that any pair of firms is connected, directly or via intermediaries. Around 60% of the total volume circulates among the nodes of the core (see Table 5 in Appendix 1). The other class is made of payers-only, i.e. nodes that have no incoming links. Each month these represent about one half of the active firms, and their activity is sporadic. To better understand the role of this significant subset of firms we check their customer status: we find that the majority of them are unclassified in terms of client status, and that their number is larger than one would expect from the unconditional distribution among all the firms (see Table 6 in Appendix 1). This means that they are likely not customers and, more importantly, that almost no information, for example about risk, is available on them. For further details, refer to Tables 3 and 4 in Appendix 1.

We now turn our attention to the distribution of degree and strength. In our case the in- (out-) degree is the number of payers (payees) of a given firm, and the in- (out-) strength is the corresponding amount in euros. For the monthly aggregation case the average in- and out-degree of a firm is 6 and 4, respectively (see Table 1). These low values are a direct consequence of the low density of the network. However, the degrees and the strengths are extremely heterogeneous, as shown by the degree and strength distributions.
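As an illustration of these definitions, the following sketch computes in-/out-degree and in-/out-strength from a map of weighted directed links. The input format (a dict keyed by `(payer, payee)`), the function name, and the toy values are assumptions made for the example only.

```python
from collections import defaultdict


def degree_and_strength(weights):
    """Compute in-/out-degree and in-/out-strength for every node.

    `weights` maps each ordered pair (payer, payee) to the total amount paid.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_str, out_str = defaultdict(float), defaultdict(float)
    for (payer, payee), amount in weights.items():
        out_deg[payer] += 1       # one more payee for the payer
        in_deg[payee] += 1        # one more payer for the payee
        out_str[payer] += amount  # total amount paid
        in_str[payee] += amount   # total amount received
    return in_deg, out_deg, in_str, out_str


# Hypothetical toy network: D only pays, so it has no incoming links.
toy_weights = {("A", "B"): 150.0, ("B", "C"): 30.0, ("C", "A"): 20.0, ("D", "A"): 10.0}
in_deg, out_deg, in_str, out_str = degree_and_strength(toy_weights)
payers_only = sorted(v for v in out_deg if in_deg[v] == 0)
```

Nodes with zero in-degree in this sketch correspond to the payers-only class discussed above.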

Figure 1 shows the empirical complementary cumulative distribution for these two quantities in a double logarithmic scale. The approximately straight line indicates the presence of a fat tail with a power-law behaviour. The fit of the exponent supports the observation that the in- and out-degree distributions are consistent with a power-law tail, with estimated exponents around 2.6 and 2.8, respectively. Similarly, in-strength and out-strength are well fitted by power-law distributions with exponents around 2.1 and 2, respectively. Despite the fact that a large fraction of nodes is different in each month, the tail exponents are remarkably stable (see Table 7 of Appendix A.3).
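The fitting procedure is not detailed in this section, so the following is only one plausible sketch: it computes the empirical complementary cumulative distribution and estimates a tail exponent with the standard continuous maximum-likelihood (Hill-type) estimator above a chosen threshold. The threshold `x_min`, the function names, and the synthetic sample are all assumptions; for discrete quantities such as the degree the continuous estimator is an approximation. The estimator returns the exponent of the probability density, so the slope of the complementary cumulative distribution on a log-log plot is that exponent minus one; which convention the values quoted above follow is not stated here.

```python
import math
import random


def empirical_ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for each distinct value x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))
    return points


def tail_exponent_mle(values, x_min):
    """Continuous MLE of the density exponent alpha for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic check: draw from a pure power law by inverse-transform sampling.
rng = random.Random(0)
true_alpha, x_min = 2.6, 1.0
samples = [
    x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
    for _ in range(20000)
]
alpha_hat = tail_exponent_mle(samples, x_min)
print(round(alpha_hat, 2))
```

On real data the choice of `x_min` matters, and a goodness-of-fit check would normally accompany the estimate.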

Figure 1 Empirical complementary cumulative degree (left) and strength (right) distributions and their power law fit. The scale is logarithmic for both axes. Data refer to January.

The scale-free behaviour is quite ubiquitous in complex networks and has been found in many other real economic and financial networks [12, 32,33,34,35,36,37]. The fat-tailed distribution of the degree has two interesting consequences: first, there is no characteristic scale for the degree or strength; second, there are a few nodes that act as hubs for the system, in the sense that, since they have a large number of connections, many pairs of nodes are connected through them. This partially explains the low values of the diameter.

Finally, we measure the tendency of firms to be connected to firms which are similar with respect to some attribute, namely the number and the total volume of connections (i.e. degree and strength). Following [38], we compute the assortativity coefficient for a categorical variable,

$$ r=\frac{\sum_{i} e_{ii}-\sum_{i} a_{i}b_{i}}{1-\sum_{i} a_{i}b_{i}}, $$
(1)

where \(e_{ij}\) is the fraction of edges connecting vertices of type i and j, \(a_{i} = \sum_{j} e_{ij}\) and \(b_{j} = \sum_{i} e_{ij}\). The coefficient takes its maximum value \(r_{\mathrm{max}} = 1\) for perfect assortative mixing, while when the network is perfectly disassortative (each node connects to a node of a different type) it is \(r_{\mathrm{min}}=-\frac{\sum_{i}a_{i}b_{i}}{1-\sum_{i} a _{i}b_{i}}\). Using the number of connections as categorical variable, a high value of the assortativity coefficient indicates that highly connected firms tend to interact significantly more than average with other highly connected firms. Similar reasoning holds when using the volume exchanged as categorical variable. Besides the entire graph, we also consider the subgraph of firms with rating and the subgraph of customers.
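A minimal sketch of how the coefficient in Eq. (1) could be computed from a list of directed links and a categorical label per node is given below. The function name, the input format, and the optional `use_volume` flag (which replaces edge fractions with volume fractions) are assumptions for illustration; the sketch also assumes at least two types appear, otherwise the denominator vanishes.

```python
from collections import defaultdict


def assortativity(weights, label, use_volume=False):
    """Assortativity coefficient for a categorical node attribute.

    `weights` maps ordered pairs (source, target) to link weights and
    `label` maps each node to its type. With use_volume=True the mixing
    matrix holds fractions of volume instead of fractions of edges.
    """
    e = defaultdict(float)
    total = 0.0
    for (src, dst), w in weights.items():
        x = w if use_volume else 1.0
        e[(label[src], label[dst])] += x
        total += x
    a, b = defaultdict(float), defaultdict(float)
    trace = 0.0
    for (ti, tj), v in e.items():
        frac = v / total
        a[ti] += frac        # a_i = sum_j e_ij
        b[tj] += frac        # b_j = sum_i e_ij
        if ti == tj:
            trace += frac    # sum_i e_ii
    ab = sum(a[t] * b[t] for t in set(a) | set(b))
    return (trace - ab) / (1.0 - ab)


# Hypothetical toy examples with two types.
label = {"a1": "L", "a2": "L", "b1": "H", "b2": "H"}
r_assortative = assortativity({("a1", "a2"): 1.0, ("b1", "b2"): 1.0}, label)
r_disassortative = assortativity({("a1", "b1"): 1.0, ("b2", "a2"): 1.0}, label)
```

In the first toy network every link joins nodes of the same type, in the second every link joins nodes of different types, so the two values sit at the two extremes of the coefficient.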

The assortativity coefficient is consistently slightly negative for both attributes, for all months and graphs, namely around −0.03 for the entire graph and the subgraph of firms with rating, and −0.04 for the subgraph of customers, with no strong differences among months and attributes. Table 8 of Appendix 1 reports the summary of values of the assortativity coefficient for each month. A possible explanation for the slightly negative assortativity is that large, very interconnected firms are connected to many subsidiaries which in turn do not engage with many other firms, their business being almost exclusively focussed on the relationship with the large and central firms.

Summarising, each month the payment network of firms is very sparse but almost entirely connected. Half of the firms appear in the network as payers only (no incoming links) and they are mainly unclassified with respect to customer status, so not much information is available on them. Of the remaining nodes, almost half constitute the denser core of the network, where more than half of the transactions occur and above 60% of the volume circulates. Finally, the network is small world, scale free, and slightly disassortative both for degree and for strength.

Even if we cannot directly compare the topological properties of our network with other similar ones, we can take as point of comparison other firm-to-firm networks commonly used in the literature. The corporate control/ownership networks display typically some similarity with ours, for example sparsity [8, 10], a power law degree distribution [7, 10] with the presence of hubs [10], small diameter [6], and bow tie structure [8].

2 Risk distribution and network topology

In this Section we investigate the distribution of risk of firms in the network of payments. We are interested in measuring the dependence between the network property of a node or a group of nodes and the risk of the firm represented by the node(s). We proceed in a bottom-up fashion, zooming out from single nodes to subsets. At first we consider a firm’s local property (the number of connections) and we check if it correlates with the risk. Then we consider pairs of linked firms and measure the homophily in risk, i.e. whether firms with similar risk profile tend to do business together and thus to be linked. Finally, we divide firms into subsets induced by the network structure and we check whether the inferred subsets are informative with respect to the riskiness of the composing firms. Specifically, we partition the network in groups (or communities) of firms by using only network information, and we test if the distribution of risk within each group is statistically different from the global one. Thus the goal is to understand if the inferred communities are homogeneous with respect to the risk profile of the composing firms: a community with many firms with high risk rating is a clear indication of financial fragility and a possible source of instability, since the distress of one or few firms of the community is likely to propagate to the other firms.

For the sake of brevity, in the following the analysis is presented for one month, but results are consistent for all the months, and the complete results are reported in Appendix 2.

2.1 Degree and risk

The first investigation is on the relation between the degree of a firm and its risk. The probability of each risk level \(r\in \{L,M,H\}\) conditional on the out-degree is computed (Footnote 3) and plotted against the degree. The results are shown in Fig. 2. We notice an interesting correlation between degree and risk: small degree nodes are more likely medium risk firms, whereas large degree nodes are more likely low risk firms. The high risk firms are more evenly spread across degrees, even if a larger fraction is observed for low degree nodes. To assess whether the three curves are statistically different we perform a multinomial logistic regression on the data [39] (the solid lines in the plot). This choice is justified by the fact that the quantities just described are the probabilities of outcomes in a multi-class problem given an independent variable (the degree). The estimated probabilities follow quite closely the trend of the empirical distribution and the coefficients are all significant. More detailed results of the fit are given in Table 9 of Appendix B.4 (first two columns).
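The empirical conditional probabilities can be illustrated with the short sketch below. The grouping of degrees into logarithmic (base-2) bins is an assumption made only to keep the example compact, as are the function name and the toy data; the binning actually used for Fig. 2 is not specified in this section, and the multinomial logistic fit is not reproduced here.

```python
from collections import Counter, defaultdict


def rating_given_degree(degree, rating, classes=("L", "M", "H")):
    """Empirical P(rating | degree bin) with bins [2^b, 2^(b+1)).

    `degree` maps nodes to a positive integer degree; `rating` maps the
    rated nodes to their risk class. Nodes with degree 0 are skipped.
    """
    counts = defaultdict(Counter)
    for node, r in rating.items():
        k = degree.get(node, 0)
        if k < 1:
            continue
        b = k.bit_length() - 1  # floor(log2(k)) for positive integers
        counts[b][r] += 1
    return {
        b: {c: cnt[c] / sum(cnt.values()) for c in classes}
        for b, cnt in sorted(counts.items())
    }


# Hypothetical toy data.
toy_degree = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 8}
toy_rating = {"a": "M", "b": "H", "c": "L", "d": "M", "e": "L"}
probs = rating_given_degree(toy_degree, toy_rating)
```

Each entry of `probs` is a distribution over the three risk classes for one degree bin, which is the quantity plotted against the degree.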

Figure 2 Probability of the rating of a firm conditional on its out-degree. The solid lines show the fitted multinomial logistic distribution, with its confidence intervals (dashed lines) in matching colours.

The correlation just highlighted can, at least in part, be influenced by the effect of the size of the firm (in terms of asset value from the balance sheet): a large firm is usually considered less risky than a small one. At the same time, a larger size generally implies a higher number of connections, as seen for example in the interbank network [18]. As the size of firms is not available to us, we use the sum of the incoming and outgoing amounts as a proxy. Defined in this way, the size has a Spearman rank correlation of 0.67 and 0.57 with in- and out-degree, respectively. To control for the effect of the size, we repeat the same procedure on subsets of firms, grouped according to their size into tertiles. We repeat the multinomial logistic regression adding the size tertiles among the predictors, and we still obtain statistically significant coefficients (last four columns in Table 9 of Appendix B.4).

Similarly, the three conditional degree distributions given the rating are statistically different, as for every month all pairs reject the null hypothesis in the 2-sample Kolmogorov–Smirnov test [40]. Therefore topological characteristics (the degree) of the node can be used to obtain information on the riskiness of the corresponding firm, even when controlling for size. From a risk management perspective this is an important result, since on average highly connected nodes are also less risky.

2.2 Assortative mixing of risk

The next step is to check whether risk is correlated with direct connection preferences. To clarify this point, we consider two features: the assortative mixing of the risk and the conditional distribution of rating given the distance.

In the first case we compute a weighted variant of the assortativity coefficient in Eq. (1) using as categorical variable the risk rating. When the rating is not available, we assign the node to a residual class (Footnote 4). In practice, the quantities \(e_{ij}\) are substituted by \(\tilde{e}_{ij}\), the fraction of volume from nodes of type i to nodes of type j. The reason for this choice is to mitigate the impact of the aforementioned large number of uncategorised payers. In most cases their links are associated with low volume and few transactions. Also, customer firms, even if they represent only around \(1/3\) of the firms, exhibit a generally more intense activity, both in terms of number of transactions and of volume, hence accounting for the stronger ties between the firms.

The assortativity metric is positive for all three graphs: 0.070, 0.157, and 0.163 for the whole set, the nodes with rating, and the customers, respectively, with significant variability across the months but always with positive sign (Footnote 5). Table 11 of Appendix B.5 reports the summary of values of the assortativity coefficients for each month.

With the same quantities \(\tilde{e}_{ij}\) we define metrics to assess different preferences in connection between incoming and outgoing payments. We test whether firms are more concerned with the risk of payers than with that of payees by testing for different risk distributions between incoming and outgoing connections. To discriminate between these two cases, for each node i we compute the percentage excess of volume, with respect to the average, toward nodes in a certain risk class, and we group nodes according to their rating. The distributions are compared using the Mann–Whitney U test [41]. This non-parametric test allows one to assess whether one distribution is stochastically greater than the other. Details on the metrics and the tests performed are given in Appendix B.5. We find that firms are likely, at least in part, aware of the riskiness of their counterparts, and the results suggest that they use this information in choosing their business partners. However, the hypothesis that incoming payments show a more marked preference for low risk is not supported by the data. Moreover, the overall positive assortativity is mainly due to low risk nodes. This suggests that low risk firms are more careful in the choice of their business counterparts, possibly also because their relatively larger creditworthiness allows them to find available partners more easily.

The quantities considered so far in this section are pairwise comparisons between the ratings of nearest neighbours, and give an aggregate measure. A possible way (Footnote 6) to enrich this information is to consider the conditional distribution of rating for nodes at a given distance (Footnote 7) and to compare it to the unconditional distribution. If the rating had no influence on the connection pattern, the conditional distribution of risk given the distance would be statistically indistinguishable from the null unconditional distribution. To test whether this is the case, we first compute the distance between all the nodes for which the rating is available. Then, for any fixed k, the occurrences of ratings are computed by looking at the set of pairs at distance k. Finally, the estimated distributions are tested against the null one with a hypergeometric test, as explained in detail in Appendix B.6.
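The counting step of this procedure can be sketched with a breadth-first search from every rated source of a given class. This is only an illustration: the adjacency format, the function name, and the toy graph are hypothetical, and the sketch assumes that shortest paths may pass through unrated nodes while only rated targets are counted, a detail that is not specified here.

```python
from collections import Counter, defaultdict, deque


def ratings_by_distance(successors, rating, source_class):
    """Count ratings of nodes at each directed distance k from sources
    whose rating equals `source_class`.

    `successors` maps a node to the list of nodes it pays;
    `rating` maps rated nodes to their risk class.
    """
    counts = defaultdict(Counter)
    for source, r in rating.items():
        if r != source_class:
            continue
        dist = {source: 0}
        queue = deque([source])
        while queue:  # standard breadth-first search
            u = queue.popleft()
            for v in successors.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                    if v in rating:  # only rated targets are counted
                        counts[dist[v]][rating[v]] += 1
    return counts


# Hypothetical toy chain a -> b -> c -> d.
toy_successors = {"a": ["b"], "b": ["c"], "c": ["d"]}
toy_rating = {"a": "L", "b": "L", "c": "M", "d": "H"}
by_distance = ratings_by_distance(toy_successors, toy_rating, "L")
```

Normalising each `by_distance[k]` gives the conditional distribution at distance k, which is then compared with the unconditional one. Running one search per source is costly on networks of the size considered here, so this form is meant for exposition only.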

Results for April are summarised in Fig. 3, which considers the case when the source node is in class L (for the other ratings and months see Table 12 and Fig. 9 in Appendix B.4). Results are similar when considering a medium or high risk source. For each k a marker indicates the percentage of nodes with low (green circles), medium (yellow squares) or high (red diamonds) risk at distance k. A marker is full when the percentage is statistically different from the null distribution (the dashed lines, with matching colours).

Figure 3 Distribution of ratings for nodes at distance k from a node with rating L. The dashed lines are the unconditional (null) distribution of ratings among nodes in the entire sample. A full marker indicates that the over- or under-representation with respect to the null distribution is statistically significant in the hypergeometric test at 1% significance level with Bonferroni correction.

We note that up to distance 5 the class of low risk firms is significantly over-represented in the distributions. At greater distances, medium and high risk groups are over-represented. This means that more steps in the network are necessary to reach riskier firms. This fact is particularly interesting when considering that each firm is in theory unaware of other firms’ ratings and in some cases even of its own.

When considering the same quantities for incoming paths, results (see Appendix B.5, Fig. 9 right panels) are very similar, namely at short distances the low risk class is over-represented, while medium and high risk nodes are over-represented for longer distances.

A possible explanation for these observations is that among the hubs of the system (i.e. the most connected nodes) firms with rating L (i.e. the most creditworthy) constitute the vast majority. This holds true when considering both incoming and outgoing links, and also when including the nodes with no rating. Moreover, they are in the denser core previously described, while many high risk firms have few or no outgoing links and are peripheral in the network.

2.3 Network organisation and risk

In this Section we study the relation between the organisation of the network at a more aggregate level and the distribution of risk. We are interested in two types of organisation of networks into groups. The first is the modular organisation: each module is composed of nodes that are much more connected among themselves than with the rest of the network. In economic terms, modules could represent, for example, firms operating in the same region or area, and the high density of the module reflects the fact that payments are more frequent with geographically close firms. We saw before that the network shows an assortative tendency with respect to risk, so we want to test whether the homophily on risk can be observed beyond the pairwise relationship.

The second is a hierarchical organisation. Since the payment network is directed, we look for a ranked partition (i.e. each group of nodes is labelled with an integer from 1 to the number of groups M) such that most links are from nodes in low rank classes to nodes in high rank classes. This type of organisation could represent, for example, a supply chain and the flow of payments between the firms of a group and those in the group in the next rank class reflects the (opposite) flow of goods or services. This classification is important because a high risk concentration in low class nodes of a strongly hierarchical network can trigger a cascade of distress in the higher rank classes.

Modularity and hierarchy are conceptually opposite as the first penalises connections towards other groups, which instead are encouraged in the latter (provided that they go from low rank to high rank nodes).

For each metric, we proceed in the following way:

  i. we find the optimal partition according to the criterion;
  ii. we compute the distribution of ratings within each subset of the partition;
  iii. we test whether such local distribution is statistically different from the overall distribution of ratings by employing the hypergeometric test used in the previous Section and described in Appendix B.6. In order to have a large enough sample for testing, we only consider subsets with at least 500 known ratings.
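The testing step can be sketched as follows. The details of the test are in Appendix B.6 and are not reproduced here, so this is a minimal sketch under stated assumptions: one-sided hypergeometric tail probabilities are computed for each risk class in each group, and the Bonferroni threshold divides the significance level by the number of tested groups times the number of classes. The function names and the toy partition are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_tails(x, N, K, n):
    """Return (P(X >= x), P(X <= x)) for a hypergeometric variable:
    n draws without replacement from N items, K of which are of the class."""
    def pmf(i):
        return comb(K, i) * comb(N - K, n - i) / comb(N, n)

    lo, hi = max(0, n - (N - K)), min(n, K)
    p_over = sum(pmf(i) for i in range(x, hi + 1))
    p_under = sum(pmf(i) for i in range(lo, x + 1))
    return p_over, p_under


def representation_by_group(groups, rating, classes=("L", "M", "H"),
                            alpha=0.01, min_rated=500):
    """Flag each (group, class) pair as 'over', 'under' or None."""
    global_counts = Counter(rating.values())
    N = len(rating)
    rated = {g: [rating[v] for v in members if v in rating]
             for g, members in groups.items()}
    tested = {g: r for g, r in rated.items() if len(r) >= min_rated}
    # Bonferroni correction (assumed: one test per group and class).
    threshold = alpha / max(1, len(tested) * len(classes))
    result = {}
    for g, r in tested.items():
        local = Counter(r)
        for c in classes:
            p_over, p_under = hypergeom_tails(local[c], N, global_counts[c], len(r))
            if p_over < threshold:
                result[(g, c)] = "over"
            elif p_under < threshold:
                result[(g, c)] = "under"
            else:
                result[(g, c)] = None
    return result


# Hypothetical toy partition: 10 low risk and 10 high risk firms.
toy_rating = {f"l{i}": "L" for i in range(10)}
toy_rating.update({f"h{i}": "H" for i in range(10)})
toy_groups = {
    "g1": [f"l{i}" for i in range(8)],
    "g2": ["l8", "l9"] + [f"h{i}" for i in range(10)],
}
flags = representation_by_group(toy_groups, toy_rating, classes=("L", "H"), min_rated=5)
```

The exact binomial coefficients keep the sketch dependency-free; for groups with thousands of rated nodes a numerical library would be the more practical choice.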

We have shown so far that the structure of the payments network is very complex. Since our goal is to obtain information on the risk of the firms, it can be helpful to filter the network before performing community detection, in order to keep the most relevant connections. Thus we focus on the subgraph of customers. There are several reasons for this choice. First, the percentage of nodes with rating active every month is quite low, around 20%, but it rises to 70% when considering only the customers (see Table 3 in Appendix A.1 for a summary). This helps to obtain a more informative local distribution of risk when considering subsets of nodes. Secondly, more than half of the volume is transferred between customers (see Table 4 in Appendix A.1), so even if a large fraction of transactions is dropped, we are mostly pruning weak connections, while keeping the strongest ones. Finally, as shown in the previous Subsection about assortativity, considering the entire network can be misleading, especially when looking at the connections without considering the weights, as will be necessary for some metrics.

2.3.1 Modular structure

One of the standard methods for inferring a modular structure in a network is via modularity maximisation. This method divides nodes into subsets, called modules, such that nodes are well connected with other nodes in the same module and there is a smaller number of links with nodes in other modules. Given a partition P in modules C, the modularity is

$$ Q=\frac{1}{2m}\sum_{C\in P}\sum _{i,j\in C} \biggl(A_{ij}-\frac{k^{\mathrm{in}} _{i}k^{\mathrm{out}}_{j}}{2m} \biggr), $$
(2)

where \(A_{ij}\) is the \((i,j)\) element of the adjacency matrix and \(k^{\mathrm{in}}_{i}\) (\(k^{\mathrm{out}}_{i}\)) is the in- (out-) degree of node i. The optimal partition is the one which maximises modularity. Although the associated optimisation problem is NP-hard, fast and reliable heuristics for an approximate solution exist; here the well-known Louvain method [42] is employed.

In each month we find that the optimal partition has around 2000 modules. These are quite heterogeneous in size: for example, the 13 largest ones cover more than 95% of the nodes of the network. We perform the hypergeometric test of the null hypothesis of a homogeneous distribution of risk. This hypothesis assumes as null distribution of risk the one empirically observed across the entire network (see Appendix B.6 for more details). We perform the analysis in each module with at least 500 known ratings, amounting to around 19 modules per month (see Table 14 in Appendix B.6 for more details). These are clearly very large modules, but a significant number of them show an over- or under-expression of one or two risk classes.

For some specific modules it is possible to draw statistically robust conclusions on their risk profile. The top panel of Fig. 4 shows the over- or under-representation for the largest modules of January. The seventh module, for example, has an over-representation of firms with low risk and an under-representation of the other two risk profiles, thus it represents a group of firms with small risk. On the contrary, the eighth module has an over-representation of highly risky firms and an under-representation of low risk firms, representing a possible warning for the bank.

Figure 4

Distribution of ratings in the two partitions: modularity (top) and hierarchy (bottom). The dashed lines are the unconditional (null) distribution of ratings among nodes in the entire sample. A full marker indicates that the over- (above the dashed line) or under- (below the dashed line) representation with respect to the null distribution is statistically significant in the hypergeometric test at the 1% significance level with Bonferroni correction.

2.3.2 Hierarchical organisation

We now consider explicitly the directed nature of the payment graph and the hierarchical organisation of the network. An ordered partition is such that each subset is associated with an integer number (rank) \(r\in \{1,\ldots,M\}\). A graph has a hierarchical organisation if nodes are more likely linked to other nodes with a higher rank [43], such as in military organisations or in administrative staff. Finding the optimal ordered partition and revealing the hierarchy of a graph is in general complex and requires the minimisation of a suitable cost function, similarly to what is done with modularity.

In this paper we use a cost function proposed in [44]. Given a rank function \(r:V\to \{1,\ldots,M\}\), the cost function penalises links from a high rank node to a low rank node. The penalisation is a linear function of the difference between the ranks. Thus the optimal hierarchical partition is obtained by solving the optimisation problem

$$ A^{*}=\min_{r\in \mathcal{R}}\sum_{(u,v)\in E} f \bigl(r(u)-r(v) \bigr) , $$

where \(\mathcal{R}\) denotes the set of all ordered partitions and the cost function is

$$ f(x)= \textstyle\begin{cases} x+1, & x\geq 0, \\ 0, &x< 0. \end{cases} $$

The hierarchy of the graph is defined by

$$ h^{*}(G)=1-\frac{A^{*}}{m} . $$

By definition, \(h\in [0,1]\), and 0 is the value for the trivial partition with only one set, while \(h=1\) is obtained when the network is a directed acyclic graph and it signals a perfect hierarchy. The linear choice of the penalisation function is convenient because the associated optimisation is solvable in polynomial time and a few exact algorithms exist [44, 45], while non-linear forms can lead to NP-hard problems.
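The definitions above can be checked on toy graphs with the following Python sketch. It evaluates the cost of a given rank function and finds \(A^{*}\) by exhaustive search, which is only feasible for a handful of nodes and is not one of the exact algorithms of [44, 45]. The function names are illustrative, and the number of available ranks M is set equal to the number of nodes, which is an assumption of this sketch.

```python
from itertools import product

def penalty(x):
    """Cost function f: x + 1 for x >= 0, and 0 otherwise."""
    return x + 1 if x >= 0 else 0

def agony(edges, rank):
    """Total cost of a rank function for directed links (u, v)."""
    return sum(penalty(rank[u] - rank[v]) for u, v in edges)

def hierarchy_brute_force(edges):
    """h = 1 - A*/m, with A* found by trying every rank assignment.

    Assumption: ranks take values in 1..n, with n the number of nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    m = len(edges)
    best = min(
        agony(edges, dict(zip(nodes, ranks)))
        for ranks in product(range(1, len(nodes) + 1), repeat=len(nodes))
    )
    return 1 - best / m

path = [(0, 1), (1, 2)]                          # directed acyclic graph
cycle = [(0, 1), (1, 2), (2, 0)]                 # directed cycle
cycle_with_tail = [(0, 1), (1, 2), (2, 0), (2, 3)]

print(hierarchy_brute_force(path))
print(hierarchy_brute_force(cycle))
print(hierarchy_brute_force(cycle_with_tail))
```

The path attains a perfect hierarchy, the cycle attains none, and adding one link that leaves the cycle gives an intermediate value, since only the links on the cycle are penalised.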

We apply the hierarchy detection to the monthly networks of payments and the results are summarised in Table 15 of Appendix B.6. First of all we notice that the number of inferred classes, roughly 18, is much lower than in the modular case. Moreover the size of the classes is much more homogeneous. The value of h is also quite stable, around 0.75, indicating a strong hierarchical structure, a remarkable result considering that we are studying only the customers network.

We now consider the distribution of risk in each class and we study the over- or under-expression of certain levels of risk as a function of the rank of the class in the inferred hierarchy. The test rejects the null hypothesis of homogeneous risk distribution, the same used in the modular case, a considerable number of times. As displayed in the bottom panel of Fig. 4, low rank classes have an over-expression of high and medium risk firms, while middle and low rank classes (i.e. \(r\in [8,12]\)) have an over-expression of low risk firms and an under-expression of medium and high risk firms. More details on the test results are given in Table 15 in Appendix B.6. This empirical evidence may signal the presence of paths of risk propagation, since low rank firms, typically riskier, are payers of high rank firms, which are instead less risky.

2.4 Discussion

Both investigated partitions give interesting insights into the relationship between risk and network structure. On the one hand, the percentage of rejected tests in the case of the modularity partition is consistent with the observed assortativity of risk. It may be noticed that the preference for low risk business partners is not always a realistic option, because in some sectors business partners are not replaceable for a variety of reasons. To better assess this point, one possibility could be to include the comparison between modules and the geographical location of firms, which is not available to us. On the other hand, the relation between risk and hierarchical partition is probably related to the peculiar conditional distribution of risk with respect to the distance described in Sect. 2.2. Indeed, given the fact that high risk nodes are over-represented for longer distances, they should be located in extreme positions in the ranking, either at the top or at the bottom, and this is what is observed. It must be stressed that the two methods chosen here do not exclude each other, as they give different and complementary standpoints for interpretation. In this sense a multi-dimensional perspective is needed, where the dimensions are the mechanisms that either favour or discourage the creation of business relationships.

3 Missing rating prediction using payments network data

In the previous sections we showed that network metrics can be informative of the risk of a firm. It is therefore natural to ask whether it is possible to predict the missing risk rating of a firm by using only information on network characteristics of the corresponding node, as well as risk rating of the neighbour firms. This problem is particularly relevant since we noticed that around 30% of the customers in the dataset do not have a rating and this percentage is even higher when the entire dataset is considered (see Table 3 in Appendix A.1).

Here we use network characteristics as predictors of the missing ratings in well-known machine learning methods for classification problems. The predictors we employ are the following:

  i. in- and out-degree;
  ii. weighted fraction of (in- and out-) neighbours with a given rating (H, M, L or NA);
  iii. rank of the class in the hierarchy inferred by agony minimisation;
  iv. membership in community inferred by modularity maximisation;
  v. sum of in- and out-strength.

The fractions in (ii.) are computed considering the amount (weight) of each payment and together provide a measure of rating assortativity, while (v.) is a proxy for the size. Data are preprocessed following [24] so that variables are comparable in order of magnitude, as detailed in Appendix C.7. These transformations result in a total of 25 predictors. The dataset is the one which includes only the customers, and we consider the monthly network for January (see below for the other months). In order to assess the performance of the prediction, we train each model using 75% of the data, and the remaining 25% is used for testing.
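As an illustration of predictor (ii.), the following Python sketch computes the weighted fraction of in- or out-neighbours of a node in each rating category. It assumes that each link points from the payer to the payee and that firms without a rating fall in the NA category; the function name, the tuple layout and the toy payments are hypothetical, and the preprocessing of Appendix C.7 is not reproduced.

```python
def neighbour_rating_fractions(payments, ratings, node, direction="in"):
    """Weighted share of a node's neighbours in each rating class.

    payments: list of (payer, payee, amount) tuples (assumed link direction).
    ratings: dict firm -> "H", "M" or "L"; missing firms count as "NA".
    direction: "in" uses the payers of `node`, "out" uses its payees.
    """
    totals = {"H": 0.0, "M": 0.0, "L": 0.0, "NA": 0.0}
    for payer, payee, amount in payments:
        if direction == "in" and payee == node:
            neighbour = payer
        elif direction == "out" and payer == node:
            neighbour = payee
        else:
            continue
        totals[ratings.get(neighbour, "NA")] += amount

    volume = sum(totals.values())
    if volume == 0:
        return totals  # isolated node in this direction: all fractions are 0
    return {rating: weight / volume for rating, weight in totals.items()}

# Hypothetical toy data.
payments = [("a", "c", 30.0), ("b", "c", 10.0), ("c", "d", 5.0)]
ratings = {"a": "L", "b": "H"}

fractions_in = neighbour_rating_fractions(payments, ratings, "c", "in")
fractions_out = neighbour_rating_fractions(payments, ratings, "c", "out")
print(fractions_in)
print(fractions_out)
```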

We consider three methods for classification:

  i. multinomial logistic;
  ii. classification trees;
  iii. neural networks.

See [24] for a review of these methods.

The class H is under-represented in the sample, as it includes only around 10% of the firms with a rating. This affects the ability of any classifier to recover this class, which is undesirable, since class H is the most critical for riskiness.

To address this issue we proceed with a 2-step classification strategy for all three methods. The intuition behind this strategy is to train a classifier more specialised in the recovery of one specific class at the first step, and then separate the remaining classes in the second step. In the first step we fix a risk class, say L, and we merge the other two classes into a fictitious class X. We fit a first instance of the chosen model on the modified dataset. In the second step, we train another instance of the model only on the two previously merged classes. This is repeated for all three risk classes. When class H is the one selected for step one, before training we apply SMOTE [46], a well-known algorithm for data rebalancing (see Footnote 8).
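For readers unfamiliar with SMOTE, the following Python sketch shows the interpolation idea behind it: each synthetic minority sample is placed at a random point on the segment joining a minority sample and one of its nearest minority neighbours. This is a simplified illustration and not the implementation used for the results; the function name, the number of neighbours `k` and the toy points are assumptions.

```python
import random

def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority samples.

    minority: list of equal-length tuples of floats (at least two samples).
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between x and the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical minority class with two features.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority, n_new=10, k=2)
print(len(new_points))
```

Because the new points are interpolations, they stay inside the region already occupied by the minority class, which rebalances the training set without duplicating observations.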

Once the models are trained, the predictions are obtained by iterating the following two steps for each risk class (see the schematic representation in Fig. 5):

  i. apply the first step classifier;
  ii. if the entry is classified as X, apply the second step classifier.

Figure 5

Schematic representation of the 2-steps classifier

The final prediction is the median of the predictions. In case of a tie, more weight is given to the class obtained from the first instance (as that classifier is more specialised). For the 2-steps method, the random classifier can be defined in the following way: the null distribution for the first step is obtained for each classifier by taking into account the fictitious class, and the one for the second step by considering only the two classes previously merged.
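The prediction stage can be sketched in Python as follows. The three pairs of classifiers are replaced here by hypothetical threshold rules on a single score, standing in for trained models, and the classes are ordered as L < M < H so that the median is well defined. The tie-breaking rule is omitted because its exact form is not specified above; all names are illustrative.

```python
RANK = {"L": 0, "M": 1, "H": 2}
LABEL = {rank: label for label, rank in RANK.items()}

def two_step_predict(x, first_step, second_step):
    """Apply the first-step model; fall back to the second step on class X."""
    label = first_step(x)
    return second_step(x) if label == "X" else label

def final_prediction(x, pipelines):
    """Median of the three 2-step predictions on the ordinal scale L < M < H."""
    ranks = sorted(RANK[two_step_predict(x, first, second)]
                   for first, second in pipelines.values())
    return LABEL[ranks[len(ranks) // 2]]

# Hypothetical stand-ins for trained models: x is a risk score in [0, 1].
pipelines = {
    "L": (lambda x: "L" if x < 0.30 else "X",
          lambda x: "M" if x < 0.70 else "H"),
    "M": (lambda x: "M" if 0.35 <= x < 0.65 else "X",
          lambda x: "L" if x < 0.50 else "H"),
    "H": (lambda x: "H" if x >= 0.80 else "X",
          lambda x: "L" if x < 0.40 else "M"),
}

for score in (0.10, 0.50, 0.75):
    print(score, final_prediction(score, pipelines))
```

With real models, each lambda would be replaced by the `predict` call of the corresponding fitted classifier.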

Table 2 shows the results for each classifier, together with the values of the same metrics computed for the random classification. In the case of classification trees and neural networks, different combinations of the hyper-parameters have been tested (such as depth for the trees, and number and size of hidden layers for neural networks); here we present the results for the best choice for each model, and in Appendix C.8 we explain the selection procedure.

Table 2 Accuracy and recall for 2-steps classifiers. R: random, ML: multinomial logistic, CT: classification tree, NN: neural network

We repeat the procedure for the other months as well, using a single choice of hyper-parameters for each type of model (the one resulting from the tests on the first month); see Table 16 for details of the results.

The three models behave quite similarly, with slightly better overall performance of neural networks, and the training times are comparable.

It is interesting to study which of the network features are more predictive of the risk. While this is a complicated task for neural networks, it can be performed for classification trees. Figure 6 shows the importance of the predictors in the classification trees. As the 2-steps method includes 6 classification trees, we evaluate the importance of features for each classifier (bars) and then also compute the average (line). We repeat the same for each month and present the average and standard deviation. We observe a good agreement across months, but interestingly less across classifiers in the ensemble (see for example the importance of in-degree for step 2 of the L classifier with respect to the other classifiers).

Figure 6

The ten most important features for the classification tree. Each bar represents the importance for a single classifier (as detailed in Fig. 5). The pink line is the average across all classifiers. Results are averaged across months and the black bars indicate the standard deviation. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature (also known as the Gini importance).
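The Gini importance mentioned in the caption can be illustrated with a short Python sketch. Each split of a tree reduces the Gini impurity of the samples it receives; the reduction is weighted by the share of samples reaching that split, summed per feature, and normalised so that the importances add up to one. The representation of a tree as a list of splits, the function names and the toy labels are assumptions made for this example.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_importance(splits, n_root):
    """Normalised total impurity reduction per feature.

    splits: list of (feature, parent_labels, left_labels, right_labels).
    n_root: number of samples at the root of the tree.
    """
    raw = defaultdict(float)
    for feature, parent, left, right in splits:
        n = len(parent)
        decrease = (gini(parent)
                    - len(left) / n * gini(left)
                    - len(right) / n * gini(right))
        raw[feature] += n / n_root * decrease  # weight by samples reached
    total = sum(raw.values())
    return {feature: value / total for feature, value in raw.items()}

# Hypothetical tree with two splits on six rated firms.
root = ["L", "L", "H", "H", "M", "M"]
splits = [
    ("in_degree", root, ["L", "L"], ["H", "H", "M", "M"]),
    ("strength", ["H", "H", "M", "M"], ["H", "H"], ["M", "M"]),
]
importance = gini_importance(splits, n_root=len(root))
print(importance)
```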

From the figure we conclude that the most important features are: (i) in- and out-degree, (ii) percentage of neighbours with rating L or H, (iii) the proxy of the size, and (iv) the position in the hierarchy. Interestingly, the community to which the node belongs, according to the modularity partition, seems to play a minor role.

It must be noted that only network-derived metrics have been included among the predictors, while any data from the balance sheet, which is likely to be the main input of the risk rating model, as well as the sector or geographic location, are excluded. When the economic sector, which is the only metadata available to us, is added as a further predictor, the predictive power improves only slightly, from 49%–50% to around 52% accuracy for both classification trees and neural networks. Since we use none of the data employed in the proprietary rating model, the natural benchmark models are the random classifiers, both 1-step and 2-steps. We outperform the former by 30% to 38% and the latter by 15% to 22% in terms of accuracy and, especially in the case of neural networks, we find a good compromise with recall for H.

4 Conclusions

In this paper we empirically study the interactions and the risk distribution of 2 million Italian firms, via the investigation of payments networks built from transactional data.

Our contribution is threefold. First, to our knowledge, an empirical study of the relationships among such a large number of firms has not been done before, especially at this granularity. The study of the structure of the network highlights a complex interdependence between firms; particularly interesting is the presence of a relatively small core of firms which are involved in most transactions. This feature, paired with the power-law tails of the distributions of the number of connections and of the total volume exchanged by the firms, can be a symptom of an architecture which favours the spread of distress, or positive feedback. Also relevant is the observed tendency of large, well-connected firms to be connected to small (in terms of exchanged volume), poorly connected firms. This can be the result of almost exclusive relationships between a big producer and its subsidiaries.

The second and main contribution is the assessment of the correlation between the network structure and the distribution of risk. From our analysis, we conclude that the risk level of a firm is correlated with its features and role in the network at different levels. For single firms, we observed that low risk firms are more likely to have a high number of connections, and some of them act as hubs for the entire network, being connected to thousands of other firms. When pairs of linked firms are considered, we observed a tendency to favour connections towards firms with the same risk level. This tendency can also be observed at a more aggregate level. Indeed, we found that groups of firms which are more connected among themselves than with the rest of the network have a local distribution of risk which is statistically different from the global one, meaning that some risk classes are over- or under-represented. Finally, we divided firms into a hierarchical organisation, in such a way as to highlight the main direction along which money circulates. This simplified structure showed once more that many levels of the hierarchy have a local distribution of risk statistically different from the global one. As high risk firms are over-represented at the beginning of the flow of money, this can be a source of distress for the entire system.

Finally, we showed that network metrics and community structure can be successfully used to predict the missing ratings with machine learning models. We propose a simple 2-steps strategy to compromise between overall accuracy and recall on the smallest but riskiest class. We test our strategy with three methods, namely multinomial logistic, classification trees and neural networks. Since the predictors are all network-derived quantities, and no information from balance sheets or other metadata is used, the random rating assignment is the natural benchmark. We find that all three methods significantly outperform the benchmark, with slightly better results for neural networks.