Introduction

Financial markets are a critical part of modern economies and help us to understand the current consensus on how their constituent companies are performing. Asset prices are an equilibrium reached through the interaction of external information, historic performance and trader sentiment. Because of this, they are widely regarded as highly complex systems. Networks are a popular method of representing a complex system due to their ability to express interactions or relationships between components in a simple model that is applicable to many different fields. Networks inferred from these disparate fields share surprisingly common properties, including heavy-tailed degree distributions, high clustering coefficients and community structure (Barabási 2016; 2003). Once constructed, the networks can be used for a wide variety of applications, for instance portfolio selection (Pozzi et al. 2013; Peralta and Zareei 2016), stability analysis (Huang et al. 2009) and detecting attempts to manipulate stock prices (Sun et al. 2011; Shi et al. 2019).

A common and popular way of constructing a financial network is to infer it from returns data, following the seminal work of Mantegna (1999). In that paper the author uses Pearson’s correlation coefficient between the returns to weight the edge between two companies and constructs a minimum spanning tree from the resulting graph, finding that companies in similar sectors are clustered in the tree. Onnela et al. (2003) investigate this further, studying the effects of market crashes on the minimum spanning tree. They find the length of the tree decreases in times of market disruption (i.e. stocks become more correlated). Boginski et al. (2005) create a network from the correlation matrix inferred from financial returns data and investigate what happens to the structure of the network when a threshold is set: any correlation coefficients with an absolute value below the threshold are set to zero and any above are set to one. Applying this model to the US stock market, they find that the degree distribution of the networks resembles a power law if the threshold is set to a sufficiently large value. Furthermore, they construct maximal independent sets from the network in an attempt to create diversified portfolios. Other authors have applied this model to the Russian (Nagurney 2003), Chinese (Huang et al. 2009) and British (Chu and Nadarajah 2017) stock markets. These models have also been applied to other assets, including cryptocurrencies (Song et al. 2019).

An important question is whether these correlation networks contain real information or whether they merely pick up noise present at that moment in time. Plerou et al. (2002) use random matrix theory to analyze correlation matrices inferred from stock returns. By studying the eigenvalues and eigenvectors of the correlation matrix they obtain several interesting results. Firstly, only a few of the eigenvalues and eigenvectors of the correlation matrix differ significantly from those obtained from a random matrix, implying that many of the relationships in the correlation matrix are noise. However, the largest eigenvalue tends to differ significantly, being an order of magnitude larger than the second largest. The eigenvector that corresponds to this eigenvalue tends to capture information that affects all stocks (for instance interest rate increases), with its components being significantly different from those obtained from a random matrix. They also study the eigenvectors corresponding to the next few largest eigenvalues and find these tend to have significant values for related stocks - for instance those in the same sector or those doing business in similar regions. Finally, they show that the eigenvector corresponding to the largest eigenvalue tends to be preserved for correlation matrices inferred from data across different times, indicating it is stable.

While using the correlation coefficient gives a simple, interpretable model, there are downsides. Two variables that share a common cause can be correlated, which could be considered a false relationship. Partial correlation can be used in this context to attempt to remove these indirect correlations. Kenett et al. (2010) create a form of partial correlation network they term a ‘dependency network’. Calculating partial correlations by removing one variable at a time, they subtract these from the correlation coefficient between two variables. This gives the contribution that a particular company makes to the correlation between two others. These networks are directed and help us to understand which companies are influential in the market. They find that the financial sector is very influential in the US stock market and that its influence is maintained throughout the period of study.

Partial correlation is also used by Wang et al. (2018), who compare minimum spanning trees constructed using the correlation and partial correlation coefficients inferred from various stock indices across the world. They calculate partial correlation by inverting the correlation matrix and find that the centrality structure in the minimum spanning tree constructed from the partial correlation matrix is more useful than that constructed from the correlation matrix, with the USA, Germany and Japan clearly serving as hubs. Other authors have used sparse precision matrix estimators to infer these networks, for instance the graphical lasso (Millington and Niranjan 2017; Peralta 2015) and SPACE (Millington and Niranjan 2019). These sparse methods do, however, tend to produce quite unstable networks, possibly due to the instability of the lasso with highly correlated data (Wainwright 2009; Preis et al. 2012).

A covariance matrix can be used to estimate both correlation and partial correlation matrices. We can obtain the correlation matrix (C) from the covariance matrix (Σ) by normalizing the off-diagonal entries:

$$ C_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii} \Sigma_{jj}}} $$
(1)

Obtaining the partial correlation matrix can be done in a similar manner. We scale the off-diagonal entries of the inverse of the covariance matrix (Θ), also called the precision matrix, to acquire the partial correlation matrix (P):

$$ P_{ij} = -\frac{\Theta_{ij}}{\sqrt{\Theta_{ii} \Theta_{jj}}} $$
(2)
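
As a concrete illustration, both conversions amount to rescaling a matrix by the square roots of its diagonal. The following is a minimal NumPy sketch (the function names are ours, not from the paper's code):

```python
import numpy as np

def correlation_from_covariance(cov):
    """Eq. (1): scale the covariance matrix by the square roots of its diagonal."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

def partial_correlation_from_precision(prec):
    """Eq. (2): scale the precision matrix and negate the off-diagonal entries."""
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)
    # Eq. (2) is defined for the off-diagonal entries; set the diagonal to 1 by convention.
    np.fill_diagonal(pcorr, 1.0)
    return pcorr
```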

Calculating the partial correlation matrix therefore requires that the covariance matrix is invertible. This is not always the case with financial data due to the presence of outliers and the issue of having more dimensions than samples. To solve these issues we use the Ledoit-Wolf covariance shrinkage method (Ledoit and Wolf 2004). This produces a well-regularized covariance matrix by combining the sample covariance matrix with the identity to reduce the off-diagonal values. This well-conditioned covariance matrix can then be inverted to obtain the precision matrix. With these regularized covariance and precision matrices, we can use the equations above to obtain the correlation and partial correlation networks. Once obtained, we can then compare and contrast these networks. The shrinkage method is described further in the “Ledoit-Wolf covariance” section.

We hope that by using partial correlation we can discover latent relationships that are hidden in correlation networks by the overall movement of the market. Previous work has focused on removing the market mode from the data by various methods, including deleting the largest eigenvalue and eigenvector, or using a factor model (Namaki et al. 2011a), but to our knowledge a partial correlation approach like the one we take has not been used on financial data before. In our approach, the partial correlation between two variables is the correlation between the two once the linear effects of all other variables have been removed. To gain further insight, first consider that we can infer row i of a precision matrix from a dataset X by regressing variable i (\(\vec{x_{i}}\)) on the others (\(X_{-i}\))

$$ \vec{\beta}^{*} = \arg \min_{\vec{\beta}} ||\vec{x_{i}} - X_{-i} \vec{\beta}||_{2}^{2} $$
(3)

The solution to this is

$$ \beta_{ij}^{*} = - \frac{\Theta_{ij}}{\Theta_{ii}} = P_{ij} \sqrt{\frac{\Theta_{jj}}{\Theta_{ii}}} $$
(4)

This means that the partial correlation between i and j (Pij) is proportional to the weight that the least squares method would assign to j in a regression problem where we try to predict i from the rest of the dataset (or, since Pij=Pji, the weight that least squares would assign to i if we tried to predict j from the rest of the dataset).
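
This identity is straightforward to check numerically. The sketch below (synthetic data, our own naming) regresses one column on the remaining ones and compares the least squares coefficients against Eq. (4):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated columns
X -= X.mean(axis=0)

S = X.T @ X / n              # sample covariance
Theta = np.linalg.inv(S)     # precision matrix

i = 0
X_rest = np.delete(X, i, axis=1)
beta, *_ = np.linalg.lstsq(X_rest, X[:, i], rcond=None)      # Eq. (3)

beta_from_precision = -np.delete(Theta[i], i) / Theta[i, i]  # Eq. (4)
print(np.allclose(beta, beta_from_precision))                # True up to numerical error
```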

Furthermore, the precision matrix is intimately connected with the minimum variance portfolio. This portfolio selection problem is formulated as (Markowitz 1952)

$$ \begin{aligned} & \underset{\vec{w}}{\text{minimize}} & &\vec{w}^{T} \Sigma \vec{w} \\ & \text{subject to} & & \vec{1}^{T} \vec{w} = 1 \\ \end{aligned} $$
(5)

This has the solution, corresponding to one corner of the optimal mean-variance returns frontier,

$$ \vec{w}^{*} = \frac{1}{\vec{1}^{T} \Theta \vec{1}} \Theta \vec{1} $$
(6)

where \(\vec{1}\) is a vector of all 1s and wi is the amount to be invested in asset i. Due to the relationship between the precision matrix (Θ) and partial correlation (P) (see Eq. 2), we expect these networks to give insight into the benefits and drawbacks of these portfolios, although it is important to remember that the partial correlation matrix discards the diagonal of the precision matrix.
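
Given a precision matrix, Eq. (6) is essentially a one-line computation; the sketch below (our own naming) also makes the link to the row sums of Θ explicit:

```python
import numpy as np

def min_variance_weights(precision):
    """Eq. (6): w* = (Theta @ 1) / (1' Theta 1); each weight is a normalized row sum of Theta."""
    row_sums = precision.sum(axis=1)   # Theta @ 1 (Theta is symmetric)
    return row_sums / row_sums.sum()   # divide by 1' Theta 1
```

Note that the only constraint in Eq. (5) is that the weights sum to one, so individual weights can be negative (short positions).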

Ledoit-Wolf covariance

Ledoit-Wolf covariance estimation is based upon shrinkage, where we combine the sample covariance matrix S (which may have a high variance but low bias) with a known matrix with desirable properties (which has a low variance but high bias). The usual choice for this is the identity matrix I, scaled by the trace of S, and we create a linear combination of the two

$$ \Sigma_{\text{lw}} = (1 - \rho) S + \rho \text{tr}(S) I $$
(7)

To choose ρ we wish to minimize the expected (squared) Frobenius norm of the difference between Σlw and the true population covariance matrix Σ∗

$$ \min_{\rho} E[||\Sigma_{*} - \Sigma_{\text{lw}}||_{F}^{2}] $$
(8)

The optimal solution of ρ is

$$ \rho = \frac{E[||S - \Sigma_{*}||_{F}^{2}]}{E[||S - \text{tr}(S) I||_{F}^{2}]} = \frac{\beta^{2}}{\delta^{2}} $$
(9)

The interpretation here is that if S is very close to Σ∗ (i.e. our estimate of the covariance is good) then we do not need to shrink much, and if our shrinkage target does not seem accurate then we should not shrink much either. However, the obvious flaw so far is that we need to know the true population covariance matrix to obtain the correct value of ρ - and if we did then we would not need to bother estimating it to begin with! We therefore require estimates of β2 and δ2. We can estimate δ2 as follows:

$$ \hat{\delta^{2}} = ||S - \text{tr}(S) I||^{2}_{F} $$
(10)

and β2 as

$$ \hat{\gamma^{2}} = \frac{1}{n^{2}} \sum^{n}_{k=1} ||x_{k} x_{k}^{T} - S||_{F}^{2} $$
(11)
$$ \hat{\beta^{2}} = \min(\hat{\delta^{2}}, \hat{\gamma^{2}}) $$
(12)
$$ \hat{\rho} = \frac{\hat{\beta^{2}}}{\hat{\delta^{2}}} $$
(13)

The constraint on \(\hat{\beta^{2}}\) ensures that \(\hat{\rho} \le 1\). While it is rarely binding, it helps to stop us accidentally making our estimate worse by over-shrinking.

The Ledoit-Wolf method is guaranteed to give us a positive-definite invertible matrix, which is critical in this application as we require the inverse of the covariance matrix (the precision matrix) to acquire the partial correlation matrix.
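
Putting Eqs. (7)–(13) together gives a compact estimator. The sketch below is a direct transcription of these equations as written above; library implementations, such as the sklearn estimator we use later, may normalize the shrinkage target and the norms slightly differently:

```python
import numpy as np

def ledoit_wolf_shrinkage(X):
    """Shrinkage covariance estimate following Eqs. (7)-(13) in the text.

    X is an (n_samples, n_features) array of already-centred observations.
    This is a sketch of the equations as stated; normalization conventions
    may differ from standard library implementations.
    """
    n, p = X.shape
    S = X.T @ X / n                                  # sample covariance
    target = np.trace(S) * np.eye(p)                 # shrinkage target used in Eq. (7)

    delta2 = np.linalg.norm(S - target, "fro") ** 2  # Eq. (10)
    gamma2 = sum(np.linalg.norm(np.outer(x, x) - S, "fro") ** 2
                 for x in X) / n ** 2                # Eq. (11)
    beta2 = min(delta2, gamma2)                      # Eq. (12)
    rho = beta2 / delta2                             # Eq. (13)

    return (1 - rho) * S + rho * target              # Eq. (7)
```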

This covariance estimation method has been applied to the genomics field (Schäfer and Strimmer 2005), for portfolio optimization (Ledoit and Wolf 2004) and in the neuroscience field (Brier et al. 2015) but to our knowledge has not actually been applied to create financial networks.

Data and software

For our study we use daily log returns from the S&P500. If less than 10% of the data is missing for a particular stock we fill the gaps with the price from the previous day, or, if the data is missing from the start, with the price from the first day the stock is traded. If more than 10% is missing we discard that stock. We use the daily close price to calculate returns over the period 2000-01-03 to 2017-12-05, giving us 4510 days of return data for 345 stocks overall. Since financial data is non-stationary we use a window of 300 days and slide it along 30 days at a time to obtain samples within which we can assume the data is stationary, giving 140 windows overall. The returns inside each window are normalized using the z-score to have a mean of 0 and a standard deviation of 1. While correlation is by definition normalized, this step is mostly for the benefit of the shrinkage estimator - normalizing reduces the amount of shrinkage required, which allows us to capture more relationships.
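
A sketch of this preprocessing, assuming the close prices are held in a pandas DataFrame with one column per ticker (the frame layout and function name are our assumptions, not the authors' code):

```python
import numpy as np
import pandas as pd

def sliding_windows(prices: pd.DataFrame, window=300, step=30):
    """Yield z-scored log-return windows from a frame of daily close prices."""
    log_returns = np.log(prices).diff().dropna()
    for start in range(0, len(log_returns) - window + 1, step):
        chunk = log_returns.iloc[start:start + window]
        yield (chunk - chunk.mean()) / chunk.std()   # z-score each stock within the window
```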

Using this dataset we infer a network for each window by using the Ledoit-Wolf shrinkage method to obtain a covariance matrix and inverting it to obtain a precision matrix. We then scale both of these matrices using Eqs. 1 (Correlation) and 2 (Partial Correlation) to create the correlation and partial correlation matrices, and use these as adjacency matrices to construct the networks. We then study the properties of these networks and how they change over time.
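
A minimal per-window pipeline along these lines might look as follows; it uses sklearn's LedoitWolf estimator and builds NetworkX graphs with nx.from_numpy_array (one reasonable choice, not necessarily the authors' exact implementation):

```python
import networkx as nx
import numpy as np
from sklearn.covariance import LedoitWolf

def window_networks(returns_window):
    """Build correlation and partial correlation graphs for one window (Eqs. 1 and 2)."""
    lw = LedoitWolf().fit(returns_window)
    cov, prec = lw.covariance_, lw.precision_

    d_c = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d_c, d_c)                   # Eq. (1)

    d_p = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d_p, d_p)                # Eq. (2)
    np.fill_diagonal(pcorr, 0.0)                      # drop self-loops before building the graph

    corr_graph = nx.from_numpy_array(corr - np.eye(len(corr)))  # remove the unit diagonal
    pcorr_graph = nx.from_numpy_array(pcorr)
    return corr_graph, pcorr_graph
```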

We make use of Python, NumPy and SciPy (Oliphant 2006) for general scripting, pandas (McKinney 2010) for handling the data, sklearn (Pedregosa et al. 2011) for the implementation of the Ledoit-Wolf estimation method, statsmodels (Seabold and Perktold 2010) for some of the statistical analysis, matplotlib (Hunter 2007) for plotting, NetworkX (Hagberg et al. 2008) for the network analysis and Gephi (Bastian et al. 2009) for some graph visualization.

Results and analysis

Network analysis and sector centrality

Firstly we display the networks constructed from the first window of the data. Since they are dense, we display only the edges that correspond to the 1000 largest absolute values in the off-diagonal of the matrix. The correlation network has isolated nodes in this situation, so we display only its largest connected component, while the partial correlation network remains connected. These networks are displayed in Figs. 1 (Correlation) and 2 (Partial Correlation). Both networks show a degree of sector clustering, but it is far more prominent in the correlation network than in the partial correlation network. The partial correlation networks also seem to have a more uniform degree distribution than the correlation ones, with less community structure.
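
The edge filtering used for these figures can be sketched as follows (our own function; the figure keeps the 1000 strongest edges as described above):

```python
import networkx as nx

def strongest_edge_subgraph(graph, n_edges=1000):
    """Keep the n_edges edges with the largest absolute weight, then the largest component."""
    ranked = sorted(graph.edges(data="weight"),
                    key=lambda e: abs(e[2]), reverse=True)[:n_edges]
    sub = nx.Graph()
    sub.add_nodes_from(graph.nodes(data=True))
    sub.add_weighted_edges_from(ranked)
    largest = max(nx.connected_components(sub), key=len)
    return sub.subgraph(largest).copy()
```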

Fig. 1

Example correlation network inferred from the first window. Only the largest connected component of the network containing the 1000 edges with the largest absolute weights are shown. Nodes are coloured according to sector membership. There is a strong community structure visible, with communities usually made up of companies in the same sector

Fig. 2

Example partial correlation network inferred from the first window. Only the 1000 edges with the largest absolute weights are shown. Nodes are coloured according to sector membership. There seems to be less sector clustering in the partial correlation networks and less of a community structure

To begin our analysis we look at the distributions of the correlation and partial correlation coefficients in the networks, and the difference in weight between the same edge in the two networks. A histogram of these is shown in Fig. 3, and a scatter plot relating the two is shown in Fig. 4. In general, partial correlation coefficients tend to be smaller than the corresponding correlation values and are more likely to be negative, but it is also clear that the two are related. This is likely to be due to the definition of partial correlation - if it reduces the value of indirect correlations, then we would expect some companies that are supposedly correlated to have these relationship strengths reduced.

Fig. 3

Distribution of correlation and partial correlation coefficients over the dataset of 140 windows taken over the 17 year period. We can see that the partial correlation matrix generally has smaller values than the correlation matrix, and that they are more likely to be negative. a Correlation b Partial Correlation

Fig. 4

Scatter plot of the correlation coefficient for an edge against the partial correlation coefficient for the same edge, for each of the 140 networks in the dataset. The partial correlation coefficients are in general smaller than their corresponding correlation coefficients, which is to be expected if the indirect correlations are reduced; however, there is still a relationship between the two

Our next goal is to compare and contrast the stability of the networks. In a correlation matrix the largest eigenvalue measures the intensity of the correlation present, and the corresponding eigenvector measures the ‘market mode’ - the effect the general market has on each particular company (Plerou et al. 2002; Namaki et al. 2011b). Each entry of this eigenvector can also be used as a measure of centrality. Therefore we can study how this eigenvector changes over time to see whether the networks regard the same nodes as important, a proxy for how stable the networks are. To measure this we normalize the eigenvectors so that their components add to 1 and then measure the difference between those from adjacent windows using the L2 norm.
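
A sketch of this stability measure (our own function name; resolving the eigenvector sign ambiguity by flipping vectors to have a positive sum is our convention, not stated in the text):

```python
import numpy as np

def leading_eigenvector_changes(matrices):
    """L2 distance between the normalized leading eigenvectors of consecutive windows."""
    vecs = []
    for m in matrices:
        vals, v = np.linalg.eigh(m)        # the matrices are symmetric
        lead = v[:, np.argmax(vals)]
        if lead.sum() < 0:                 # fix the arbitrary sign (our convention)
            lead = -lead
        vecs.append(lead / lead.sum())     # components sum to 1
    return [np.linalg.norm(b - a) for a, b in zip(vecs, vecs[1:])]
```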

Firstly we look at how the largest eigenvalue varies over time. The results are shown in Fig. 5. From this we can see that the largest eigenvalue of the partial correlation matrix is much smaller and varies relatively little compared to the largest eigenvalue of the correlation matrix. This implies that the intensity of the partial correlation networks does not change much over the dataset, particularly compared to the correlation networks, which show large changes. This perhaps indicates that the market mode has been removed. However, if we look at the difference in the eigenvectors we get a slightly different story. From Fig. 6 we can see there is a larger change in the corresponding leading eigenvector of the partial correlation matrix than of the correlation matrix, signifying that the partial correlation networks are less stable than the corresponding correlation networks, which could indicate why minimum risk portfolios tend to require large changes in asset holdings (DeMiguel et al. 2007). Both seem to reflect macroeconomic changes, with the magnitude of the difference varying over time. Interestingly, the difference between eigenvectors from adjacent windows drops during periods of disruption.

Fig. 5

Largest eigenvalue in the correlation (left) and partial correlation (right) networks. There is a large variation in the largest eigenvalue of the correlation matrix, which varies between roughly 40 and 180. It noticeably picks up the financial crisis of 2008/2009, where the eigenvalue reaches its maximum. In contrast, in the partial correlation networks the largest eigenvalue stays roughly constant (and quite small), showing the network has a consistent intensity and indicating the market mode has been removed. a Correlation b Partial Correlation

Fig. 6

Change in the normalized leading eigenvectors, measured using the L2 norm, of the correlation and partial correlation matrices. Both have changes that seem to reflect the general market conditions, although the eigenvector from the partial correlation matrix seems to change more, which shows the network is less stable

Next we study the centrality of the sectors in the networks, allowing us to quantify their influence in the economy. We use two measures, degree centrality and eigenvector centrality. Relating these specifically to financial networks, we first note that degree centrality is simply the sum of the edge weights of a node. The weight a node receives in the minimum variance portfolio is also proportional to the weight of the edges at that node (see Eq. 6). Secondly, eigenvector centrality is calculated using the eigenvector that corresponds to the largest eigenvalue, with its components normalized to sum to 1 in the same manner as above. This leading eigenvector reflects the market mode and the effect the general market has on a particular company.

The presence of negative edges in the networks makes calculating centrality more challenging. Negative edges can result in a node having negative centrality, which does not have an obvious interpretation. We could use the absolute values of the edges to solve this problem, but this involves discarding the negative relationships, which are numerous in the partial correlation networks. In our experiments we found relatively little difference between permitting negative edges and using the absolute values of the edge weights, and so we permit negative edges. We normalize at the end so the sum of all node centralities is 1.
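
In adjacency-matrix terms the two measures reduce to row sums and the leading eigenvector. A sketch keeping negative edges, with the final normalization described above (the sign convention for the eigenvector is our own choice):

```python
import numpy as np

def network_centralities(adj):
    """Weighted degree and leading-eigenvector centrality, each normalized to sum to 1."""
    degree = adj.sum(axis=1)              # weighted degree, negative edges included
    vals, vecs = np.linalg.eigh(adj)      # the adjacency matrices here are symmetric
    eig = vecs[:, np.argmax(vals)]
    if eig.sum() < 0:                     # resolve the sign ambiguity (our convention)
        eig = -eig
    return degree / degree.sum(), eig / eig.sum()
```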

To measure the centrality of a sector we take the mean centrality of all the companies in that sector. We then normalize these mean sector centralities to add to 1 to make comparison easier. To start with, we look at the mean degree centrality for the sectors in each network. A graph of this over time is shown in Fig. 7, with the legend shown in Fig. 8. In this graph we can see that the partial correlation networks have a much smaller difference and variance in sector centrality than the correlation networks - each sector has roughly the same centrality and there is little variation over time. The telecommunications sector has relatively few companies, which is why its centrality has far more variance. In the correlation networks we see a much larger variance in the mean centrality of a sector, with the financial sector having the highest mean centrality for the majority of the dataset. Interestingly, all the centralities ‘jump’ together during the financial crisis, showing how suddenly the correlations between previously unrelated companies increase due to these macroeconomic effects.

Fig. 7

Mean degree centrality for each sector over time. It is noticeable that in the partial correlation networks the difference in centrality is much smaller for each sector than in the correlation networks. We can see the macroeconomic trends in the correlation networks with the centralities jumping together during the crash. The colour legend can be found in Fig. 8. a Correlation b Partial Correlation

Fig. 8

Legend for the sector colours for Figs. 7 and 9

Fig. 9

Mean eigenvector centrality for each sector over time. These centralities have a much larger variance than the degree centralities, particularly for the partial correlation networks. Here we can see the financial sector is the most central for the majority of the dataset, although the real estate sector also becomes important in both networks. Macroeconomic effects are also much more visible, with the strong change from 2009 - 2011 as all the sector centralities move together. The colour legend can be found in Fig. 8. a Correlation b Partial Correlation

Next we look at eigenvector centrality. A graph of this over time is shown in Fig. 9, with the legend in Fig. 8. In this figure we see very different results from the degree centrality for the partial correlation networks, which show a much larger variation in the centrality of the sectors. They also show a slightly larger variance in the mean centrality than the correlation networks. In particular, the financial sector is far more dominant than we would expect. The macroeconomic conditions are also visible in these graphs, with the financial crisis again forcing the mean centralities towards a common value.

The movement of the centrality measures towards a common value during times of disruption is particularly interesting. Preis et al. (2012) pointed out that the market tends to be more correlated during times of market disruption, which makes selecting truly diversified portfolios very challenging, as supposedly unrelated assets suddenly become related during these periods. This may be relevant here too: during periods of disruption, companies start behaving far more similarly than they did during times of stability.

We have pointed out the connection between the minimum risk portfolio and degree centrality in the partial correlation networks, but to what degree does it hold? To explore this further, in Fig. 10 we plot the L2 difference between the optimal portfolio vector and the degree centralities, alongside the sum of the diagonal of the precision matrix (which is effectively discarded by the partial correlation matrix). From this we can see the difference can be quite large, although most of it seems to be explained by the size of the precision matrix diagonal.

Fig. 10

L2 difference between the weight placed on each node in the optimal portfolio vs degree centrality of each node (left) and the sum of the precision matrix diagonal (right). The differences between the optimal portfolio and the degree centrality can be quite large, but most of the difference can be explained by the sum of the precision matrix diagonal. a Optimal Portfolio vs Degree Centrality b Sum of precision matrix diagonal

Out of sample portfolio performance

Previous work with correlation networks has found that companies on the fringes of the network have a better Sharpe ratio than those that are more central (Peralta and Zareei 2016; Pozzi et al. 2013). We are curious how this applies to these networks. Therefore we study the centrality of a company against its out of sample Sharpe ratio (defined as the mean return over the standard deviation of the returns, \(\frac{\mu}{\sigma}\)) and its risk (defined as the standard deviation of the returns) for the next window.

Using Spearman correlation we find that there is mild positive correlation between the centrality of a company and its out of sample Sharpe ratio in every network, and perhaps unsurprisingly mild negative correlation between out of sample risk and centrality. The exact results are shown in Table 1. All results are statistically significant at p<0.05.
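
The test itself is a single SciPy call; a sketch assuming we already hold a centrality vector and the next window's returns as NumPy arrays (the variable names are ours):

```python
import numpy as np
from scipy.stats import spearmanr

def centrality_vs_sharpe(centrality, next_window_returns):
    """Spearman correlation between centrality and out of sample Sharpe ratio."""
    sharpe = next_window_returns.mean(axis=0) / next_window_returns.std(axis=0)
    rho, p_value = spearmanr(centrality, sharpe)
    return rho, p_value
```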

Table 1 Spearman correlation between the centrality measures and the out of sample risk and Sharpe ratio

Community detection

In Figs. 1 and 2 it can be seen that there is some community structure in the networks, so we are interested in further studying this. To do so we use a community detection algorithm to divide each network into communities, and analyze how these change over time. Since we have a ground truth classification of the sector memberships of the various companies, we can also quantify how well these communities reflect the sector structure.

A popular method to detect communities is to attempt to maximize the modularity of the network. This is a hard problem (Brandes et al. 2006) and so various approximate methods have been proposed, including a spectral method (Newman 2006) or the Louvain algorithm (Blondel et al. 2008). These methods have been applied previously to detect communities in financial networks constructed from stock data (Piccardi et al. 2011; Isogai 2014).

The classic formulation of modularity for a network with adjacency matrix A and a vector of community assignments \(\vec {c}\) is (Newman 2006)

$$ Q = \frac{1}{m} \sum_{i} \sum_{j} \left(A_{ij} - \frac{k_{i} k_{j}}{m}\right) \delta(c_{i}, c_{j}) $$
(14)

where \(m = \sum_{i}\sum_{j} A_{ij}\), \(\delta(c_{i}, c_{j})\) is the Kronecker delta, equaling 1 when ci=cj (i.e. nodes i and j are in the same community) and 0 otherwise, and ki is the sum of the edge weights at node i. However this is not appropriate when we are looking at graphs with negative edges. Here we use a definition designed for the presence of negative edges, proposed by Gomez et al. (2009): we divide the network into positive edges (signified by a +) and negative edges (signified by a −)

$$ A_{ij} = A^{+}_{ij} - A^{-}_{ij} $$
(15)

where

$$ A^{+}_{ij} = \max(0, A_{ij}) $$
(16)
$$ A^{-}_{ij} = \max(0, -A_{ij}) $$
(17)

and so the definitions of modularity are

$$ Q^{+} = \frac{1}{m^{+}} \sum_{i} \sum_{j} \left(A^{+}_{ij} - \frac{k^{+}_{i} k^{+}_{j}}{m^{+}}\right) \delta(c_{i}, c_{j}) $$
(18)
$$ Q^{-} = \frac{1}{m^{-}} \sum_{i} \sum_{j} \left(A^{-}_{ij} - \frac{k^{-}_{i} k^{-}_{j}}{m^{-}}\right) \delta(c_{i}, c_{j}) $$
(19)

where

$$ m^{+} = \sum_{i} \sum_{j} A^{+}_{ij} $$
(20)
$$ m^{-} = \sum_{i} \sum_{j} A^{-}_{ij} $$
(21)
$$ k^{+}_{i} = \sum_{j} A^{+}_{ij} $$
(22)
$$ k^{-}_{i} = \sum_{j} A^{-}_{ij} $$
(23)

Total modularity is then a scaled version of these

$$ Q = \frac{m^{+}}{m^{+} + m^{-}} Q^{+} - \frac{m^{-}}{m^{+} + m^{-}} Q^{-} $$
(24)
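
A direct, unoptimized implementation of Eqs. (15)–(24) for a signed adjacency matrix and a vector of community labels might look as follows (a sketch, not the authors' code):

```python
import numpy as np

def signed_modularity(adj, communities):
    """Modularity with negative edges, following Eqs. (15)-(24) (Gomez et al. 2009)."""
    adj = np.asarray(adj, dtype=float)
    communities = np.asarray(communities)
    same = communities[:, None] == communities[None, :]        # delta(c_i, c_j)

    def one_sign(a):                                           # Eqs. (18)-(23) for one sign
        m = a.sum()
        if m == 0:
            return 0.0, 0.0
        k = a.sum(axis=1)
        q = ((a - np.outer(k, k) / m) * same).sum() / m
        return q, m

    q_pos, m_pos = one_sign(np.maximum(adj, 0.0))              # Eq. (16)
    q_neg, m_neg = one_sign(np.maximum(-adj, 0.0))             # Eq. (17)
    return (m_pos * q_pos - m_neg * q_neg) / (m_pos + m_neg)   # Eq. (24)
```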

We choose the Louvain method (Blondel et al. 2008) to maximize modularity. This method maximizes modularity in a greedy, bottom-up manner. Each node is initialized in its own community. We then compute the gain in modularity from moving node i from its current community a to another community b. Doing this for all candidate communities, we place node i in the community that maximizes the modularity gain, provided the gain is positive. We continue doing this over all nodes until no move of a single node produces a positive gain in modularity. Phase 1 of the algorithm is then complete. In phase 2 each community is treated as a node and the edges to other communities are collapsed into one edge per community. The algorithm is then run again, repeating until collapsing the communities no longer yields a gain in modularity.

The gain in modularity from moving isolated node i into a community can be calculated by separately considering the positive and negative edges as follows

$$ \delta Q^{+} = \frac{\sum_{\text{in}}^{+} + 2 k_{i}^{+}}{m^{+}} - \left(\frac{\sum_{\text{tot}}^{+} + k_{i}^{+}}{m^{+}}\right)^{2} - \left(\frac{\sum_{\text{in}}^{+}}{m^{+}} - \left(\frac{\sum_{\text{tot}}^{+}}{ m^{+}}\right)^{2} - \left(\frac{k_{i}^{+}}{m^{+}}\right)^{2}\right) $$
(25)
$$ \delta Q^{-} = \frac{\sum_{\text{in}}^{-} + 2 k_{i}^{-}}{m^{-}} - \left(\frac{\sum_{\text{tot}}^{-} + k_{i}^{-}}{m^{-}}\right)^{2} - \left(\frac{\sum_{\text{in}}^{-}}{m^{-}} - \left(\frac{\sum_{\text{tot}}^{-}}{ m^{-}}\right)^{2} - \left(\frac{k_{i}^{-}}{m^{-}}\right)^{2}\right) $$
(26)

where \(\sum_{\text{in}}\) is the sum of the weights of the edges inside the community that node i is being moved into, and \(\sum_{\text{tot}}\) is the sum of the weights of the edges incident on that community. The gains are then scaled by the total weight of positive and negative edges in the graph and combined

$$ \delta Q = \frac{m^{+}}{m^{+} + m^{-}} \delta Q^{+} - \frac{m^{-}}{m^{+} + m^{-}} \delta Q^{-} $$
(27)

A notable advantage of this algorithm is that we do not need to choose the number of communities. Since the algorithm is greedy, we randomise the order in which the nodes are visited in phase 1 on each run. This of course means that we will obtain different results every time we run the algorithm. Therefore we run the algorithm 10 times on each network to get a mean and standard deviation for any measures taken.

To evaluate our clustering we use the Adjusted Rand Index. Given a set of elements S and two partitions of it into subsets, X and Y, the Rand Index (Rand 1971) is defined as

$$ R = \frac{a + b}{a + b + c + d} $$
(28)

where a is the number of pairs of items in the same subset in both X and Y, b is the number of pairs of items that are in different subsets in both X and Y, c is the number of pairs of items in the same subset in X but different subsets in Y, and d is the number of pairs of items in the same subset in Y but different subsets in X. Essentially it is a pairwise measure of accuracy against a ground truth labelling.

In the Adjusted Rand Index (ARI) a correction is made for chance, using the expected similarity for clustering under a random model:

$$ ARI = \frac{R - E[R]}{\max(R) - E[R]} $$
(29)
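
In practice the ARI is available directly from sklearn; a minimal usage sketch with hypothetical labels:

```python
from sklearn.metrics import adjusted_rand_score

# Toy example: four companies with known sectors and one detected partition.
sectors = ["Financials", "Financials", "Energy", "Energy"]
communities = [0, 0, 1, 1]
print(adjusted_rand_score(sectors, communities))  # 1.0 for a perfect match
```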

Figure 11 shows the ARI over time, exhibiting how well the networks reflect the known sector structure. The partial correlation networks have less success overall in discovering the sector structure, although both networks show large variations over the time period. We suggest this is due to the reduction in indirect correlations that the partial correlation coefficient provides - we would expect this to reduce intra-sector correlation strengths, leading to reduced success in recovering the sector structure. It is also noticeable that times of disruption seem to lower the ARI for the correlation networks. This could be due to the increased amount of correlation and volatility causing companies to behave more similarly, reducing the ability of the algorithm to separate them (Preis et al. 2012). However, for the partial correlation networks the ARI actually increases during these times.

Fig. 11

Adjusted Rand Index score over time using Louvain community detection algorithm. In general the partial correlation networks have less success in uncovering the sector structure than the correlation networks although there are time periods where this is not the case. This could be due to the reduction in supposedly indirect correlations by the partial correlation networks

We next look at the number of clusters produced, shown in Fig. 12. The correlation networks have a smaller number of clusters than the partial correlation networks, averaging around 4 compared with around 20 for the partial correlation networks. Both keep a roughly constant number of clusters throughout the entire dataset, although we can see a small dip for both networks in 2009. Again this could be due to the increasing correlations during market disruption making the companies seem more similar. There are 11 actual sectors in the dataset, so neither method is particularly close to the true value.

Fig. 12

Number of clusters over time using Louvain community detection algorithm. The correlation networks generally seem to pick up a smaller number of clusters than the partial correlation networks, but both keep a similar number over the entire dataset. There is also a drop in the number of clusters detected for both networks during the financial crisis

Finally we study how stable the clusters are over time. Since we have 10 partitions per network, we use the Adjusted Rand Index to compare the partitions from one window with those from the next, measuring the consistency of the clustering. The results of this are shown in Fig. 13. The correlation networks have a much more stable structure than the partial correlation networks (i.e. the ARI between adjacent windows is larger, meaning more companies remain in the same cluster), although the variance does seem to be higher. For the correlation networks there is a large ’break’ in 2008, with the consistency dropping considerably, but then both networks show an increase in clustering stability over the crisis. This is consistent with our finding about network stability in the “Network analysis and sector centrality” section, with the partial correlation networks being less stable than the correlation networks.

Fig. 13

Clustering Consistency for the Louvain algorithm over time. The correlation network produces much more stable clusters than the partial correlation networks, although there is much more variance in this consistency than in the partial correlation networks

Conclusion and future work

In this paper we have constructed correlation and partial correlation networks from S&P500 returns data using the Ledoit-Wolf covariance estimator. This estimator is designed to cope with having more dimensions than samples and always gives us an invertible covariance matrix, which is required for estimation of the partial correlation matrix. We construct 140 networks using windows of data and contrast the correlation and partial correlation networks produced.

Firstly we compare the edge weights in these networks. The partial correlation network has more negative edges than the correlation network and generally has smaller weights. There is however a clear relationship between the two - several edges with a high correlation also have a high partial correlation. Since partial correlation is designed to reduce the effect of indirect correlations this is something we would expect.

Secondly we use the largest eigenvalue and corresponding eigenvector to measure the intensity and stability of the networks. The largest eigenvalue of the correlation network varies significantly depending on the state of the market at that time while the largest eigenvalue of the partial correlation network remains roughly constant. This shows that the partial correlation networks do not have much change in intensity over the dataset. However, using the difference in the largest eigenvector to measure the stability of the network we find the partial correlation networks are significantly less stable than the correlation networks, perhaps showing why minimum risk portfolios tend to be less stable.

Exploring the mean centrality of various sectors using both degree and eigenvector centrality, we find that in the partial correlation networks the sectors all have relatively similar mean degree centrality. Furthermore, macroeconomic factors do not seem to affect this centrality, with all sectors having a fairly consistent mean degree centrality over time. This is not the case in the correlation networks, where there is clear variation in the centralities, notably during the financial crisis of 2008/2009. The eigenvector centralities tell a very different story, with there being more variation in the partial correlation networks rather than less. Macroeconomic effects are also picked up in both networks here, again with the centralities of the sectors moving together during the financial crisis.

Utilizing these networks for portfolio selection, we find there is positive correlation between the centrality of a company and its out of sample Sharpe ratio but there is negative correlation between its centrality and risk. This result is statistically significant and is relevant for both the correlation and partial correlation networks.

Finally we run an altered Louvain community detection algorithm, using a version of modularity designed for networks with negative edges, to discover whether the sector assignments are replicated in the actual data. We find that the partial correlation networks are less successful than the correlation networks in uncovering these sector clusters. The correlation networks also produce more stable clusterings with a smaller number of clusters than the partial correlation networks. This indicates that, in general, partial correlation networks have a less stable structure than correlation networks constructed from the same data.