Introduction

Fund managers compete for investors’ money by signaling to the market their ability to generate risk-adjusted returns (alpha). Using Microsoft’s TrueSkill to estimate each manager’s skill, we study how skill affects the risk level of the manager’s portfolio. We find that highly skilled managers take systematically more risk within a one-year tournament than less skilled managers. These results are robust to different market phases, to years with pronounced risk-shifting incentives, and to different empirical approaches.

First, our work contributes to the existing literature by introducing Microsoft’s TrueSkill algorithm as a new skill measure, thereby treating the tournament among fund managers as a “game”. Building upon Bayesian network theory, TrueSkill identifies and tracks the skills of managers in a competitive setting: the belief about a manager’s skill is estimated from the manager’s past performance relative to all other active managers. Despite broad evidence for the long-term underperformance of active managers against a benchmark (Fama 1965), individual managers seem to outperform the market in the short term, which results in higher fund inflows and compensation (Sirri and Tufano 1998; Kempf and Ruenzi 2008b) and hence promotes a competitive environment among fund managers.

Second, we extend the empirical work on fund tournaments, which was first introduced by Brown et al. (1996). They analyze the behavior of mutual fund managers within one year and detect a risk-seeking investment style for mid-term losers. When we replicate their analysis, our results instead indicate that winners increase their risk, which suggests a different trend of individual behavior in tournaments in recent years. We then follow Kempf et al. (2009) and highlight risk-shifting differences between years driven by compensation incentives (winners are rewarded for their outperformance) and years driven by unemployment risk (losers face a high chance of having their funds closed due to underperformance). We extend this area of research by detecting investment patterns based on the individual skill level of the managers and by highlighting the correlation between skill and risk-seeking.

The remainder of the paper is structured as follows: the "Fund tournaments and skill" section introduces the setup of fund tournaments and Microsoft’s TrueSkill, the "Empirical results" section contains the empirical analysis, and the final section concludes.

Fund tournaments and skill

The economics of tournaments

Research on managerial tournaments is considered a subset of agency-theoretic contracting theory, which deals with the disparity between principals’ and agents’ interests and risk aversion. Bolton and Dewatripont (2005) summarize the basic assumptions and implications for multiple scenarios in different areas of economics.

The underlying premise of our analysis is to view the market for portfolio management services as a multi-period decision-making process. This implies that investors decide in a cyclical pattern which fund to invest in. One significant aspect of this investment process is the established compensation structure within the fund industry. Fund managers are often compensated based on their funds’ assets under management, which creates strong incentives to generate high fund inflows. Empirical evidence for the positive multi-period correlation between the past performance of an individual fund and its new fund inflows has been provided, for example, by Sirri and Tufano (1998).

This correlation leads to the plain risk adjustment hypothesis in the literature: losing managers increase their risk at mid-term in order to catch up with the leading managers within their peer group [cf. Brown et al. (1996)]:

$$\begin{aligned} \frac{\sigma _{2L}}{\sigma _{1L}} > \frac{\sigma _{2W}}{\sigma _{1W}} \end{aligned}$$
(1)

where \(\sigma _{pL}\) indicates the risk level of a loser’s portfolio in period \(p \in \{1,2\}\) of a two-period annual tournament and \(\sigma _{pW}\) the risk level of a winner’s portfolio. Multiple researchers have followed this hypothesis and analyzed various aspects and implications, such as different time periods, competition within fund families, and the impact of the selected fund segment. Important ideas and results can be found in the works of Chevalier and Ellison (1997), Busse (2001), Deli (2002), Kempf and Ruenzi (2008a), Kempf and Ruenzi (2008b), and Bär et al. (2010). Despite these findings, there are still conflicting views about the existence of such tournament behavior among managers and especially about the exact behavior of winners and losers.
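As a minimal illustration of inequality (1), the following Python sketch computes the ratio \(\sigma _{2}/\sigma _{1}\) for one loser and one winner. The function name and all return figures are hypothetical and chosen only for illustration; the sample standard deviation of monthly returns is assumed as the risk measure.

```python
from statistics import stdev


def risk_adjustment_ratio(first_half, second_half):
    """Return sigma_2 / sigma_1 for one manager's monthly returns."""
    return stdev(second_half) / stdev(first_half)


# Hypothetical monthly returns in percent, six observations per half-year.
loser_h1 = [0.5, -0.3, 0.2, -0.4, 0.1, -0.2]
loser_h2 = [1.5, -1.8, 2.1, -1.2, 0.9, -2.0]
winner_h1 = [1.2, 0.8, -0.5, 1.5, 0.3, 0.9]
winner_h2 = [1.0, 0.6, -0.4, 1.3, 0.4, 0.7]

rar_loser = risk_adjustment_ratio(loser_h1, loser_h2)
rar_winner = risk_adjustment_ratio(winner_h1, winner_h2)

# Inequality (1): the loser raises risk relative to the winner.
hypothesis_holds = rar_loser > rar_winner
```

In this made-up example the loser’s ratio exceeds one while the winner’s ratio falls below one, so the inequality holds by construction.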

The impact of prior performance through TrueSkill

New fund inflows are positively correlated with the standing of the individual fund at the end of the tournament, i.e., the end of the year. Most investors tend to trust the past performance of a fund and expect positive returns at or around the benchmark level once the fund has reached a top rank in a given year. In empirical research, this behavior has been modeled, for example, by Berk and Green (2004), whose model includes two key aspects: first, the performance of fund managers is not persistent, and second, investors behave as Bayesians. The first aspect means that fund managers do not outperform a passive benchmark continuously. The second means that investors update their belief about the strength of an individual manager based on past observed returns and prior beliefs. This leads to the concept of conditional probabilities, also known as Bayesian probability, where a probability is interpreted as a reasonable expectation based on prior beliefs and knowledge.

The TrueSkill algorithm was developed by a team from Microsoft Research in 2005 and has been used for match-making in various online games ever since. The purpose of this ranking system is to detect and track the skill of individual players even when they play in teams, to derive public rankings, and to implement a match-making system that lets players of similar skill play against each other. The general idea behind TrueSkill is to update the belief about a player’s skill based on the observed outcome of a given game. This technique is called Bayesian inference, as explained for example by Box and Tiao (2011). TrueSkill characterizes the belief about a manager’s skill as a Gaussian, uniquely described by its mean \(\mu \) and standard deviation \(\sigma \) [cf. Microsoft Research (2005)]. The parameter \(\mu \) can be interpreted as the average belief about the manager’s skill, while \(\sigma \) describes the uncertainty about that skill level. The more games a participant plays, the smaller \(\sigma \) becomes and, therefore, the more precise the knowledge about the player’s skill. Furthermore, the average skill level \(\mu \) is updated based on the match outcome.
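To make the update of \(\mu \) and \(\sigma \) concrete, the following Python sketch implements the closed-form TrueSkill update for the simplest case: two players, a clear winner, and no draws. It is an illustration under stated assumptions, not the multi-manager procedure used in this study. The function names are hypothetical, the performance standard deviation `beta = 25/6` is an assumed default, and the dynamics term that inflates \(\sigma \) between matches is omitted.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_player(winner, loser, beta=25.0 / 6.0):
    """Update (mu, sigma) beliefs after a win/loss match without draws.

    beta is an assumed performance standard deviation.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # shift applied to the means
    w = v * (v + t)       # shrinkage applied to the variances, in (0, 1)
    new_winner = (
        mu_w + sigma_w ** 2 / c * v,
        sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w),
    )
    new_loser = (
        mu_l - sigma_l ** 2 / c * v,
        sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w),
    )
    return new_winner, new_loser


# Both players start from the same prior belief (mu = 25, sigma = 25/3).
prior = (25.0, 25.0 / 3.0)
new_winner, new_loser = update_two_player(prior, prior)
```

The winner’s mean rises, the loser’s mean falls by the same amount, and both uncertainties shrink. An upset (a win by the player with the lower \(\mu \)) produces a larger `v` and therefore a larger shift than an expected result.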

One of the most important advantages of TrueSkill is its adaptability to any setup that ranks match outcomes. It only needs a clear ranking for each match, whether teams or individuals are compared with each other. We give a brief overview of the underlying process of TrueSkill in order to convey a basic understanding of its functionality. We do not explain every mathematical step and its technical realization within the algorithm but refer to Herbrich et al. (2006).

Let k managers with a total of n funds \(\{1, \ldots , n\}\) compete in a match. Each fund is uniquely assigned to a single manager, resulting in k disjoint subsets \(A_{j} \subset \{1, \ldots , n\}\). For each match, the outcome \(\mathbf{r } := (r_1, \ldots , r_k) \in \{1, \ldots ,k\}^k\) indicates the match-specific rank \(r_j\) of each manager j in ascending order; i.e., \(r_j = 1\) is the winner and possible draws are given as \(r_i = r_j\). Making use of Bayes’ rule, the conditional probability \(P(\mathbf{r } | \mathbf{s }, A)\) of the game outcome \(\mathbf{r }\) given the individual skills \(\mathbf{s } := (s_1, \ldots , s_n)\) of all participating funds in their manager assignments \(A := (A_1, \ldots , A_k)\) leads to the posterior distribution

$$\begin{aligned} p(\mathbf{s } | \mathbf{r },A) = \frac{P(\mathbf{r } |\mathbf{s },A)p(\mathbf{s })}{P(\mathbf{r } |A)}. \end{aligned}$$
(2)

The prior distribution of the funds’ skill \(f(\mathbf{s }) := \prod _{i=1}^n {\mathcal {N}}(\mu _{i},\,\sigma _{i}^{2})\) is assumed to be a factorizing Gaussian, while each fund i has a performance \(f_{i} \sim {\mathcal {N}}(s_{i},\beta _{i}^{2})\) in the match, centered around its individual skill \(s_{i}\) with fixed variance \(\beta _{i}^{2}\). With TrueSkill, the performance \(m_j\) of a manager j is defined as the sum of the performances of the manager’s funds, \(m_j := \sum _{i \in A_j} f_i\) [cf. Herbrich et al. (2006)]. Figure 1 shows the exemplary process of TrueSkill as a factor graph. This methodology is used in information technology to describe complicated ‘global’ functions of many variables that are built from various ‘local’ functions. Such global functions factor as a product of the local functions and can therefore be described by a bipartite graph called a factor graph. Further information can be found in Kschischang et al. (2001).
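The performance model above can be sketched in a few lines of Python. Because skills and performances are independent Gaussians, integrating out the skills gives \(m_j \sim {\mathcal {N}}(\sum _{i \in A_j} \mu _i, \sum _{i \in A_j} (\sigma _i^2 + \beta ^2))\), and the probability that one manager outperforms another follows from the difference of two Gaussians. The sketch assumes a common \(\beta \) for all funds, ignores draws, and uses hypothetical function names and belief values.

```python
import math

BETA = 25.0 / 6.0  # assumed common performance standard deviation


def manager_performance(fund_beliefs, beta=BETA):
    """Mean and variance of m_j = sum of f_i over a manager's funds.

    fund_beliefs is a list of (mu_i, sigma_i) pairs, one per fund.
    """
    mean = sum(mu for mu, _ in fund_beliefs)
    variance = sum(sigma ** 2 + beta ** 2 for _, sigma in fund_beliefs)
    return mean, variance


def win_probability(funds_a, funds_b, beta=BETA):
    """Probability that manager a's performance exceeds manager b's."""
    mean_a, var_a = manager_performance(funds_a, beta)
    mean_b, var_b = manager_performance(funds_b, beta)
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical beliefs: manager 1 runs one fund, manager 2 runs two funds.
manager_1 = [(27.0, 4.0)]
manager_2 = [(12.0, 5.0), (13.0, 5.0)]
p_1_beats_2 = win_probability(manager_1, manager_2)
```

Since manager 1 has the higher summed mean in this example, `p_1_beats_2` lies above one half, while the additional variance contributed by manager 2’s second fund keeps it well below one.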

Fig. 1

Schematic work of TrueSkill as a factor graph. Notes: schematic work of TrueSkill illustrated as a factor graph [based on Herbrich et al. (2006)] for the resulting joint distribution \(p(\mathbf{s }, \mathbf{f }, \mathbf{m } | \mathbf{r }, A)\) of three managers with a total of four funds and manager 1 winning, while manager 2 and manager 3 draw (\(k=3\), \(A_1=\{1\}\), \(A_2=\{2, 3\}\), \(A_3=\{4\}\) and the ranking \(\mathbf{r }:=(1,2,2)\)). The black boxes represent the factor functions which are used to calculate the local variables, visualized by the light gray circles. The gray arrows indicate the initial calculation of the skill level for all three managers, followed by the ‘inner iteration circle’. This circle is used to approximate the new skill level of all managers; after that, the black arrows indicate the updates of the skill beliefs for each individual fund

Identifying skill based tendencies in risk-shifting

In a first step, we calculate the six-month rolling information ratio as a performance measure for each fund. We use these ratios to rank the funds on a monthly basis and feed the rankings to the TrueSkill algorithm. At this stage, funds with less than one year of track record prior to the start of the tournament year are also included, because they are needed for the initial calculation of skill levels. Second, the funds included in the annual tournaments compete against each other on a monthly basis, and their skill level (and therefore the skill level of each manager) is calculated by TrueSkill based on the performance rankings. To compare the skill levels of different fund managers, we use only each manager’s expected skill level \(\mu \) once the skill development has been calculated. To avoid biases for new managers whose skill estimates have not yet converged, we only consider managers, and therefore funds, with at least one year of track record. This leads to at least 18 matches between all managers and their funds before they are categorized for the first time at the end of a tournament’s interim period.
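The first step can be sketched as follows. The code computes a six-month rolling information ratio as the mean active return divided by its sample standard deviation and converts the latest values into a monthly ranking. The function names and return series are hypothetical, the ratio is left unannualized (which does not change the ranking), and ties are not treated specially.

```python
from statistics import mean, stdev


def information_ratio(fund_returns, benchmark_returns):
    """Mean active return over its sample standard deviation.

    Raises ZeroDivisionError if the active returns are constant.
    """
    active = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    return mean(active) / stdev(active)


def rolling_information_ratio(fund_returns, benchmark_returns, window=6):
    """Information ratio over each trailing window of `window` months."""
    return [
        information_ratio(fund_returns[end - window:end],
                          benchmark_returns[end - window:end])
        for end in range(window, len(fund_returns) + 1)
    ]


def rank_funds(scores):
    """Map fund name to rank, with rank 1 for the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}


# Hypothetical monthly returns in percent over seven months.
benchmark = [1.0, -0.5, 0.8, 0.2, -0.3, 0.6, 0.4]
funds = {
    "fund_a": [1.4, -0.2, 0.9, 0.6, -0.1, 1.1, 0.5],
    "fund_b": [0.7, -0.9, 0.9, -0.1, -0.6, 0.5, 0.6],
}

rolling = {name: rolling_information_ratio(r, benchmark)
           for name, r in funds.items()}
# Ranking for the most recent month, to be passed on as a match outcome.
latest_ranks = rank_funds({name: ir[-1] for name, ir in rolling.items()})
```

Seven months of data yield two rolling values per fund; the ranking of the most recent values is what a skill-rating step would consume as the outcome of that month’s match.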

To analyze skill-dependent risk-shifting, we use conditional transition matrices for the top 20% (high skill), the next 60% (medium skill), and the bottom 20% (low skill) of each year’s managers. We follow Ammann and Verhofen (2009) and adapt this transition approach, which is commonly known from credit default analyses. The transitions are based on the historical volatility of each manager’s portfolio, with each manager assigned to a risk tercile:

$$\begin{aligned} (e_{i1}, e_{i2}) \in \{1, \ldots , 3\}^2, \qquad i = 1, \ldots , N \end{aligned}$$
(3)

with \(e_{i1}\) characterizing the risk tercile of manager i in the interim period, \(e_{i2}\) the risk tercile in the second half of the year’s tournament, and N the number of managers in the respective sample. Here, 1 indicates the highest risk tercile and 3 the lowest. Migration events of the same kind are aggregated in a 3 × 3 matrix of migration frequencies where the generic element

$$\begin{aligned} c_{jk} = \sum _{i=1}^N {\mathbf {1}}\{(e_{i1}, e_{i2}) = (j,k)\} \end{aligned}$$
(4)

is the number of migration events from j to k and \({\mathbf {1}}\{\dots \}\) the indicator function. Furthermore, we assume that the observations \(e_{i2}\) are the realization of the random variables \({\widetilde{e}}_{i2}\) with conditional probability distribution

$$\begin{aligned} p_{jk} = {\mathbf {P}}\left( {\widetilde{e}}_{i2} = k \;|\; {\widetilde{e}}_{i1} = j \right) , \qquad \sum _{k=1}^{3} p_{jk} = 1 \end{aligned}$$
(5)

where \(p_{jk}\) is the probability that the risk level of a manager’s portfolio changes from the jth to the kth tercile. We estimate these probabilities by the observed migration rates:

$$\begin{aligned} {\hat{p}}_{jk} = \frac{c_{jk}}{n_j} \end{aligned}$$
(6)

with \(n_j = \sum _{k=1}^3 c_{jk}\). To identify any differences between the differently skilled managers, we use a chi-squared test to check for pairwise homogeneity of the transition matrices. The test statistic

$$\begin{aligned} \chi ^2 = \sum _{k=1}^3 \sum _{s=1}^2 \frac{\left( c_{jk}(s) - n_{j}(s){\hat{p}}_{jk}^{+} \right) ^2}{n_{j}(s){\hat{p}}_{jk}^{+}} \end{aligned}$$
(7)

is asymptotically \(\chi ^2\)-distributed with two degrees of freedom. The variable \({\hat{p}}_{jk}^{+}\) denotes the probability estimated from the aggregated data of the two transition matrices, and s is the index for the respective sample, e.g., high- and medium-skilled managers.
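The migration counts, the estimated migration rates, and the homogeneity statistic for one initial tercile can be sketched in Python as follows. The function names and the migration events are hypothetical; the pooled estimate \({\hat{p}}_{jk}^{+}\) is computed from the summed counts of the two samples, as described above.

```python
def migration_counts(events, n_states=3):
    """Count migrations; events are (first_half, second_half) terciles."""
    counts = [[0] * n_states for _ in range(n_states)]
    for first, second in events:
        counts[first - 1][second - 1] += 1
    return counts


def migration_rates(counts):
    """Row-normalize the counts to estimated transition probabilities."""
    rates = []
    for row in counts:
        n_j = sum(row)
        rates.append([c / n_j if n_j else 0.0 for c in row])
    return rates


def chi_squared_row(counts_a, counts_b, j):
    """Homogeneity statistic for initial tercile j (1-based), two samples."""
    row_a, row_b = counts_a[j - 1], counts_b[j - 1]
    n_a, n_b = sum(row_a), sum(row_b)
    statistic = 0.0
    for k in range(len(row_a)):
        pooled = (row_a[k] + row_b[k]) / (n_a + n_b)
        for observed, n in ((row_a[k], n_a), (row_b[k], n_b)):
            expected = n * pooled
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical values of the chi-squared distribution with two degrees of freedom.
CRITICAL_5_PERCENT = 5.991
CRITICAL_1_PERCENT = 9.210

# Hypothetical migration events for two skill groups.
high_skill = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 2)]
low_skill = [(1, 3), (1, 3), (1, 2), (2, 3), (2, 2), (3, 3), (3, 3)]

counts_high = migration_counts(high_skill)
counts_low = migration_counts(low_skill)
rates_high = migration_rates(counts_high)
stat_row_1 = chi_squared_row(counts_high, counts_low, j=1)
```

Comparing `stat_row_1` with the critical values indicates whether the two groups migrate differently out of the highest risk tercile; with samples as small as in this toy example, the asymptotic approximation would of course not be reliable.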

By the nature of this approach, our analyses emphasize the whole dynamics of the risk-shifting tendencies of differently skilled fund managers. Transition matrices as employed in this study are, inter alia, widely used in the literature on credit risks [cf. Höse et al. (2009) for details] and in previous studies focusing on prior performance and risk-taking of mutual fund managers (Ammann and Verhofen 2007, 2009).

Empirical results

Our empirical analysis builds on the two databases Morningstar and Bloomberg. Following Brown et al. (1996) and subsequent researchers, who choose growth-oriented US equity funds because of the high interest of the financial press and direct investor involvement, we include all funds classified in the Morningstar categories US OE Large Growth, US OE Small Growth, and US OE Mid-Cap Growth. We use monthly closing prices from Bloomberg for the categorized funds for the period from 1991 to 2017. This long period allows us to analyze the behavior of managers in various market situations, since it covers financial crises as well as market phases with a positive long-term trend, e.g., 2009–2017. All funds are denominated in US dollars, and we correct the sample for survivorship bias.

Fig. 2
Managers per Tournament and Benchmark-Related Performance. Notes: black (white) bars indicate a positive (negative) average active return in the respective year

Furthermore, we tackle the fact that various funds are team-managed and multiple managers handle more than one fund by using a string matching algorithm to identify funds managed by the same managers. We exclude all team-managed funds and match the remaining funds clearly to a single manager. This results in 559 individual managers who hold at least one fund on their own within the given time period.

We include in each year’s tournament all funds that have at least one year of track record and no missing data points in the given period. We use two periods of six months to analyze risk-shifting, which makes the end of June the end of the interim period. Managers above the average at that point are classified as winners and those below as losers. Managers with two or more funds fulfilling these requirements are treated as holding an equally weighted portfolio of their funds, which reduces the impact of pro-active risk-shifting across multiple funds. To calculate benchmark-related performance measures, we use data on the MSCI North America for the same period. An overview of the annual tournaments and the average performance of the participating managers against the benchmark is given in Fig. 2.

There are several options to measure risk-levels of mutual funds. Examples are the return standard deviation, the tracking-error standard deviation which is the standard deviation of the excess returns of the fund over a benchmark, or the systematic risk a fund takes which is commonly estimated via a market model. However, the latter two are rather uncommon in mutual fund tournament studies. We follow previous studies and measure risk by the annualized standard deviation of the monthly fund returns (Brown et al. 1996; Kempf and Ruenzi 2008b).
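A minimal Python sketch of this risk measure, combined with the equally weighted aggregation of a manager’s funds described above, could look as follows. The function names are hypothetical, and the sketch assumes the sample standard deviation and the usual square-root-of-12 scaling from monthly to annual volatility.

```python
import math
from statistics import stdev


def equally_weighted(returns_by_fund):
    """Monthly returns of an equally weighted portfolio of several funds."""
    return [sum(month) / len(month) for month in zip(*returns_by_fund)]


def annualized_volatility(monthly_returns):
    """Sample standard deviation of monthly returns, scaled by sqrt(12)."""
    return stdev(monthly_returns) * math.sqrt(12)


# Hypothetical manager with two funds, six monthly returns in percent.
fund_1 = [1.0, -0.5, 0.8, 0.2, -0.3, 0.6]
fund_2 = [2.0, -1.5, 1.2, 0.0, -0.7, 1.0]
portfolio = equally_weighted([fund_1, fund_2])
risk_first_half = annualized_volatility(portfolio)
```

Computing the same quantity for the second half of the year gives the two risk levels per manager that enter the tercile assignment.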

Measuring performance with TrueSkill

We start our empirical analysis by demonstrating TrueSkill’s capability to take prior performance into account. Figure 3 shows the development of the Pearson correlation coefficients between the TrueSkill-based rankings of all funds participating in the tournament, estimated over five, two, and one years, and their information ratio rankings. The left panel shows the correlation with TrueSkill levels calculated from 4 years prior to 2015, the middle one from 1 year prior, and the right one with the TrueSkill estimation starting in 2015. Hence, Fig. 3 underlines the time dependence of TrueSkill and its incorporation of prior performance while establishing skill levels. Since investors’ decisions are often based on behavioral aspects such as prior performance or the performance of fund family members [e.g., Sirri and Tufano (1998), Nanda et al. (2004)], TrueSkill is an adequate skill measure due to its capability of incorporating these aspects.

Fig. 3
Correlation of TrueSkill and information ratio. Notes: this figure shows the evolution of the Pearson rank correlations between TrueSkill and its underlying performance measure over different time spans for an exemplary year (2015). More precisely, the left (middle, right) figure shows the rank correlation between the TrueSkill rankings estimated over the trailing five (two, one) years and the rankings based solely on the information ratio over the trailing six months to the corresponding month in 2015

Skill driven risk-shifting

Table 1 shows the aggregated risk-shifting tendencies for the whole sample period. It is structured into four panels: the first shows the unconditional transition rates based on the risk terciles in the first and second half of the year, and the other three show the transition rates for the different skill levels. Thus, the three skill-based transition matrices are subsamples of the unconditional case. The \(\chi ^2\)-values test the null hypothesis that the conditional transitions equal the unconditional ones. Panel D shows significant differences from the unconditional case at the 5% and 1% level for winners and losers, respectively. Indeed, the tendency to increase the risk level is much lower for managers with less skill than for those with high skill.

Table 1 Risk transitions aggregated 1992–2017

The first observable pattern is the difference in risk-seeking between winners and losers. While winners tend to stay at their initial level or even increase their risk in the second period of the year, losers act the other way around. With transition rates of 30.0, 11.1, and 32.9 for winners compared to rates of 26.6, 11.2, and 28.1 for losers, Panel A demonstrates the risk-seeking behavior of winners. Conversely, risk-decreasing rates of 25.7, 11.8, and 25.2 compared to 25.0, 13.7, and 32.5 complement this pattern. The subsamples given by Panels B to D indicate similar patterns across the different skill levels. Still, many managers stay within their first-half risk tercile, with transition rates of up to 62.4. The rates of remaining in the same tercile are highest for the extreme terciles of high and low risk.

Looking at the impact of different skill levels for either tournament standing, we find clear tendencies of high-skilled managers increasing their risk more often than those with less skill, regardless of their first-half performance. The comparison of Panel B and Panel D shows higher risk-increasing rates for skilled managers in both positions. Conversely, the risk-decreasing rates are always higher for managers with less skill. Additionally, the rates of remaining in the same tercile are higher for high-skilled managers in the highest risk tercile and for low-skilled managers in the lowest risk tercile.

The subsamples of high- and medium-skilled managers are also closely related to the unconditional one. The \(\chi ^2\)-test values show no significant differences here. In contrast, the subsample of low-skilled managers differs from the unconditional sample at the 5% level for winners and even at the 1% level for losers. This indicates different behavior for the minority of less-skilled managers, who seem to secure their wins if possible and cut their losses during bad tournaments.

In the next step, we take a closer look at years of extreme risk-shifting. To this end, we aggregate the five years with the strongest risk decreases by losers and the five years with the strongest risk increases by losers, with winners acting vice versa in each case. These periods are classified as years dominated by unemployment risk and years dominated by compensation incentives, respectively. In doing so, we follow Kempf et al. (2009) in their explanation for different risk-shifting tendencies in special periods. The five years dominated by compensation incentives are 1992, 1995, 2006, 2014, and 2017, identified in Table 4 in Appendix 1 as the years with the highest risk adjustment ratios (RARs) for mid-year losers. The years dominated by the risk of unemployment are 1993, 2000, 2001, 2004, and 2016; these are the years in which losers have extremely low RARs at mid-term.

Table 2 Risk transitions based on extreme risk-shifting

Table 2 highlights the differences between years dominated by compensation incentives (Panel A) and years dominated by unemployment risk (Panel B). Overall, both panels show similar patterns of winners increasing their risk level in the second half. While Panel B shows no significant difference between skilled and unskilled winners, Panel A emphasizes the overconfidence of managers with higher skill levels, who seek more risk after the interim period. The difference between skilled and unskilled managers is significant at the 10% level.

More importantly, the skill level of individual managers affects their decision making in a losing scenario. Skilled managers seem to rely on their prior performance and increase their risk level dramatically, with transitions of 47.8, 8.7, and 34.8 in years dominated by compensation incentives. Instead of cutting their losses, they attempt to catch up by investing very self-confidently. Less skilled managers behave differently and cut their losses, with risk-decreasing transitions of 41.2, 11.8, and 35.7. Here, the skill level of each manager seems to determine his behavior strongly. The difference between these skill types is significant at the 1% level. In years dominated by unemployment risk, the majority of all managers decrease their risk level in the second half of the year if they are losing. Only very few skilled managers try to increase their risk level at this stage; instead, a few managers with less skill start to gamble for a win. Those few seem to go all in before their funds are closed permanently. The difference between skilled and unskilled managers is again significant at the 1% level.

Ammann and Verhofen (2007) introduce another dynamic Bayesian network approach to analyze the impact of prior performance. They find similar results, namely risk-increasing behavior after years of good performance and decreasing risk-taking after bad years. In their subsequent work, Ammann and Verhofen (2009) highlight the same pattern of winning managers increasing their risk in the following period and losers acting vice versa. Our findings are in line with their results and further underline the impact of prior performance, measured as investors’ belief about the individual manager’s skill.

Robustness tests

Underlying performance measure

The most important parameter within the TrueSkill setup seems to be the choice of the underlying performance measure used to calculate the monthly rankings, which are the starting point of all further skill calculations. We test for the impact of different performance measures by repeating our analysis with the monthly active returns of all participating managers. Table 5 in Appendix 2 shows results very similar to our previous analysis, indicating that high-skilled managers increase their risk most of the time regardless of their performance in the first half of the year. These results underline the high correlation between different performance measures as shown by Eling and Schuhmacher (2007). For three different measures, we calculate the Spearman rank-order correlation coefficients, including a two-sided p value for a test of the null hypothesis of no correlation between the data series. Table 6 in Appendix 2 documents the strong and significant correlation between the Sharpe ratio, information ratio, and active return rankings. We conclude that the choice of the underlying performance measure does not affect our initial results significantly.
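For reference, the Spearman rank-order correlation coefficient is the Pearson correlation of the ranks of two series. The following Python sketch computes the coefficient only; the two-sided p value is omitted, tied values receive their average rank, and all names and figures are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores of five managers under two performance measures.
information_ratios = [0.42, -0.10, 0.25, 0.03, -0.31]
active_returns = [1.8, -0.2, 0.4, 0.9, -1.5]
rho = spearman(information_ratios, active_returns)
```

In this example only two managers swap places between the two measures, so the coefficient is high but below one.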

Skill thresholds

The results could be driven by the choice of quantiles that classify managers into their skill level. In the main specification, we classified the top 20% as highly skilled and the bottom 20% as low-skilled which leaves a 20–60–20 split. Other reasonable splits, e.g., 10–80–10, lead to the same conclusions as we show in untabulated results.
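The classification into skill groups can be sketched as a simple quantile split on the managers’ expected skill levels \(\mu \). The function name, the rounding of group sizes, and the skill values below are assumptions made for illustration.

```python
def classify_by_skill(mu_by_manager, top=0.2, bottom=0.2):
    """Label managers as high, medium, or low skill by their mu value."""
    ordered = sorted(mu_by_manager, key=mu_by_manager.get, reverse=True)
    n = len(ordered)
    n_top = round(n * top)        # size of the high-skill group
    n_bottom = round(n * bottom)  # size of the low-skill group
    labels = {}
    for pos, name in enumerate(ordered):
        if pos < n_top:
            labels[name] = "high"
        elif pos >= n - n_bottom:
            labels[name] = "low"
        else:
            labels[name] = "medium"
    return labels


# Hypothetical expected skill levels of ten managers.
skills = {f"manager_{i}": 20.0 + i for i in range(10)}

base_split = classify_by_skill(skills)                        # 20-60-20
narrow_split = classify_by_skill(skills, top=0.1, bottom=0.1)  # 10-80-10
```

Changing the `top` and `bottom` arguments reproduces alternative splits such as 10–80–10 without touching the rest of the analysis.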

Risk adjustment ratio approach

Our next robustness test deals with the general tournament behavior regardless of the individual skill of each manager. Therefore, we replicate the contingency table approach introduced by Brown et al. (1996) based on the risk adjustment ratios. The results presented in Table 4 in Appendix 1 are in line with our results of skill-driven investments, indicating a different trend of individual behavior in tournaments in recent years. Winners have higher RARs in most of the years, which is in contrast to earlier findings of Brown et al. (1996). Still, this demonstrates that our findings are in line with previous methodologies.

Hyperparameter of the prior distribution

In our empirical analysis, we set the initial prior distribution of the fund managers’ skills as described in "The impact of prior performance through TrueSkill" section as \(f(\mathbf{s }) := \prod _{i=1}^n {\mathcal {N}}(\mu _{i},\,\sigma _{i}^{2})\) with \(\mu _i = 25\) and \(\sigma _i = \frac{\mu _i}{3} \approx 8.33\). Note that the average skill level \(\mu _i\) is not of much interest in absolute terms, since all managers are assumed to start with the same initial skill. Since we do not define a unit to measure skill other than the Gaussian’s parameters \(\mu _i\) and \(\sigma _i\), the relative belief about two fund managers given by their skill distributions is of higher relevance. In that sense, it does not make much difference whether we start with a level of 10, 100, or the standard level of 25 (see footnote 1) as proposed by Herbrich et al. (2006), which originates from TrueSkill’s early comparability with the ELO ranking.

To underline the low impact of the initial priors on our results, we vary the relation between \(\mu _i\) and \(\sigma _i\), i.e., \(\sigma _i \in \{\frac{\mu _i}{2}, \frac{\mu _i}{4}\}\). The results are qualitatively similar to our base case \(\sigma _i = \frac{\mu _i}{3}\), see Tables 7 and 8 in Appendix 2.

The negligible impact of the priors is in line with theoretical expectations: with sample size \(n \longrightarrow \infty \), the difference between two posteriors based on different Gaussian priors tends toward zero. The same holds for larger prior variances \(\sigma _i^2\), as outlined for example by Ley et al. (2017).
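The vanishing influence of the prior can be illustrated with a deliberately simple conjugate model rather than with TrueSkill itself: a Gaussian mean with known observation variance. The Python sketch below compares the posterior means obtained from two different priors as the number of observations grows. All names and numerical values are hypothetical.

```python
def posterior_mean(prior_mu, prior_sigma, sample_mean, n, obs_sigma):
    """Posterior mean of a Gaussian mean with known observation variance."""
    prior_precision = 1.0 / prior_sigma ** 2
    data_precision = n / obs_sigma ** 2
    weighted = prior_precision * prior_mu + data_precision * sample_mean
    return weighted / (prior_precision + data_precision)


def posterior_gap(n, prior_a=(10.0, 10.0 / 3.0), prior_b=(100.0, 100.0 / 3.0),
                  sample_mean=30.0, obs_sigma=5.0):
    """Absolute difference between the posterior means under two priors."""
    mean_a = posterior_mean(prior_a[0], prior_a[1], sample_mean, n, obs_sigma)
    mean_b = posterior_mean(prior_b[0], prior_b[1], sample_mean, n, obs_sigma)
    return abs(mean_a - mean_b)


# The gap between the two posteriors shrinks as the sample size grows.
gaps = [posterior_gap(n) for n in (1, 10, 100, 1000)]
```

With these illustrative numbers the gap falls from roughly 15 after one observation to below 0.1 after a thousand, which mirrors the asymptotic argument above.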

Different benchmark indexes

Within our analysis, we use a risk-adjusted approach to determine the rankings of each manager used for the TrueSkill algorithm. Our measure of choice is the information ratio as a market model adjustment measure, with the MSCI North America as the benchmark. Given the different setups of mutual funds and their long-term purposes, e.g., equity-only, long-only, or multi-asset, our chosen benchmark might not be appropriate for every mutual fund in the universe. However, we restrict our fund sample to growth-oriented US equity mutual funds, as earlier researchers have done (Brown et al. 1996; Taylor 2003; Kempf and Ruenzi 2008b). The categorization is based on the widely accepted classification by Morningstar, which leads to a quite homogeneous sample. We justify this sample restriction with arguments similar to those used in earlier research.

However, Morningstar specifies two benchmark indexes for each of its categories. The primary index for all three categories used in this study is the S&P 500, which correlates almost perfectly with the MSCI North America. The secondary benchmark index differs for each category (see footnote 2). We repeat our analysis benchmarking each fund against its secondary benchmark index and report the results in Table 9. Overall, the conditional transition matrices differ more strongly from the unconditional transition matrix than in our baseline case. In line with our previous findings, we find a tendency for winning managers to increase their risk more than losers and for managers classified as low-skilled to adjust their risk less than managers classified as high-skilled.

Regression approach

On the basis of the conditional transition matrix approach, our results suggest that risk-shifting tendencies differ significantly between low- and high-skilled fund managers and, beyond that, that high-skilled managers tend to increase their risk levels to a greater extent than low-skilled managers. We acknowledge that conclusions like these have to be interpreted with caution due to unobservable covariates that might influence the results. To mitigate the effect of omitted variables and provide further empirical evidence for our conclusions, we formulate the following regression model:

$$\begin{aligned} \Delta \sigma _{i,t} = \beta _1 \text {Rank}_{i,t} \times D^{\text {H}}_{i,t} + \beta _2 \text {Rank}_{i,t} \times D^{\text {L}}_{i,t} + \beta _3 \sigma ^{\text {First Half}}_{i,t} + \epsilon _{i,t} \end{aligned}$$
(8)

where the dependent variable, \(\Delta \sigma _{i,t}=\sigma ^{\text {Second Half}}_{i,t}-\sigma ^{\text {First Half}}_{i,t}\), is the change in the standard deviation of fund i’s returns from the first to the second half of year t. \(\hbox {Rank}_{i,t}\) denotes the rank of the fund manager with respect to all other managers, scaled to the interval [0, 1] (1 being best). High and low manager skill are denoted by the dummy variables \(D^*_{i,t}\) with \(^* \in \{\text {H,L}\}\). In a further specification, we replace \(\hbox {Rank}_{i,t}\) with dummy variables indicating that a fund manager ranked in the top 20% or the bottom 20% of all active managers, analogous to the main analysis. For all specifications, we include time and fund-company fixed effects. The latter control, for example, for all time-invariant characteristics attributable to a manager’s company that may influence the results.
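To illustrate how the variables of Eq. (8) fit together, the following Python sketch builds the dependent variable and the regressors for single observations. It does not estimate the model, and the fixed effects are left out. The function names are hypothetical, and it is an assumption of this sketch that the rank is derived from first-half performance.

```python
def scaled_ranks(performance):
    """Scale ranks to [0, 1], with 1 for the best performer.

    Requires at least two managers; ties are not treated specially.
    """
    ordered = sorted(performance, key=performance.get)
    n = len(ordered)
    return {name: pos / (n - 1) for pos, name in enumerate(ordered)}


def regression_row(rank, skill, sigma_first, sigma_second):
    """Dependent variable and regressors of Eq. (8) for one observation."""
    d_high = 1.0 if skill == "high" else 0.0
    d_low = 1.0 if skill == "low" else 0.0
    return {
        "delta_sigma": sigma_second - sigma_first,
        "rank_x_high": rank * d_high,
        "rank_x_low": rank * d_low,
        "sigma_first": sigma_first,
    }


# Hypothetical first-half performance and skill labels of three managers.
first_half_performance = {"a": 0.5, "b": -0.2, "c": 1.1}
skill_labels = {"a": "medium", "b": "low", "c": "high"}
ranks = scaled_ranks(first_half_performance)

row_c = regression_row(ranks["c"], skill_labels["c"], 0.12, 0.15)
row_a = regression_row(ranks["a"], skill_labels["a"], 0.10, 0.10)
```

For a medium-skilled manager both interaction terms are zero, so medium skill serves as the reference group in this construction.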

We present the results of four specifications in Table 3, two each using either the information ratio or active returns to estimate the managers’ skill levels via TrueSkill. All specifications indicate that high-skilled fund managers significantly increase their risk after performing well in the first half of the year. In contrast, the coefficients associated with risk-shifting of less-skilled fund managers carry the opposite signs. Equality tests reject the null hypotheses \(D^{\text {Win}} \times D^{\text {H}} = D^{\text {Win}} \times D^{\text {L}}\) and \(D^{\text {Loss}} \times D^{\text {H}} = D^{\text {Loss}} \times D^{\text {L}}\). The explained variation in risk-shifting amounts to \(\approx 75\%\), which is a common value in fund tournament studies. Overall, the results support our conclusions drawn from the conditional transition matrix approach and provide further insights into the channels that drive them.

Table 3 Regression models of different skill-levels on risk-shifting

Comparison to ELO

Last, we compare TrueSkill with another popular skill measure, the ELO rating, best known from the world of chess. The ELO rating system is used in competitive chess as well as in various unofficial rankings, e.g., in online gaming or football tournaments. Its calculations are much simpler, and it is therefore not able to accommodate teams playing each other. Figure 4 shows the skill development of three randomly chosen managers over the full sample period, measured by TrueSkill and ELO. Both ratings are based on monthly matches between all managers participating in a given year’s tournament. The skill levels are normalized to make them comparable, since the absolute level differs between the two systems. Because a manager is included in a year’s tournament if and only if there is more than one year of track record, the managers seem to start at different levels; in fact, they all started with the same initial setup. The ELO ratings vary rapidly and at high frequency, while the TrueSkill ratings adjust much more slowly and only react to unexpected outcomes.
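To make the ELO mechanics concrete, the following minimal sketch applies the standard ELO update to one month of pairwise matches between all managers. The K-factor of 32 and the scale of 400 are the conventional chess values and are assumptions here, as is deciding each match by comparing a monthly performance figure; the function names and sample values are hypothetical.

```python
from itertools import combinations


def expected_score(rating_a, rating_b, scale=400.0):
    """Standard ELO expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_month(ratings, performance, k=32.0):
    """Update all ratings after one month of pairwise matches.

    Every pair of managers plays one match, decided by the monthly
    performance figure. Expected scores use the ratings from the start
    of the month, so the order of the pairs does not matter.
    """
    updated = dict(ratings)
    for a, b in combinations(ratings, 2):
        if performance[a] > performance[b]:
            score_a = 1.0
        elif performance[a] < performance[b]:
            score_a = 0.0
        else:
            score_a = 0.5  # tie
        exp_a = expected_score(ratings[a], ratings[b])
        updated[a] += k * (score_a - exp_a)
        updated[b] += k * ((1.0 - score_a) - (1.0 - exp_a))
    return updated


# Hypothetical example: three managers starting from the same rating.
start = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
after = elo_month(start, {"A": 0.03, "B": 0.01, "C": -0.02})
```

Because every match moves both ratings by a fixed multiple of the surprise, a single month with many pairwise matches can shift a rating substantially, which is consistent with the rapid variation of the ELO ratings described above.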

Fig. 4

Different skill development of three random fund managers. Notes: this figure shows the temporal development of skill ratings based on TrueSkill and ELO for three randomly chosen fund managers from our sample. ELO is a method for calculating relative skill levels, commonly used in chess. The bottom panel compares the rolling standard deviations of the two methods, highlighting the stable skill belief estimated via TrueSkill

The third panel of Fig. 4 underlines the differences in volatility by showing the rolling standard deviation over 12 months of the normalized ELO and TrueSkill ratings, respectively. The average TrueSkill standard deviation is 0.106 and therefore much lower than that of ELO, which is 1.907. A good skill measure should exhibit low volatility in order to establish a stable belief about the skill level of an individual manager in the long term. The stability of TrueSkill over time shows its potential to classify managers into skill levels and to derive skill-based behavior from this classification.
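The volatility comparison can be sketched as follows. The snippet computes a 12-month rolling sample standard deviation of a rating series and averages it; it assumes the series has already been normalized, since the normalization step is not reproduced here. The function names and the two synthetic series are hypothetical and only illustrate the calculation.

```python
from statistics import mean, stdev


def rolling_std(series, window=12):
    """Sample standard deviation over each full window of `window` months."""
    return [
        stdev(series[end - window:end])
        for end in range(window, len(series) + 1)
    ]


def average_rolling_std(series, window=12):
    """Average of the rolling standard deviations of one rating series."""
    return mean(rolling_std(series, window))


# Hypothetical normalized ratings over 24 months:
# one slowly drifting series and one that flips sign every month.
smooth = [0.01 * t for t in range(24)]
jumpy = [(-1.0) ** t for t in range(24)]

smooth_vol = average_rolling_std(smooth)
jumpy_vol = average_rolling_std(jumpy)
```

A slowly adjusting series yields a small average rolling standard deviation, whereas a series that jumps every month yields a large one, which is the pattern the comparison between TrueSkill and ELO relies on.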

Summarizing the robustness checks, our results on the impact of skill are in line with theories in behavioral finance and psychology that describe the overconfidence of outperforming managers in their investment decisions. Taylor and Brown (1988) find evidence that people hold unrealistically positive views of themselves, which leads to the described self-confidence, and not only after they have been among the winners in several competitions. De Bondt and Thaler (1995) detect a positive correlation between high confidence and above-average trading frequencies.

Conclusion

Our results highlight the self-confident behavior of skilled managers, who hold or increase their portfolio risk in almost every situation compared to those with less skill. By applying the TrueSkill algorithm to represent investors’ beliefs about the individual skill level of fund managers, we present a way to model the positive correlation between prior performance and new investment decisions.

Good performance in recent years seems to lead to an overconfident investment style among managers, who shift their portfolio risk toward the upper tercile of the peer group in the second half of the year. Only a few managers classified as less skilled increase their risk in a losing situation. We demonstrate the robustness of our results with respect to the choice of the performance measure used to rank the managers each month, as well as the suitability of TrueSkill as a representation of investors’ beliefs about a manager’s skill.