1 Introduction

Commitment to, implementation of, and compliance with obligations imposed by International Organizations (IOs) is a fundamental vein of inquiry in the study of international institutions, as implementation of conditions may have material impacts on state socio-economic outcomes (Moll & Smets, 2020; Stubbs et al., 2020). One IO which receives an inordinate amount of scrutiny is the International Monetary Fund (IMF). A litany of papers has examined the extent to which countries adhere to the “conditionality” of IMF country assistance programs, i.e., the policies and the budgetary, employment, and fiscal targets prescribed by loan agreements (Allegret & Dulbecco, 2007; Dreher, 2009; Steinwand & Stone, 2008; Vreeland, 2006). Explanations for why countries successfully implement and comply with IMF conditions include legislative structure or political stability (Ivanova et al., 2003), government ideological cohesion (Joyce, 2006), democracy (Dreher, 2006), international audience costs (Fang & Owen, 2011), or the structure of the conditions themselves (Reinsberg et al., 2021).

We add to this literature by examining the extent to which a country’s ethnic homogeneity contributes to its ability to successfully implement and comply with IMF conditionality. We argue that ethnically homogenous societies will be more likely to internalize collective responsibility for the macroeconomic conditions that precipitate an IMF program, more likely to undertake the collective obligation to burden-share the costs of IMF conditionality, and more likely to realize a collective benefit of successful reforms. Based on these logics, we then hypothesize that more fractionalized communities will have a more difficult time implementing and complying with IMF conditions. We further argue that these dynamics will be even more prevalent in democracies where ethnic divisions may become more apparent in the formal decision-making process.

Using data from the IMF’s Monitoring of Fund Arrangements (MONA) database, and artificial borders data developed by Alesina et al. (2011) as instrumental variables for ethnic fractionalization, we find support for our hypotheses. These are our primary results as an instrumental variable strategy addresses the potential endogeneity that stems from both omitted variable bias and the fact that countries may receive conditions that are more or less difficult to implement depending on their level of ethnic fractionalization. Substantively, we find that a standard deviation increase in ethnic fractionalization decreases the share of IMF conditions implemented by up to 1/3rd of a standard deviation of total IMF conditions implemented. However, when considering regime type, we find that regime type does not greatly change the impact of ethnic diversity on the implementation of IMF conditions. These results hold both for all conditions and for “hard” IMF conditions, i.e., those upon which funding is notionally contingent, and are robust to alternate conceptualizations of compliance, different measures of ethnic fractionalization, country fixed effects, and different instruments.

In the sections below, we first briefly review the literature on IMF conditionality implementation and compliance before developing our argument about why ethnically diverse countries will be less successful to those ends. We then describe our data and identification approach before presenting our findings. We conclude with thoughts on the broader implications for the IMF compliance literature, and on the impact of ethnic fractionalization on compliance with international institutions more generally.

2 Ethnic diversity and IMF conditionality implementation and compliance

The literature on the implementation of, and compliance with, IMF conditions centers around domestic (IMF program country) political and institutional determinants and international-level determinants. International-level components include the ability of the IMF to directly sanction non-compliance via the withholding of funds or the imposition of audience or reputational costs (Fang & Owen, 2011; Rickard & Caraway, 2019), which may affect program countries’ ability to access international capital markets or attract or retain foreign direct investment (FDI) (Biglaiser & DeRouen, 2010; Edwards, 2005; Vadlamannati, 2020). However, empirical work has found that the IMF only selectively sanctions non-compliance or restricts future access to funds (Ivanova et al., 2003), perhaps due to the often geo-strategic nature of considerations inherent in IMF lending decisions (Fang & Owen, 2011).

Domestically, scholars have focused on both the politics and institutions of program countries to understand the determinants of compliance. Popular explanations have focused on the strength and cohesion of government and the ability to curb special interests as vital to successfully implementing conditions (Ivanova et al., 2003; Joyce, 2006). Beazer and Woo (2016) find that implementation hinges on the ideology of the government in power. Perhaps somewhat counter-intuitively, they find that IMF programs are more likely to be implemented by left-wing governments as they face less resistance in implementing market reforms from their right-wing opposition. This finding squares with those of Nsouli et al. (2005) who find that political and social opposition are significant reasons for non-compliance. Implementation capacity also matters, and the odds of compliance are higher when the implementation costs are low (Vadlamannati et al., 2018).

We build on the work focusing on domestic factors by considering how the degree of societal homogeneity can help build political support that allows governments to implement the often difficult and costly conditionality associated with IMF programs. Our argument incorporates a sociotropic logic that has been applied in other settings, including the US (Kinder & Kiewiet, 1981), the former Soviet Union (Duch, 1993), and Mexico (Kaufman & Zuckermann, 1998), when trying to understand how governments are able to undertake costly reforms. The arguments in these works hold that an individual’s opinions about the necessity or desirability of economic reforms are shaped by if the reforms will be good for the society as a whole.Footnote 1 More directly, Edwards (2009) finds that sociotropic considerations influence support for international economic organizations (IEOs), including the IMF (and to a greater degree than individual considerations). Specifically, Edwards (2009) finds this support is pro-cyclical, such that sociotropic support for IEOs is strongest when economic conditions are favorable and weakest when they are poor. Importantly, this pro-cyclicality suggests that it is prospective sociotropic evaluations (of IEO programs) that may matter for political support for those programs.

We argue that governments in countries with a high degree of ethnic homogeneity will be better able to build political support based on a sociotropic rationale for costly reforms compared to a society that has higher ethnic fractionalization. We assume that this support will stem from three collective recognitions. First, in ethnically homogenous societies, it is more difficult to scapegoat an “other” for the profligate debt or spending that may have led the country to seek IMF assistance. As discussed by Glynos and Voutyras (2016, p. 211), this blaming of an “other” is a form of Nietzschean ressentiment wherein an individual “renders specific groups other than ourselves responsible for loss and its consequences.” Empirically, a raft of studies has found evidence of blame-shifting of economic problems onto “others”, where the “other” is based on ethnic outgroup (Bukowski et al., 2017; Butz & Yogeeswaran, 2011), political opposition group (Marsh & Tilley, 2010; Traber et al.), or class cleavage (De Wilde et al., 2019). However, homogenous societies have fewer of both ethnic and political “others” as they tend to have less political party fragmentation (Moser et al., 2011). Homogenous groups are also more likely to have higher levels of income equality, diminishing the opportunity for “othering” based on class cleavages (Sturm & De Haan, 2015). Accordingly, there are fewer salient targets to which blame for poor macroeconomic management can be assigned. As a result, individuals may be more likely to accept collective responsibility for the macroeconomic state of the country. If the benefits from the earlier debt profligacy were widespread, then individuals may be even more willing to ascribe societal ownership of the debt and understand that the burden of reforms from IMF conditionality is theirs to bear.Footnote 2

This collective sense of ownership of a country’s macroeconomic position may then lend political support to the collective obligation to meet the conditions required by IMF country programs. Indeed, official narratives of austerity involve politicians calling upon citizens to “stick together” in addressing the demands of budgetary adjustment (Titley, 2013).Footnote 3 It is easier to mobilize collective acceptance of, and resilience to, economic adjustments and reforms when citizens are able to internalize “the mantra that ‘we’re all in this together’ ” (Dagdeviren et al., 2016, p. 15). This mantra is likely to be an easier sell in ethnically homogenous societies. Accordingly, individuals in these states may be more amenable to the collective burden sharing that IMF conditionality imposes. Individuals may be less likely to believe that certain groups will be able to “escape” the pain of IMF conditionality simply because those other groups do not exist.

On the flip side, individuals in homogenous societies are more likely to feel there are collective benefits from successfully implementing IMF economic reforms. As these benefits are realized after the implementation of reforms, Edwards’s (2009) finding of pro-cyclicality in sociotropic support for IEOs means that this support must be based on a prospective view of future economic performance during reform implementation (Krause, 1997). We argue that adaptation of prospective sociotropic views is more likely in ethnically homogenous societies because, to the extent that successful implementation of IMF conditions spurs growth, the gains are more likely to be distributed in ethnically homogenous societies among co-ethnolinguists. In other words, your “group” will be likely to benefit rather than some outgroup, mainly because, again, there are simply fewer outgroups. While recent evidence finds that IMF programs increase income inequality (Forster et al., 2019; Lang, 2021), a near-canonical literature suggests that ethnic heterogeneity limits income redistribution as individuals from different ethnicities will view each other as competitors for state resources (Alesina & Glaeser, 2004; Dincer & Lambert, 2012; Sturm & De Haan, 2015; Morgan & Kelly, 2017). Individuals may be more willing to endure the “cost” of the IMF conditionality because there is less chance of redistribution of the “rewards” from the sacrifices to an “other”.

Based on these three “collective” logics, we believe there is strong reason to think that individuals in ethnically homogenous societies will be more likely to lend the political support necessary to see through the implementation of, and compliance with, IMF program conditions. Contrastingly, countries with high degrees of ethnic fractionalization will find it more difficult to build and sustain a political coalition for implementation. Accordingly, our theory rests on how ethnic fractionalization might impact the willingness of government to undertake reforms. Indeed, in Nigeria, a country with a high degree of ethnic fractionalization, an IMF “concluding statement” on the 2000 program noted how the missing of key targets may well have been a result of, euphemistically, “Nigerian realities.”Footnote 4 This willingness logic differs from that in recent work (Reinsberg et al., 2021) that suggests that the complicated and detailed nature of conditions might lead to a lack of implementation due to insufficient government capacity.Footnote 5 Accordingly, we hypothesize that:

  • H1: Increased ethnic heterogeneity weakens the implementation of IMF program conditions.

While we think our hypothesis will hold for all IMF conditions, we would expect to see this dynamic particularly with the more binding IMF conditions, i.e., those most in need of political support in that they need to be implemented to secure IMF credit (Copelovitch, 2010). These conditions often include fiscal adjustments that necessitate public sector reforms that can be particularly challenging for households (Kentikelenis et al., 2016), or those that call for specific legislative action from the partner country, and thus are most in need of political support (Konstantinidis & Reinsberg, 2020). As these conditions are both the most difficult to implement and tend to impose the largest social costs on society, they will be most likely to be influenced by the collective logics discussed above.

In contrast, those conditions aimed at structural reforms, which are often difficult to quantify and tough to monitor (Goldstein, 2000), are referred to as structural benchmarks. These structural benchmarks are not always subject to quarterly evaluations (like the performance criteria) and therefore enforcement is lax (Dreher et al., 2015). While an unmet performance criteria condition requires a formal waiver from the Executive Board of the Fund, a structural benchmark condition does not need a formal waiver if unmet. In fact, as Goldstein suggests, “Failure to meet structural benchmarks conveys a negative signal but does not automatically render a country ineligible to draw, instead, a decision about eligibility would be judgmental” (Goldstein, 2000, p 32). Given the non-punitive nature of the structural benchmarks, we categorize them as soft conditions. Without effective sanctions, they are less likely to be complied with regardless of the level of ethnic fractionalization. Our logics above are based on collectively bearing the “cost” of complying (or not) with IMF programs. These “costs” are most present in the binding, pecuniary, enforcement of the hard conditions. Collective political will is less important when there are smaller costs to non-compliance. Accordingly, we hypothesize that:

  • H2: Increased ethnic heterogeneity is more likely to weaken the implementation of hard IMF program conditions compared to soft conditions.

We add one additional nuance by suggesting that ethnic heterogeneity is more likely to have an adverse effect on the successful implementation of IMF conditions in countries where ethnic heterogeneity can translate to political fragmentation and diversity. Our theoretical logics apply when the bases of political support are popular. In countries where leaders derive political support from a narrower basis, a “selectorate”, then satisfying that group may be all that is necessary for successfully implementing IMF conditions (Bueno de Mesquita et al., 2003; Nooruddin & Simmons, 2006). If leaders are able to provide these groups with sufficient private goods (irrespective of IMF conditions, or perhaps because of IMF financial support) then a country may be able to implement IMF conditions in order to keep the assistance flowing even in the absence of broader popular support. Accordingly, the logics above may be conditional on political regime type. As such, we hypothesize that:

  • H3: Increased ethnic heterogeneity will weaken the IMF program implementation more in comparatively more democratic regimes.

3 Data and methods

3.1 Model specifications

We utilize the data of 111 countries (see the online appendix, available on the Review of International Organizations’ webpage, for a list of countries) which were in an IMF program during the 1992–2014 period. We set up our data in a manner similar to Dreher et al. (2015) in which the unit of analysis is country-program (i.e., the entire IMF program period for a country), rather than country-year. We estimate:

$${IMF\_comp}_{it}={\varphi}_i+{\beta}_1{Frac}_{it}+{\beta}_2{Z}_{it}+{\omega}_{it}$$
(1)

Here, $IMF\_comp_{it}$ is the share of conditions which are successfully implemented by country $i$ during the entire IMF program period $t$ (i.e., the entire program, rather than a year). To derive the data on the share of implemented IMF conditions, we utilize the information on conditions made available by the IMF’s Monitoring of Fund Arrangements (MONA hereafter) database. These data were only made publicly available by the Fund in 2001. They list the number of conditions a country is under in various years since 1992, and the dataset is considered the comprehensive source on IMF conditions. The database also provides information on how many of the imposed conditions were successfully implemented by the recipient country during the period in which that country was under an IMF program.Footnote 6 We focus on two aspects, namely, (i) the total number of conditions country $i$ has implemented during an IMF program period, and (ii) the type of conditions.

As discussed above, we consider “hard” and “soft” IMF conditions. We use the information from the MONA database to compile the share of conditions implemented by country i under each of these categories during its tenure under the IMF program. We carefully compile these conditions from each loan to make sure that the conditions are not double counted since the same conditions appear in multiple category heads. It is also noteworthy that the MONA dataset is not entirely without limitations. For instance, Kentikelenis et al. (2016) and Dreher et al. (2009) suggest that the dataset includes only those conditions which have been reviewed by the Fund’s Executive Board. This means that those programs which have been cancelled or interrupted are not covered in the data. The implication of this would be overstating the compliance rate of conditions. If the probability of cancelled or interrupted programs is non-randomly related to ethnic fractionalization, this could introduce bias into our estimates. Likewise, Mercer-Blackman and Unigovskaya (2004) argue that the MONA data does not do a good job in capturing all the structural benchmark conditions. To address these problems, we also use an alternative dataset on IMF conditions compiled by Kentikelenis et al. (2016) which aims to provide detailed and disaggregated information on all conditions and their implementation sourced from the documents of the Fund’s Executive Board. Results using these data are reported in the robustness test analysis.
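The compilation step described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout (the keys `program`, `condition_id`, `kind`, and `implemented` are illustrative and are not MONA field names) and shows one way to drop conditions that are listed under more than one category head before computing implementation shares per country-program.

```python
def implementation_shares(records):
    """Share of conditions implemented per program: overall, hard, and soft.

    `records` is a list of dicts with hypothetical keys:
    'program', 'condition_id', 'kind' ('hard' or 'soft'), 'implemented' (bool).
    """
    # Keep one copy of each condition per program, since the same condition
    # may appear under several category heads.
    unique = {}
    for r in records:
        unique.setdefault((r["program"], r["condition_id"]), r)

    counts = {}  # program -> {'all': [met, total], 'hard': [...], 'soft': [...]}
    for (program, _), r in unique.items():
        tally = counts.setdefault(
            program, {"all": [0, 0], "hard": [0, 0], "soft": [0, 0]}
        )
        for key in ("all", r["kind"]):
            tally[key][0] += int(r["implemented"])
            tally[key][1] += 1

    return {
        program: {k: (met / n if n else None) for k, (met, n) in tally.items()}
        for program, tally in counts.items()
    }


# Illustrative records only (not data from the paper); c1 is listed twice.
records = [
    {"program": "A-1998", "condition_id": "c1", "kind": "hard", "implemented": True},
    {"program": "A-1998", "condition_id": "c1", "kind": "hard", "implemented": True},
    {"program": "A-1998", "condition_id": "c2", "kind": "hard", "implemented": False},
    {"program": "A-1998", "condition_id": "c3", "kind": "soft", "implemented": True},
    {"program": "A-1998", "condition_id": "c4", "kind": "soft", "implemented": False},
    {"program": "A-1998", "condition_id": "c5", "kind": "soft", "implemented": False},
]
shares = implementation_shares(records)
```

Without the de-duplication step, the duplicated hard condition would be counted twice and the hard-condition share for this toy program would be overstated.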

The average length of an IMF program in our sample period is 30.4 months, while the maximum and minimum values are 80 and 7 months, respectively. Comoros spent 52 months on average in an IMF program during our study period which is the highest in our sample and Tunisia has the least with 7 months. Figure 1 captures the mean of all conditions and the percentage share of hard and soft conditions implemented by countries during the 1992–2014 period. As seen there, the implementation of hard conditions is always higher than that of soft conditions. In fact, the post-global financial crisis years witnessed a huge gap between the implementation of hard and soft conditions. This is also reflected in the descriptive statistics of both variables in which mean compliance with hard conditions is roughly 62%, compared to 36% for soft conditions.

Fig. 1 IMF conditions and compliance, 1992–2013

$Frac_{it}$ captures our main explanatory variable, cultural diversity, which reflects fractionalization in societies along ethnic, linguistic, and cultural lines. Our main measure of fractionalization is developed by Alesina et al. (2003). Their objective was to distinguish clearly between ethnic and linguistic heterogeneity. Ethnic and linguistic differences, according to Alesina et al. (2003), were previously lumped together as part of an ethnolinguistic fractionalization measure. Alesina et al. (2003) base their definition of ethnicity on both racial and linguistic characteristics. For instance, they argue that ethnicity in some of the European and Sub-Saharan African countries is largely based on languages, while the definition of ethnicity for Latin American countries involves a combination of racial and linguistic characteristics. To construct the measure, they collected disaggregated data on 650 ethnic groups for 190 countries from multiple, cross-referenced sources: Encyclopedia Britannica (2001), which was the source of the data for 124 of the 190 countries, the CIA (2000) for 25 countries, Levinson (1998) for 23 cases, and Minority Rights Group International (1997) for 13 cases. While collecting the data, if two or more sources for the index of ethnic fractionalization were identical to the third decimal point, then Alesina et al. (2003) used these sources. If the sources diverged such that the index of fractionalization differed at the second decimal point, they used the source where the reported ethnic groups constituted the greatest share of the total population.

In the robustness tests, we also use a measure of ethnolinguistic fragmentation that was constructed by Fearon and Laitin (2003) (FL measure). Their ethnic fractionalization index is based on data sourced from a Soviet ethnographic atlas which was constructed by a team of 70 researchers in 1960 in the then Soviet Union and printed in the 1964 Atlas Narodov Mira (Atlas of Peoples of the World). This measure gives the probability that two randomly drawn individuals in a country are from different ethnolinguistic groups. Thus, the ethnic fractionalization index will increase with the number of ethnolinguistic groups and will increase with more equally sized groups. It is noteworthy that Fearon and Laitin (2003) filled in values for missing countries in the Atlas of Peoples of the World using various other sources such as CIA Factbook, Encyclopedia Britannica, and the Library of Congress Country Studies to derive the required information on ethnic groups in these missing countries.

The formula used for constructing both Alesina et al.’s (2003) and Fearon and Laitin’s (2003) indices is:

$${Frac}_j=1-{\sum}_{i=1}^N{S}_{ij}^2$$

where $S_{ij}$ is the share of group $i$ ($i = 1, \ldots, N$) in country $j$. Note that a higher value indicates a more ethnically fractionalized country, and that both measures are time-invariant in our data. When we look at the descriptive statistics of both measures, we find the correlation between them to be very high (0.87). While the sample mean of the FL measure is about 0.52, the mean of Alesina et al.’s measure is about 0.51 for our sample of 111 countries. In the case of South-East Asian countries, Alesina et al.’s measure shows more fractionalization than the FL measure, while the two measures are closer to each other for countries from other geographic regions. Given the way Alesina et al.’s measure is constructed, this is not surprising. For purposes of visualization, we map the measure for the 111 countries in our study onto a world map in the online appendix.
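As a brief illustration of the formula above, the following sketch computes the index from a list of group shares. The function name and the example shares are hypothetical and chosen only to show how the index behaves; they are not values from either dataset.

```python
def fractionalization(shares):
    """Return 1 - sum of squared group shares for one country.

    `shares` are population shares of each group and must sum to 1.
    """
    if any(s < 0 for s in shares) or abs(sum(shares) - 1.0) > 1e-6:
        raise ValueError("shares must be non-negative and sum to 1")
    return 1.0 - sum(s * s for s in shares)


# Hypothetical countries (illustrative shares only).
homogeneous = fractionalization([0.95, 0.05])  # one dominant group
two_equal = fractionalization([0.5, 0.5])      # two equally sized groups
four_equal = fractionalization([0.25] * 4)     # four equally sized groups
```

The three cases give roughly 0.095, 0.5, and 0.75, so the index rises both with the number of groups and with how evenly sized they are, which matches its interpretation as the probability that two randomly drawn individuals belong to different groups.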

The vector of control variables ($Z_{it}$) includes other potential determinants of IMF program implementation, which we obtain from the extant literature on the subject (Arpac et al., 2008; Gunaydin, 2018; Ivanova et al., 2003; Joyce, 2006). The list of potential control variables is long, but we are aware of the trap of “garbage-can models” or “kitchen-sink models” in which numerous variables are lumped onto the right-hand side of the equation, making interpretation of results difficult (Achen, 2005; Schrodt, 2014). We adopt the conservative strategy of accounting only for key factors that affect IMF program implementation, adding several more in the robustness checks. Accordingly, we include two key economic controls. The first is the per capita GDP (log) of a recipient country during the program period, measured in constant 2005 US$, as a proxy for economic performance (Gwaindepi, 2021) and sourced from the World Development Indicators (2018). Treating income as a crude measure of capacity, we expect countries with a higher level of income to be more likely to implement the IMF conditions (Arpac et al., 2008; Gunaydin, 2018). Likewise, we include a measure of economic crisis (Rewilak, 2018), which is a dummy variable indicating whether a country has experienced one or more of the following crises: systemic banking, currency, and/or debt (Laeven & Valencia, 2008). Once again, the expectation is that worsening economic conditions increase the need for loans from the IMF and therefore also increase the chances of implementing the IMF conditions (Pop-Eleches, 2008). In fact, Sharma (2012) finds that most countries are likely to undertake key economic policy reforms when they face an economic or financial crisis.

We also include an important political economy variable that influences program implementation, namely regime type. In addition to our hypothesized conditioning role of regime type in the impact of ethnic fractionalization on condition implementation, there are more general avenues by which regime type may affect implementation. Theoretically, this relationship may run both ways. On the one hand, it is commonly believed that democracies are more likely to maximize national welfare, as opposed to autocracies, whose leaders enrich themselves and their supporters (Joyce, 2006). Therefore, in an unconditional sense, it is likely that higher program implementation is associated with democracy.Footnote 7 Moreover, governments in democracies are often under pressure to show results to the electorate in order to deter opposition parties. However, the contrarian view is that various interest groups will have greater voice and influence in a democratic setup, strengthening opposition and making implementation more difficult for a government (Arpac et al., 2008; Boughton & Mourmouras, 2004; Drazen, 2002; Mayer & Mourmouras, 2008). Democracies may also impose more executive constraints, making it difficult for the government to enact conditions which are perceived to be unpopular policies (Gunaydin, 2018; Ivanova et al., 2003). To measure the nature of the political regime in power, we include the Polity IV (polity2) democracy index (Jaggers & Gurr, 1995). We subtract the autocracy score from the democracy score, which is standard practice. Thus, the resulting score ranges from +10 (full democracy) to −10 (full autocracy).

We also include a measure of political instability as Mecagni (1999) attributes interruptions in implementing IMF programs to civil and political instability. We use a count of riot incidents during the program period for country i sourced from the Cross-National Time Series Data Archive developed by Banks and Wilson (2018). Political ideology of the government is identified in the literature with successful implementation of conditions (Ivanova et al., 2003). We therefore include a measure for political ideology of the government sourced from the Database of Political Institutions developed by Cruz et al. (2018). This variable is a dummy coded with a value of 1 for left-leaning ideology. Furthermore, we include a dummy measure of legislative elections sourced from Cruz et al. (2018) as previous studies show that implementation of IMF program is sensitive to elections (Arpac et al., 2008; Dreher, 2003; Rickard & Caraway, 2014).Footnote 8 Finally, following Dreher et al. (2015) we include a count of the number of total (hard and soft) conditions imposed on a country in an IMF program as countries with fewer conditions might be less likely to face implementation problems.

It is noteworthy that our unit of analysis (t) is the entire program period for each country in our sample, rather than a year. Therefore, to measure control variables we use the average values of the variables described above for the countries during their respective program periods. The descriptive statistics are provided in and the details on definitions and data sources are provided in the online appendix. We estimate OLS specifications including Huber-White corrected robust standard errors, a method which is robust to heteroskedasticity.
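To make the estimation step concrete, the sketch below computes an OLS slope together with a Huber-White (HC0) standard error for the simplest case of one regressor plus a constant. It is a simplified illustration under that assumption, not the full specification in Eq. (1), which includes the control vector; the function name and the toy data are hypothetical.

```python
import math


def ols_with_hc0(x, y):
    """OLS of y on a constant and x, with a Huber-White (HC0) slope SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich variance: each squared residual is weighted by its own
    # squared deviation in x instead of assuming one common error variance.
    robust_var = sum(
        (xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, residuals)
    ) / sxx ** 2
    return intercept, slope, math.sqrt(robust_var)


# Toy data (illustrative only): fractionalization and implementation share.
frac = [0.1, 0.3, 0.5, 0.7, 0.9]
share = [0.8, 0.7, 0.5, 0.45, 0.2]
intercept, slope, robust_se = ols_with_hc0(frac, share)
```

In this toy example the slope is negative, and the robust standard error remains valid even if the spread of the residuals differs across levels of fractionalization, which is the property the Huber-White correction is used for.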

3.2 Endogeneity concerns

While we do not see a reverse causation problem, our ethnic diversity measures could be affected by endogeneity if the IMF factors in the ethnic fragmentation of a society when imposing conditions. One could then argue that the number, or type, of conditions imposed on a country is in turn determined by the fragmented nature of the society, and that such ethno-political configurations might strengthen the government’s bargaining position with the Fund (Bartilow, 1997; Ke, 2012). Ke (2012) suggests that such governments will receive relatively moderate conditions. This could make compliance easier to achieve for these governments, either because fewer conditions must be implemented or because of the nature of the conditions. Fractionalized societies might then end up with higher compliance rates, biasing the estimated effect of fractionalization upward. To investigate, we separate the countries in our sample into two categories: those for which the mean of the two ethnic diversity indices was below the median and those for which it was above the median. We do this to investigate whether countries with higher ethnic fractionalization receive fewer conditions, as argued by Ke (2012). If that were the case, one would expect the compliance rate to be higher among diverse countries. A simple back-of-the-envelope calculation suggests that the conditions imposed during our study period (in both the hard and soft categories) were evenly split between diverse and less diverse countries.Footnote 9 To further address this concern, we control for the number of conditions (all, hard, and soft, respectively) in all our models. However, ethnic diversity might be affected by other unobservable factors which could also explain successful implementation of IMF conditions, such as civil conflict (Midtgård et al., 2014; Hartzell et al., 2010; Abouharb & Cingranelli, 2007) or ethnic tensions (Vadlamannati et al., 2014). Failing to account for endogeneity might yield biased results. To address the problem, we employ instrumental variables and estimate a two-stage least squares instrumental variable (2SLS-IV henceforth) estimator.
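The logic of the two-stage estimator can be sketched for the simplest case of one endogenous regressor and one instrument, with no controls. The toy data below are constructed (not drawn from the paper) so that an unobserved factor `u` drives both the regressor and the outcome while the instrument `z` is unrelated to `u`; all names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """2SLS slope of y on x, instrumenting x with a single instrument z."""
    a0, a1 = simple_ols(z, x)            # first stage: x on z
    x_hat = [a0 + a1 * zi for zi in z]   # part of x explained by z only
    _, beta = simple_ols(x_hat, y)       # second stage: y on fitted x
    # Note: standard errors from a manual second stage are not valid;
    # this sketch only recovers the point estimate.
    return beta


# Toy data: true effect of x on y is -0.5; u confounds x and y.
z = [0, 1, 0, 1, 0, 1, 0, 1]
u = [1, 1, -1, -1, 1, 1, -1, -1]
x = [0.2 + 0.3 * zi + 0.1 * ui for zi, ui in zip(z, u)]
y = [0.9 - 0.5 * xi + 0.2 * ui for xi, ui in zip(x, u)]

ols_slope = simple_ols(x, y)[1]
iv_slope = two_stage_least_squares(z, x, y)
```

Here the naive OLS slope is positive because of the omitted factor, while the instrumented slope recovers the true value of −0.5. With two instruments and a control vector, as in our models, the same idea applies but requires matrix-based estimation.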

We use two different measures capturing artificial borders, as developed by Alesina et al. (2011), as our instruments. Alesina et al. (2011, p. 246) suggest that “artificial states are those in which political borders do not coincide with a division of nationalities desired by the people on the ground.” The first measure is partition, which measures the degree to which ethnic groups in one country were split into two separate countries by borders. This variable is coded on a 0–100 scale, where the value denotes the percentage share of those ethnic groups that are split between two or more adjacent countries. The second measure is fractal, which captures land borders that appear to be a straight line and are therefore more likely to be artificial. The authors use a box-count method developed by Peitgen et al. (1992) to calculate the fractal dimension. Accordingly, the fractal dimension is coded on a 1–2 scale in which a value close to 1 suggests the border to be a straight line. On the other hand, a fractal dimension close to 2 denotes a border resembling a squiggly line. Alesina et al. (2011) suggest that their fractal measure of borders for most countries appears to be closer to 1 than 2, but with some variation. Using both measures, Alesina et al. (2011) identify artificial borders as a historical feature that has shaped ethnic fractionalization in some geographic regions and countries across the world.
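The box-count idea behind the fractal measure can be illustrated with a generic sketch. It is not the procedure or code used by Alesina et al. (2011): it assumes a border is represented as points in the unit square, counts how many grid cells the points occupy at several resolutions, and takes the slope of log(cell count) on log(grid resolution) as the dimension estimate. The function name and grid sizes are hypothetical.

```python
import math


def box_count_dimension(points, grid_sizes=(4, 8, 16, 32)):
    """Estimate the box-count dimension of points lying in the unit square."""
    pairs = []
    for k in grid_sizes:
        # Map each point to the grid cell it falls in on a k-by-k grid.
        cells = {
            (min(int(px * k), k - 1), min(int(py * k), k - 1))
            for px, py in points
        }
        pairs.append((math.log(k), math.log(len(cells))))
    # Least-squares slope of log(occupied cells) on log(grid resolution).
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / sum(
        (a - mean_a) ** 2 for a, _ in pairs
    )


# A straight horizontal segment versus points filling the whole square.
straight_line = [(i / 1000, 0.5) for i in range(1000)]
filled_square = [(i / 64, j / 64) for i in range(64) for j in range(64)]

dim_line = box_count_dimension(straight_line)
dim_square = box_count_dimension(filled_square)
```

The straight segment yields a dimension of 1 and the space-filling set a dimension of 2, which are the two ends of the 1–2 scale described above; real borders fall in between.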

The validity of the instruments depends on two conditions. The first is instrument relevance: the selected instrument must be correlated with the explanatory variable in question, otherwise it has no power. In the case of linear estimations, Bound et al. (1995) suggest examining the joint F-statistic on the excluded instruments in the first-stage regression. As a rule of thumb, an instrument is considered relevant when the first-stage joint F-statistic is above 10 (Bound et al., 1995). However, the joint F-test has been criticized in the literature as being insufficient to measure the degree of instrument relevance (Stock et al., 2002). More powerful tests, namely the Kleibergen-Paap Wald F-statistic, offer more reliable statistical inferences in a weak instrument setting (Kleibergen & Paap, 2006). An F-statistic above the critical value (10% maximal test size) indicates rejection of the null of weak instruments. The results from the first-stage regressions are reported in Table 2. Our selected instruments have the expected sign on our measures of ethnic diversity and are significantly different from zero at the 1% level. Second, the selected instrument should not be correlated with the error term in the second-stage equation, i.e., \(E[{\omega}_{it}\mid {IV}_{it}]=0\), meaning that the instrument should have no direct effect on the outcome variable of interest, the share of IMF conditions implemented, and should affect that outcome only via the instrumented variable (Rahman et al., 2019). To the best of our knowledge, there is no theoretical proposition or empirical test directly linking artificial borders with compliance with IMF conditions. However, one concern is that artificial borders might explain compliance with IMF conditions by means other than ethnic fractionalization. That is, our instruments may be correlated with omitted variables in the model and thereby violate the exclusion restriction. 
For instance, studies have found that states with arbitrary boundaries experience economic failures and/or are beset by conflict, instability, or weak institutions (Alesina et al., 2011; Barbour, 1961; Englebert et al., 2002; Griffiths, 1986, 1996; Michalopoulos & Papaioannou, 2016). However, in our models we control for economic development using per capita GDP (log), for political instability, and for the Polity index, which serves as a proxy for institutions. In robustness tests we also control for a range of other factors, such as civil conflict, years since independence, and elections, which might be correlated with our instruments and hence explain our dependent variables. We also apply the Hansen J-test (Hansen, 1982) to test the validity of the overidentifying restrictions.
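The two-step logic described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the estimation reported in Table 2: the data are simulated, the variable names (`frac`, `z`, `u`) and all coefficient values are hypothetical, and the first-stage statistic is the classical homoskedastic F-test on the excluded instruments rather than the Kleibergen-Paap statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data (all values hypothetical): two instruments and one
# unobserved confounder that affects both the regressor and the outcome.
z = rng.normal(size=(n, 2))   # stand-ins for the partition and fractal instruments
u = rng.normal(size=n)        # unobserved confounder
frac = z @ np.array([0.8, 0.5]) + u + rng.normal(size=n)
true_beta = -0.5
y = true_beta * frac + 2.0 * u + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones((n, 1))
X = np.column_stack([const, frac])   # structural regressors
Z = np.column_stack([const, z])      # constant plus excluded instruments

# Naive OLS is biased because frac is correlated with the confounder u.
beta_ols = ols(X, y)[1]

# First stage: regress frac on the instruments and compute the F-statistic
# for the joint null that both excluded instruments have zero coefficients.
frac_hat = Z @ ols(Z, frac)
rss_unrestricted = np.sum((frac - frac_hat) ** 2)
rss_restricted = np.sum((frac - frac.mean()) ** 2)
q = z.shape[1]
first_stage_F = ((rss_restricted - rss_unrestricted) / q) / (
    rss_unrestricted / (n - Z.shape[1])
)

# Second stage: replace frac with its first-stage fitted values.
beta_2sls = ols(np.column_stack([const, frac_hat]), y)[1]

print(f"OLS: {beta_ols:.2f}, 2SLS: {beta_2sls:.2f}, first-stage F: {first_stage_F:.0f}")
```

In this simulated setting the 2SLS coefficient lies close to the assumed true value while OLS does not, and the first-stage F-statistic is far above the rule-of-thumb threshold of 10. The manual second stage reproduces the 2SLS point estimate only; its naive standard errors would be incorrect, so inference should rely on a dedicated IV routine.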

3.3 Interaction effects

To examine the third hypothesis, the conditional impact of regime type on the effect of ethnic fractionalization on condition implementation, we estimate interaction models in which we introduce interactions between both measures of ethnic diversity and regime type as:

$${IMF\_comp}_{it}={\varphi}_i+{\beta}_1{\left( Frac\times polity\right)}_{it}+{\beta}_2{polity}_{it}+{\beta}_3{Z}_{it}+{\lambda}_i+{\omega}_{it}, \tag{2}$$

where (Frac × polity)it captures the interaction between each measure of ethnic fractionalization and the Polity IV regime type index as described above. Note that we include country-specific fixed effects (𝜆i) in all interaction models specified in Eq. (2). Interacting a time-invariant variable with a measure that varies by year allows us to control for country fixed effects, while the level of fractionalization is absorbed by the fixed effects.Footnote 10 Furthermore, the models including the interactions in Eq. (2) also account for endogeneity by using interacted IV estimations. We use the instruments discussed above for the ethnic diversity measures to estimate our interaction effects.Footnote 11 Taken together, these results are the most rigorous, as they control both for country fixed effects, which account for unobserved country-specific factors that may explain the dependent variables, and for endogeneity via the interacted instrumental variables approach. These results are reported in the robustness tests (in the online appendix). All interaction models specified in Eq. (2) are estimated using the OLS estimator with Huber-White robust standard errors, and we generate marginal effects plots to assess the conditional effects.

4 Empirical results

Figure 2 provides a descriptive look at the bivariate relationship between cultural diversity and the share of total and hard IMF conditions implemented. As seen there, the bivariate relationship is negative in both instances, and there is also substantial variation across the range of both measures, suggesting identifying variation is coming from the entire sample. Countries that have higher levels of diversity implement fewer conditions. This relationship holds when we use two different measures of diversity namely, Alesina et al.’s measure in Fig. 2A and C and the FL measure in Fig. 2B and D. These bivariate statistics, however, may simply be spurious correlations. We therefore proceed to examine the statistical relationship in greater detail and precision with multivariate models.

Fig. 2 a Diversity & Share of all IMF conditions, b Diversity & Share of all IMF conditions, c Diversity & Share of Hard IMF conditions, d Diversity & Share of Hard IMF conditions

Tables 1, 2 and 3 present our main results. Table 1 presents results from our baseline estimations on the implementation of IMF conditions and the type of conditions, while Table 2 provides results from the IV estimations and Table 3 presents the conditional effects between ethnic diversity and political regime on the implementation of IMF conditions in which country-specific fixed effects are used. We begin our analysis with Table 1. Columns 1–2 present the results for all IMF conditions, while results related to hard and soft IMF conditions are presented in columns 3–6. As seen in column 1, we find a negative and significant effect of ethnic fractionalization on the implementation of all IMF conditions. Notice that the negative and significant effect remains robust to the inclusion of control variables in column 2. Substantively, the results suggest that a standard deviation increase in Alesina et al.’s measure of fractionalization decreases share of IMF conditions implemented by roughly 8%, which is about 36% of a standard deviation of the share of total IMF conditions implemented. These findings support the argument that ethnically diverse societies are less likely to implement IMF conditions. Our results are similar to those obtained by Dollar and Svensson (2000) in their study on World Bank programs. Next, columns 3–6 in Table 1 present the results on the impact of diversity based on the type of IMF condition, with hard IMF conditions in columns 3–4 and soft conditions in columns 5–6. As seen there, the diversity measure is associated with a negative impact on the share of hard IMF conditions implemented, which is significantly different from zero at the 1% level. For instance, a standard deviation increase in our diversity measure reduces the share of implementation of hard conditions by 14%, which is about 45% of a standard deviation of the share of hard IMF conditions implemented. 
We find no empirical support for a negative effect of ethnic diversity on soft IMF conditions in columns 5–6. These results suggest that governments in ethnically fragmented societies find it difficult to implement hard IMF conditions in particular.

Table 1 Impact of Ethnic diversity on implementation of IMF conditions
Table 2 Impact of Ethnic diversity on implementation of IMF conditions – 2SLS-IV
Table 3 Impact of Ethnic diversity, regime type on implementation of IMF conditions

It is noteworthy that our results remain robust to the inclusion of several relevant control variables. Political instability, the Polity index, economic crises, political ideology, and the number of conditions imposed are significantly different from zero at conventional levels of statistical significance in Table 1. These results are consistent with those reported by previous studies such as Dollar and Svensson (2000), Tommasi and Velasco (1996), and Laban and Sturzenegger (1994).

In Table 2, we present results from the 2SLS-IV estimations. Column 1 reports the results for all IMF conditions, and columns 2 and 3 repeat the exercise with hard and soft conditions as dependent variables, respectively. Three observations can be drawn from these results. First, the IV estimation results on ethnic fractionalization in columns 1 and 2 are similar to those reported in our baseline estimates in Table 1. We find a negative and statistically significant effect of ethnic diversity on the implementation of IMF conditions after controlling for endogeneity concerns. Second, not only is our measure of diversity statistically significant, but the impact is also large. For instance, holding other controls constant, a standard deviation increase in the Alesina et al. measure of diversity is associated with a decline in the share of IMF conditions implemented of 21%, which is significantly different from zero at the 5% level (see column 1). The substantive effect in this instance is at least twice as large as the corresponding OLS estimate in Table 1. These results suggest that any bias stemming from endogeneity leads OLS to understate the effect.Footnote 12 As mentioned earlier, we do not see a case for reverse causation, and the risk of unobservable factors affecting both the hypothesis variables and the dependent variables is limited, since we already control for most of these factors in our estimations. Nevertheless, the findings from the IV estimations lend further credence to the robustness of our results. Third, the Hansen J-statistic shows that the null of valid overidentifying restrictions cannot be rejected at conventional levels of significance. Furthermore, the joint F-statistic from the first stage rejects the null that the selected instruments are not relevant. In fact, both the joint F-statistic and the Kleibergen-Paap F-statistic are high and significant at the 1% level in all models reported in Table 2. 
Thus, our instruments appear sufficiently strong. Our instrumental variable approach results are also robust to using an alternative set of instruments which are discussed in the next section. Taken together, our results on diversity remain robust to alternative estimation techniques and addressing endogeneity concerns. The results of control variables are roughly the same as reported in Table 1.

In Table 3, we introduce interaction terms between ethnic diversity and regime type as measured by the Polity IV index. While allowing us to evaluate the impact of ethnic fractionalization conditional on regime type, this approach also allows us to account, via country fixed effects, for unobserved country-specific characteristics which might influence our dependent variables. This is possible because the Polity index varies over time for many of the countries in our sample. The index is interacted with our time-invariant measure of ethnic diversity, resulting in an interaction term which varies by country and over time, while the fixed effects absorb the level of ethnic fractionalization. In the robustness tests we also present interaction models which control for both country fixed effects and endogeneity by using an interacted IV measure. This is discussed further in the next section. Column 1 of Table 3 shows the interaction results for all IMF conditions, while column 2 reports the interaction results for the share of hard IMF conditions implemented. As seen in columns 1–2, our interaction terms are negative but statistically insignificant. However, the coefficient of the diversity measure on its own, i.e., when the Polity index is equal to 0, is negative and statistically significant at the 10% level in the “all conditions” model (1). In contrast, the coefficient of the Polity index is positive and statistically significant at the 10% level in the “all conditions” model (1). It is important to note that the interpretation of interaction terms, even in linear models, is not straightforward. In particular, a simple t-test on the coefficient of the interaction term is not sufficient to establish whether the conditional effect is statistically significant. We therefore rely on the marginal effects plots in Figs. 3 and 4, which depict the magnitude of the interaction effect. 
To calculate the marginal effect of the Alesina et al. measure of ethnic diversity on the share of total IMF conditions (Fig. 3) and hard conditions implemented (Fig. 4), we account for both the conditioning variable (Polity index) and the interaction term, and graphically display the total marginal effect conditional on the Polity index, coded on a −10 to 10 scale. In both figures, the left y-axis displays the marginal effect of the Alesina et al. measure of ethnic diversity, the right y-axis shows the density of observations at each Polity score, and the x-axis shows the Polity index at which the marginal effect is evaluated.
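As an illustration of how such a conditional marginal effect and its confidence band are typically computed in a linear interaction model, the sketch below applies the standard formula: the effect at Polity score p is the diversity coefficient plus p times the interaction coefficient, with a variance that includes the covariance between the two estimates. The function `marginal_effect` and every coefficient, variance, and covariance value are hypothetical and are not taken from Table 3.

```python
import numpy as np

def marginal_effect(b_frac, b_inter, var_frac, var_inter, cov, polity):
    """Marginal effect of diversity at each Polity score, with its standard error."""
    polity = np.asarray(polity, dtype=float)
    effect = b_frac + b_inter * polity
    # Var(b_frac + p * b_inter) = Var(b_frac) + p^2 Var(b_inter) + 2 p Cov(b_frac, b_inter)
    se = np.sqrt(var_frac + polity**2 * var_inter + 2.0 * polity * cov)
    return effect, se

# Hypothetical estimates, for illustration only.
polity_grid = np.arange(-10, 11)
effect, se = marginal_effect(
    b_frac=-0.10, b_inter=-0.01,
    var_frac=0.004, var_inter=0.0001, cov=-0.0002,
    polity=polity_grid,
)

# 95% confidence band; the effect is "significant" where the band excludes zero.
lower = effect - 1.96 * se
upper = effect + 1.96 * se
significant = (upper < 0) | (lower > 0)

for p, e, s, sig in zip(polity_grid, effect, se, significant):
    print(f"polity={p:+3d}  effect={e:+.3f}  se={s:.3f}  significant={sig}")
```

The sketch shows why an insignificant interaction coefficient does not settle the question: the band can exclude zero over part of the Polity range even when the interaction term alone is imprecisely estimated.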

Fig. 3 Diversity (Alesina et al.) & Regime Type & Marginal Effect on share of IMF conditions met

Fig. 4 Diversity (Alesina et al.) & Regime Type & Marginal Effect on share of IMF Hard conditions met

As seen in Figs. 3 and 4, the negative slope of the interaction implies that ethnic diversity decreases the probability of successful implementation of all IMF conditions and hard conditions to a greater degree in more democratic countries, a result in line with hypothesis three. The marginal effects plotted in Fig. 3 (all conditions) suggest the negative effect is statistically significant at the 95% confidence level for countries with a Polity score of 3 or larger, while in Fig. 4 (hard IMF conditions) the effect is only statistically significant at the 90% confidence level for countries with a Polity score of 8 or higher. However, as shown by the overlaid histogram of Polity scores, the increased precision of the estimates for democracies may simply be a function of a greater density of observations at these levels. Indeed, the marginal effect is negative and significant at the 10% level, at least for all IMF conditions in Fig. 3, when the Polity score is between −3 and 10. Collectively, this suggests that while the negative effect of ethnic fractionalization on IMF condition implementation is more noticeable in more democratic countries, the conditional effect may not be substantively large. Accordingly, although these results are in line with the finding of Mody and Saravia (2006) that democracy slows agreement on IMF program design, we do not want to read too much into these findings. Notably, we have no data which would allow us to directly test the “collective” mechanisms described in our theory section. Such testing would require individual-level data on sociotropic views, which unfortunately goes beyond the scope of this manuscript.

4.1 Robustness checks

We examine the robustness of our findings in several ways. First, we present our main results replacing the Alesina et al. measure of ethnic diversity with the FL measure. Our results, reported in online appendix Table-A, remain robust. We continue to find a negative and significant effect of ethnic fractionalization on the implementation of all IMF conditions as well as hard conditions. The substantive impact of the two measures of fractionalization (i.e., the Alesina et al. and FL measures) is similar. Moreover, the IV estimation results using the FL measure (Table B, online appendix) are similar to those reported in Table 2. For instance, a standard deviation increase in the FL measure is associated with a decline in the share of hard IMF conditions implemented of 26%, an effect twice as large as the one estimated using OLS in Table-A. Finally, the interaction results (Table-C) are similar when using the FL diversity measure, as shown in the conditional plots in Figs. A and B. Overall, our results are robust to using the FL measure of diversity.

Second, as discussed in section 3.1, the IMF conditions data from the MONA database are not free from limitations. Thus, we rely on an alternative dataset on IMF conditions compiled by Kentikelenis et al. (2016), which provides more detailed and disaggregated information on the implementation of conditions. We make use of two measures. First, we use a simple count of all implemented conditions in all policy areas, divided by the count of all IMF conditions imposed, to capture compliance with all IMF conditions. Second, we use ‘implementation corrected’ hard conditions, divided by the count of hard conditions imposed by the Fund, to capture compliance with hard conditions.Footnote 13 The Kentikelenis et al. (2016) dataset corrects for the implementation of hard conditions by subtracting waivers from hard conditions when applicable.Footnote 14 Furthermore, their dataset also includes conditions in programs which are cancelled or interrupted.Footnote 15 Estimating our baseline models using Kentikelenis et al.’s (2016) data on IMF conditions does not change the sign or the statistical significance of the coefficients. These results are presented in Tables D-F and Figures C-F in the online appendix. The coefficients on both the Alesina et al. and FL measures of ethnic diversity for the share of total IMF conditions and of ‘implementation corrected’ hard conditions, reported in Table-D, remain negative and statistically significant. In fact, the magnitudes of the coefficients and the substantive effects are similar to those reported in Table 1. The IV estimations, shown in Table-E, are substantively similar although they do not quite reach traditional levels of statistical significance. We also use alternative instruments (discussed below) with the Kentikelenis et al. (2016) data. In the IV estimations using these alternative instruments, both diversity measures become statistically significant at the 5% level for all IMF conditions and retain their negative sign. 
The interaction results in Table-F, and the conditional plots in Figs. C-F, support our third hypothesis that ethnic heterogeneity weakens the implementation of IMF programs in democratic regimes.

Third, we explore the sensitivity of our results to the use of three alternative instruments. First, we use the historical duration of human settlements measured in 10,000s of years (log) used by Ahlerup (2009). The relationship with ethnic diversity, according to Ahlerup and Olsson (2012), stems from the primordial view, which contends that ethnic identities have existed since time immemorial or can be traced back to early civilizations (Smith, 1986). Hence, primordialists consider ethnic identification a natural evolution of human existence. Next, we use an indicator of Fission (log) that measures the genetic distance between six different population groups. Ahlerup and Olsson (2012) compute the time since each of the six groups split from each other, which is a proxy for the duration of human settlement. Finally, we use the distance to Ethiopia measured in miles (log). Assuming an initial settlement in Ethiopia 160,000 years ago, Ahlerup and Olsson (2012) construct the migratory distance from Ethiopia (land of first human origins) to the rest of the countries in the world measured in kilometers. The greater the distance, the later the first human settlement of an area, and therefore we would expect a negative relationship with ethnic diversity. Once again, we think each of these variables satisfies the exclusion restriction, as none could plausibly affect the implementation of IMF conditions directly. Applying these instrumental variables instead of the artificial border measures does not change our IV estimations (as reported in Tables G-H, online appendix). We find that our instruments are relevant, as the first-stage F-statistics are well above the rule-of-thumb value of 10 (Bound et al., 1995), and the Hansen J-statistic does not reject the validity of the overidentifying restrictions for the new instruments.

Fourth, we replace our explanatory variables, namely the Alesina et al. and FL measures of ethnic diversity, with a range of other measures available in the literature. For instance, we use Taylor and Hudson’s (1972) widely used ethnolinguistic fractionalization index, also known as the ELF, constructed using data from the Atlas Narodov Mira. Building on Taylor and Hudson’s (1972) measure, Krain (1997) improved its accuracy by recoding the variable as strictly an ethnic fractionalization measure, as opposed to an ethnolinguistic variable, from 1948 to 1982. Finally, we also employ the measure of ethnic fractionalization developed by Montalvo and Reynal-Querol (2005). Once again, our original results remain robust to using these different measures of ethnic diversity (Table I, online appendix).

Fifth, we re-estimate all models in Tables 1, 2 and 3 including period fixed effects capturing the time period of the IMF programs for each country. These results, in Tables J-L (and Figures G-J) in the online appendix, remain robust to the inclusion of program-specific period dummies. Sixth, we collapse our dataset into a cross-section in which average compliance with IMF program conditionality over the entire study period becomes the dependent variable. Our results remain robust to this simple cross-sectional analysis (Tables M-N, online appendix). Seventh, we include a range of additional control variables to estimate a kitchen-sink model, including the ruling party’s majority in the House/Parliament, checks and balances on the executive, the number of years a leader has been in office, polity polarization, and an elections dummy, all sourced from the DPI (2018 version). We also include a dummy measure of civil conflict, trade openness, natural resource rents to GDP, and the number of years in an IMF program. We estimate both OLS and 2SLS-IV models with all of these control measures. It is noteworthy that our instrumental variables – artificial borders – could affect compliance with conditionality via colonial history, years since independence, and the number of elections. While country fixed effects in the interaction models with an IV specification (Table-Q, Fig. K-N) control for colonial history, these additional control variables capture some of the remaining factors, thereby further reducing concerns about omitted variable bias. Inclusion of these additional variables does not markedly change our baseline results (reported in Tables O-P, online appendix).

Next, like Dreher et al. (2015), we disaggregate our estimations by type of lending program. Due to the nature of conditionality, we put Standby Agreements and Extended Fund Facility agreements into one category, while the Structural Adjustment Facility and the Enhanced Structural Adjustment Facility agreements, renamed as the Poverty Reduction and Growth Facility and further modified into the Extended Credit Facility, are grouped into the second category. Our IV estimations (reported in Table-R, online appendix) continue to find a negative and significant effect of ethnic diversity on compliance with conditions under both categories. The only exception is compliance with hard conditions under the second category of lending programs, where our diversity measures are statistically insignificant. The interaction effects with the Polity index are reported in Table-S and the conditional plots in Figures O and P (online appendix). Once again, our results, by and large, remain similar to our baseline estimations. Overall, we find that our results do not differ markedly by lending facility.

In addition, we use a control function approach to test the robustness of the IV approach described earlier. The control function estimator models the endogenous regressor (i.e., the ethnic diversity index) as a function of our aforementioned instruments to derive predicted residuals, which are then included as an additional regressor in the main specification to control for endogeneity (Petrin & Train, 2010). We thus estimate \({Frac}_{it}={\alpha}_1{iv}_{it}+{\alpha}_2{Z}_{it}+{\vartheta}_{it}\), which gives the residual \({\hat{\vartheta}}_{it}={Frac}_{it}-{\hat{\alpha}}_1{iv}_{it}-{\hat{\alpha}}_2{Z}_{it}\). Regressing the IMF compliance measures on Fracit, Zit, and \({\hat{\vartheta}}_{it}\) provides the control function estimates. The performance of the control function approach hinges on the assumption of having sound instrumental variables to eliminate endogeneity. Moreover, the control function for the ethnic diversity index will only capture endogeneity associated with Fracit and not with other covariates which might also be endogenous. The standard errors for the second-stage equation are corrected using a bootstrap approach. The results are presented in Table-T in the online appendix. Notice that the estimates on the diversity measure from the control function approach are identical to the 2SLS-IV estimates reported in Table 2. Interestingly, however, the coefficient on the control function (predicted residuals) is statistically insignificant in Table-T. Evidence of endogeneity is only confirmed if this coefficient is significantly different from zero at conventional levels of significance.
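The equivalence between the control function and 2SLS point estimates in a linear model can be checked with a short simulation. The sketch below uses simulated data with hypothetical names and coefficients (`iv`, `ctrl`, `frac`, and the values assigned to them) and omits the bootstrap correction of the standard errors; it is not the estimation behind Table-T.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data (all values hypothetical).
iv = rng.normal(size=(n, 2))    # instruments
ctrl = rng.normal(size=n)       # an exogenous control, standing in for Z_it
u = rng.normal(size=n)          # unobserved factor that makes frac endogenous
frac = iv @ np.array([0.7, 0.4]) + 0.3 * ctrl + u + rng.normal(size=n)
y = -0.5 * frac + 0.2 * ctrl + 1.5 * u + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# First stage: regress frac on instruments and controls, keep the residual.
W = np.column_stack([const, iv, ctrl])
resid = frac - W @ ols(W, frac)

# Control function: add the first-stage residual to the outcome equation.
b_cf = ols(np.column_stack([const, frac, ctrl, resid]), y)

# 2SLS: use the first-stage fitted values in place of frac.
frac_hat = frac - resid
b_2sls = ols(np.column_stack([const, frac_hat, ctrl]), y)

print(f"control function: {b_cf[1]:.4f}, 2SLS: {b_2sls[1]:.4f}")
print(f"coefficient on first-stage residual: {b_cf[3]:.3f}")
```

The coefficient on `frac` is numerically the same under both approaches, while the coefficient on the first-stage residual measures how strongly the unobserved factor contaminates the naive regression; in this simulation it is far from zero by construction, which is the pattern that would indicate endogeneity.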

Finally, we are conscious of the risk of overfitting our regression models (Moll & Smets, 2020). To address this problem, we adopt two approaches. First, we drop controls which are statistically insignificant in our models, retaining only those which are significant at conventional levels. Second, we re-estimate all of our models dropping one control variable at a time. The basic results (Tables U-V) are not affected when we drop the variables which are statistically insignificant. The robustness check results are not shown here for brevity but are available in the online appendix. In summary, the results taken together seem robust to using alternative data, specifications, instruments, and testing procedures.

5 Conclusions

In this paper we have attempted to add to the understanding of the circumstances under which IMF conditions are implemented. Building on a theory that ethnically homogenous societies will be more willing to collectively accept responsibility for existing macroeconomic conditions, undertake the collective obligations of IMF conditions, and collectively benefit from successful program completion, we investigate whether the degree of ethnic fractionalization helps to explain the extent of implementation of IMF conditions, particularly the so-called “hard” conditions.

Using an instrumental variable approach, we find robust evidence in support of our hypotheses. Moreover, we find that ethnic fractionalization may hinder implementation slightly more in democracies than in autocracies. These findings are generally supportive of those showing that government or societal cohesion is important to the successful implementation of IMF conditions (Dreher, 2003; Nsouli et al., 2005). While our study was limited to countries that had participated in IMF programs and, thus, may not be generalizable to other countries or International Organizations (IOs), our findings may offer several important lessons. First, if compliance is more likely to occur only when the politics of the implementing country are sufficiently non-contentious, then the utility of IOs in enabling credible commitments may be limited by domestic societal and institutional features. This type of mechanism could underlie difficulty in state compliance with any type of IO commitment that could potentially induce economic costs, such as climate or trade commitments in the UNFCCC or WTO, respectively. Second, if compliance with IMF conditionality does indeed lead to longer-term economic growth, states with higher degrees of ethnolinguistic fractionalization may risk falling further behind, leading to increased levels of inter-country inequality and further exacerbating tensions in ethnically heterogeneous states. Finally, with specific regard to IMF program success, we suggest that our findings might prompt that body to consider carefully the ethnolinguistic situation in the design of conditions. Rather than setting up ethnically heterogeneous states for failure, the IMF may wish to consider reworking conditionality in these states to achieve better compliance. This might include softening conditions or adding further conditions that ensure that the burdens and gains of the program are spread evenly across ethnolinguistic groups. 
Implementing such recommendations would introduce the very endogeneity we attempted to address with our IV strategy, but that is no reason to avoid what might otherwise be a sound policy approach.

We would note that a major limitation of our study is that, while our theory was built on mechanisms of political support based on individual sociotropic motivations, we did not have data to assess these mechanisms directly. Accordingly, while our results are consistent with those mechanisms, further work would be needed to evaluate those mechanisms directly. This might involve gathering cross-national data on sociotropic feelings toward IMF or other IO programs and would be a useful extension of this work.