1 Introduction

Unified growth theory argues that the child quantity–quality (QQ) trade-off was a key mechanism in the transition from stagnation to modern economic growth (Galor 2011). Before the nineteenth century, economic growth was slow and fertility rates were persistently high. By the mid-twentieth century this situation had reversed in the Western world. The corresponding increase in educational attainment observed during this period presents a narrative which is clearly consistent with various QQ theories. However, the findings in existing studies, which use post-demographic transition population samples, have been frustratingly inconclusive (see Black et al. 2005; Cáceres-Delpiano 2006; Angrist et al. 2010). Moreover, few microeconometric studies have tested the validity of this theory during this vital period.

The primary difference between this study and existing studies is that we use a large sample of individual-level data from a historical population experiencing a demographic transition: 1911 Ireland. Ireland’s relatively late fertility transition, alongside the availability of the completed individual census returns for the entire population, makes it a viable testing ground for the importance of the QQ effect during the demographic transition.

Using the simple theoretical framework as in Galor (2011), we demonstrate how the child QQ model works, and assess the implications of this model for empirical research. This exercise demonstrates how the child QQ trade-off arises because parents substitute between fertility and child investments in response to relative price, income, and technological changes. It also implies that this substitution happens endogenously, with potential sources of endogeneity stemming from simultaneity, omitted variable bias, and measurement error.

Existing research has employed conventional instrumental variable (IV) estimators in an attempt to solve the endogeneity issue. However, IV estimators chiefly rely on valid exclusion restrictions, which, given the highly endogenous nature of the child QQ model, are extremely difficult to find in practice. Given the lack of a credible exclusion restriction, this study takes an alternative approach. Instead of using an IV approach that depends on an exclusion restriction, we adopt the methodology suggested in Millimet and Tchernis (2013), and estimate the QQ model using a number of estimators that either avoid or minimize endogeneity bias but do not require the researcher to make any unverifiable exclusion restriction assumptions. Much like cointegration, vector autoregression, or system/difference GMM, these estimators rely on alternative modeling assumptions, but they are applied to cross-section data. These alternative modeling assumptions include substituting bias for variance in a manner similar to the conventional IV regression methodology and using identifying information generated by heteroskedasticity in a first-stage modeling equation. Section 4 introduces these estimators in greater detail.

It is widely accepted that Ireland was something of an outlier in the European demographic transition (Ó Gráda 1991). Indeed, a casual look at early twentieth century Ireland suggests that the island’s society was behaving in a way that contradicted the child QQ model. At this time the country had a relatively well-developed national education system, one that was founded in 1831, 39 years ahead of Britain. Illiteracy was rare amongst adults (see Table 1). However, the modernity suggested by Ireland’s education system was untranslated to Irish demographic behavior. A simple comparison of Ó Gráda (1991) and Wilson and Woods (1991) suggests that marital fertility levels were over 1.6 times larger in 1911 Ireland relative to England in the same period. Nevertheless, Ó Gráda (1991) also showed how macro-level, aggregated statistics hid substantial heterogeneity in fertility behavior across Irish regions. When one takes a closer look at these data, it appears that fertility in the urban centers of Dublin and Belfast bore closer resemblance to the lower fertility patterns observed in more developed regions in Western Europe.

The demographic heterogeneity exhibited in Irish regions underlines an important point: aggregated data can obscure important individual-level relationships in any analysis. Our analysis shows how the resolution unit can be vital in QQ-type analyses. If we look at aggregated county-level data, we find that counties with the lowest fertility also have the lowest school attendance. However, when we analyze more refined, higher-resolution data points, and focus on smaller geographic units inside these counties, we find a negative conditional correlation connecting fertility to education, as suggested by the child QQ model. This finding underlines the potential for researchers to commit an ecological fallacy when extrapolating results obtained via macro-level data to infer micro-level behavior and motivates the use of high-quality individual-level data in this paper.

This paper asks the following empirical question: do those attending school between the ages of 14 and 16 have fewer siblings, as predicted by the child QQ model? Schooling was effectively compulsory until the age of 14. If the QQ theory holds, we expect those who remain in school for longer to be from smaller families. OLS regressions on our large sample of individual-level data indicate that being in school reduces family-level fertility by around 1 %. Interestingly, this effect intensifies as control variables accounting for potentially omitted variables, such as income and socioeconomic status, are included in the model specification. This suggests that any endogeneity bias may be biasing this coefficient towards zero.

To alleviate endogeneity concerns, we estimate a number of models proposed in Millimet and Tchernis (2013). The obvious criticism of these models is that whilst they avoid exclusion-restriction-based assumptions, they require researchers to make alternative modeling assumptions. These alternative modeling assumptions either require the practitioner to correctly specify the regression equation or to use the identifying power generated by heteroskedasticity in a regression model of the endogenous treatment on the regressors. With these criticisms in mind, we probe the potential weakness of our empirical approach by conducting an Empirical Monte Carlo analysis as in Huber et al. (2013). This approach simulates “placebo treatments” using real data and thus assesses the performance of the exclusion restriction-free estimators given that the causal effect is known. Assuming the presence of endogeneity, this analysis shows that the parametric version of the IV estimator proposed by Klein and Vella (2009) performs best in this application. This analysis also reveals a positive bias in the alternative exclusion restriction-free estimators proposed in Millimet and Tchernis (2013). Thus, we expect the results obtained from these estimators to underestimate (in absolute value) the child QQ effect in our main empirical analysis.

Our Empirical Monte Carlo results help to contextualize our main results. Compared to the OLS regression results, the endogeneity bias-corrected models indicate a substantially larger QQ effect. The IV estimator based on Klein and Vella (2009) implies that a child staying in school between the ages of 14 and 16 reduces fertility substantially, by around 27 %. As predicted in the Empirical Monte Carlo results, the alternative estimators predict a smaller QQ effect. However, this effect, of around 10 %, is still significant. These results demonstrate the presence of a positive endogeneity bias. Our results are robust to different definitions of fertility (i.e. surviving children or births), and to potential issues of sample selection caused by maternal mortality.

Our findings illustrate the importance of human capital as a factor that underpinned demographic change in historical societies. These results appeal to the wider literature on long-run economic growth and technological progress (Galor and Weil 2000; Galor and Moav 2002). Until recently, few research papers (notable exceptions include Becker et al. (2010) and Klemp and Weisdorf (2016)) have investigated this empirical link. The implication of our findings for the history of economic development is that the fertility transition was accelerated by parental investments in children, and therefore human capital, in Western economies in the late nineteenth and early twentieth centuries.

2 Theory and existing literature

Trends in economic growth and fertility during the period 1850–1950 presented a puzzle to economic demographers. Classical theory, typically attributed to Malthus, assumes that children are a normal good, and thus the income elasticity of parents’ demand for children is positive. However, the emergence of modern economic growth alongside the demographic transition was at odds with classical theory. To answer this puzzle, Becker and Lewis (1973) proposed an extension to the classical economic model of fertility, and argued that parents made optimal (utility maximizing) child-rearing decisions across not one, but two dimensions: quantity and quality. Quantity refers to the number of children, whereas quality can be loosely defined in terms of the resources devoted to each child’s human-capital-augmenting education.

Since Becker and Lewis (1973)’s seminal contribution, the theoretical framework of the QQ model has been expanded, as in Becker and Tomes (1976). Several key contributions in the macroeconomic growth literature have cited this trade-off as the vital mechanism that fostered the emergence of sustained economic growth in Western economies. These contributions include research by Galor and Weil (1999, 2000) and Galor and Moav (2002), who argued that an endogenous relationship between technological growth, the demand for human capital, and fertility emerged in the second phase of the Industrial Revolution. It is worth noting that a fundamental difference exists between the Becker-style QQ model and recent contributions (Galor 2012). In Becker’s model, increased levels of income stimulate a decline in fertility via a substitution effect. Alternatively, parental preferences are such that the income elasticity with respect to quality is higher than with respect to quantity at higher levels of income. This is not the mechanism proposed in Galor and Weil (1999, 2000) and Galor and Moav (2002), who argued that the future return on their offspring’s human capital caused parents to choose child quality over quantity. In essence, technological growth stimulated an economic expansion which drove an increase in the demand for human capital, which in turn decreased fertility via the QQ trade-off.

As in Galor and Moav (2002), assume households maximize the following log-linear utility function:

$$\begin{aligned} u = (1-\pi ) \ln c + \pi [\ln n + \omega \ln h], \end{aligned}$$
(1)

where n is the number of surviving children, h is the quality (a measure of human capital) of those children, and c represents all non-child consumption goods. The constant preference parameters \(\pi \in (0,1)\) and \(\omega \in (0,1)\) represent the household’s preferences for children (over other forms of consumption) and child quality respectively. Each household is endowed with one unit of time, which they divide between labor market activities and child-rearing. There are two costs associated with child-rearing: \(\nu ^{n}\) is the fraction of a household’s time dedicated to each child ignoring child quality investments, while \(\nu ^{e}\) is the cost (in time) associated with each unit of education e. If a household dedicates all its time to labor market activities it will generate an income y. Child-rearing represents time away from work; thus the price of each child is the opportunity cost associated with rearing. The budget constraint is therefore:

$$\begin{aligned} yn(\nu ^{n}+\nu ^{e}e) + c = y, \end{aligned}$$
(2)

assuming the budget constraint binds. Additionally, it is assumed that human capital h is an increasing function of both education e and technological progress g:

$$\begin{aligned} h = f(e,g). \end{aligned}$$
(3)

Taking Eqs. (1)–(3), the household’s optimization implies the following Walrasian demand functions for fertility and child quality:

$$\begin{aligned} \nu ^ef(\cdot )&= \omega f_e(\cdot )(\nu ^n+\nu ^ee), \end{aligned}$$
(4)
$$\begin{aligned} n&= \frac{\pi }{\nu ^n + \nu ^ee }. \end{aligned}$$
(5)

Equation (5) illustrates the negative relationship linking education and fertility, as \(\partial n/\partial e < 0\). Additionally, we can see that the optimal level of education is determined both by child-rearing and education prices alongside technological progress. The functional form of \(f(e,g)\) is also a factor in determining the optimal level of education in Eq. (4).
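The step from Eqs. (1)–(3) to Eqs. (4)–(5) can be sketched as follows. This is a reconstruction that assumes an interior optimum; the shorthand \(q\) for the time cost per child is introduced here for convenience and is not part of the original notation.

$$\begin{aligned}
&\text{Let } q \equiv \nu ^{n}+\nu ^{e}e, \text{ so that Eq. (2) gives } c = y(1-nq), \text{ and } h = f(e,g). \\
&u = (1-\pi )\ln y + (1-\pi )\ln (1-nq) + \pi \ln n + \pi \omega \ln f(e,g). \\
&\frac{\partial u}{\partial n} = 0: \quad \frac{\pi }{n} = \frac{(1-\pi )q}{1-nq} \;\Rightarrow \; nq = \pi \;\Rightarrow \; n = \frac{\pi }{\nu ^{n}+\nu ^{e}e}. \\
&\frac{\partial u}{\partial e} = 0: \quad \pi \omega \frac{f_e(\cdot )}{f(\cdot )} = \frac{(1-\pi )n\nu ^{e}}{1-nq} = n\nu ^{e} = \frac{\pi \nu ^{e}}{q} \;\Rightarrow \; \nu ^{e}f(\cdot ) = \omega f_e(\cdot )(\nu ^{n}+\nu ^{e}e). \\
&\text{From Eq. (5): } \frac{\partial n}{\partial e} = -\frac{\pi \nu ^{e}}{(\nu ^{n}+\nu ^{e}e)^{2}} < 0.
\end{aligned}$$

The second condition uses \(1-nq = 1-\pi \) from the first, which is why income y drops out of both demand functions under this log-linear utility.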

The above model illustrates the basic mechanisms through which the child QQ model works and predicts why, at the micro level, families who choose to invest in their child’s education have lower fertility. This model also highlights a number of econometric issues. Firstly, the family’s education and fertility decisions are jointly determined. Multiple factors (prices, technology, and preferences) cause the QQ trade-off. This introduces the potential for simultaneity bias. Secondly, our empirical data do not include all of the relevant (and largely unobservable) variables in the QQ model. Instead, we use proxy information. For example, in our application we observe whether a child is in school and use this information as a proxy measure for education. This variable does not take into account the quality of schooling. Therefore, our data can be seen as a somewhat noisy representation of the true variable, and thus we suspect measurement error and attenuation bias. Thirdly, we expect there are exogenous factors simultaneously correlated with both child quantity and quality. This introduces the issue of omitted variable bias. In the Irish context urban status is negatively related to both fertility and education, a pattern that runs contrary to the QQ model. However, this effect stems from occupational differences specific to our data rather than the QQ model. In urban areas children are less likely to stay in school because greater opportunities for them to work outside the home exist.
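The attenuation bias mentioned above can be illustrated with a small simulation. The sketch below is not the paper's estimation code: it assumes a continuous education measure, classical (independent, mean-zero) measurement error, and an arbitrary true effect of -0.5. All names and parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n_obs=20000, alpha=-0.5, noise_sd=1.0, seed=0):
    """Return (slope using true education, slope using the noisy proxy)."""
    rng = random.Random(seed)
    e_true = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    # Fertility depends on true education; alpha is an assumed value.
    fertility = [alpha * e + rng.gauss(0.0, 1.0) for e in e_true]
    # The observed proxy adds classical measurement error to true education.
    e_proxy = [e + rng.gauss(0.0, noise_sd) for e in e_true]
    return ols_slope(e_true, fertility), ols_slope(e_proxy, fertility)


slope_true, slope_proxy = simulate_attenuation()
print(round(slope_true, 2), round(slope_proxy, 2))
```

With unit variances for both true education and the noise, the proxy slope converges to alpha multiplied by the reliability ratio 1/(1+1), so the estimated effect is pulled towards zero.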

The typical approach in empirical research has been to estimate one or both of the econometric models:

$$\begin{aligned} e_i&= n_i\alpha _1 + {\mathbf {X}}_{1i}\varvec{\beta _1} + u_{i} \end{aligned}$$
(6)
$$\begin{aligned} n_i&= e_i\alpha _2 + {\mathbf {X}}_{2i}\varvec{\beta _2} + v_{i} \end{aligned}$$
(7)

where the variables contained in \(\mathbf{X }_{1i}\) and \(\mathbf{X }_{2i}\) consist of both control variables (those appearing in both \(\mathbf{X }_{1i}\) and \(\mathbf{X }_{2i}\)) and instrumental variables (those appearing in only one). The variables \({e}_{i}\) and \({n}_{i}\) are as before and the coefficient parameters are: \(\varvec{\beta _1}\), \(\varvec{\beta _2}\), \(\alpha _1\), and \(\alpha _2\). The \(u_{i}\) and \(v_{i}\) error terms measure unsystematic variation. Essentially, these are linear approximations to the structural QQ equations and the QQ effects are measured by the \(\alpha _1\) and \(\alpha _2\) parameters.Footnote 1

Estimating Eqs. (6) and/or (7) is complicated by endogeneity concerns, as previously discussed. Instrumental variables offer a popular solution to such concerns. In the context of the models expressed above this means including a variable in \(\mathbf{X }_{1i}\)/\(\mathbf{X }_{2i}\) that is excluded from \(\mathbf{X }_{2i}\)/\(\mathbf{X }_{1i}\). In this paper we are interested in estimating the causal effect of education on fertility. Estimating this QQ model using conventional IV via 2SLS or equivalent requires knowledge of a variable (or variables) that (monotonically) causes differences in education and only affects fertility through this education channel (the exclusion restriction).
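To make the role of the exclusion restriction concrete, the following sketch simulates a stylized version of Eq. (7) in which an unobserved factor w drives both education and fertility. It assumes, purely for illustration, that a valid instrument z exists; with one instrument and no controls, 2SLS reduces to the ratio of covariances used here. The data-generating process and all parameter values are hypothetical.

```python
import random


def cov(x, y):
    """Sample covariance (population normalization)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def simulate_iv(n_obs=20000, alpha=-0.5, seed=0):
    """Return (OLS estimate, IV estimate) of the effect of e on n."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # assumed valid instrument
    w = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # unobserved confounder
    # Education responds to the instrument and to the confounder.
    e = [zi + wi + rng.gauss(0.0, 1.0) for zi, wi in zip(z, w)]
    # Fertility depends on education (alpha) and on the same confounder.
    n = [alpha * ei + wi + rng.gauss(0.0, 1.0) for ei, wi in zip(e, w)]
    ols = cov(e, n) / cov(e, e)
    iv = cov(z, n) / cov(z, e)  # z affects n only through e by construction
    return ols, iv


ols_est, iv_est = simulate_iv()
print(round(ols_est, 2), round(iv_est, 2))
```

In this design the OLS estimate is biased towards zero because w raises both variables, while the IV estimate recovers alpha only because z is excluded from the fertility equation by construction, an assumption that cannot be verified in real data.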

Rosenzweig and Wolpin (1980) were the first to apply an exclusion restriction and IV methodology to this question. They argued that multiple births (like twins), for a given parity, represent an exogenous increase in family size. Using a sample of Indian households, their findings supported the QQ mechanism.

A multiple birth IV methodology was also performed in Black et al. (2005) who used a population sample of administrative records from Norway to estimate the effect of family size on school attainment. Interestingly the Black et al. (2005) findings suggest no relationship between a family’s investment in education and fertility. A more recent study by Angrist et al. (2010) reached a similar conclusion based on Israeli data. In addition to multiple births, Angrist et al. (2010) used another IV based on the gender composition of the first and second born children. The validity of this instrument is justified on the basis of a parental preference for a mixed gender composition. Cáceres-Delpiano (2006) also used a multiple birth instrument to study the effects of family size using a sample of US children and found mixed support in favor of the QQ trade-off.

The validity of IV-based QQ estimates depends on the extent to which the exclusion restriction holds. Rosenzweig and Zhang (2009) question the use of multiple-birth instruments because twin births can cause an intra-household re-allocation of childhood resources. This intra-household re-allocation can obscure QQ effects, since parents appear to divert resources away from twin births to older siblings. Galor (2012) echoes these concerns, arguing that true tests of the QQ trade-off require either an exogenous change in the price of quantity or a change in the return to quality. Since multiple births impose a non-optimal level of household fertility, the response will not necessarily cause a family to deviate from the optimal level of quality. In essence, empirical testing of the QQ effect should examine how optimal levels of child quality respond to shocks in the exogenous variables.

Few studies have tested the QQ trade-off in historical populations. This is largely due to the scarcity of data containing the relevant empirical variables for a large number of observations. However, there are some exceptions. A study by Bleakley and Lange (2009) used the eradication of hookworm disease in Southern US states as a form of natural experiment. Bleakley and Lange (2009) argued that the eradication of this disease reduced the cost of child quality, and the subsequent increase in education and decrease in fertility were consistent with the QQ framework and also unified growth theory. Hatton and Martin (2010) use a unique individual level sample of British children in 1937–1939 to measure the relationship between family size and height, which serves as a proxy for human capital (health). Their results are also consistent with the QQ hypothesis as they show how family size was a key determinant of height. Another study by Klemp and Weisdorf (2016) looked at the relationship between fertility and literacy in historical England. Using exogenous variation caused by fecundity differences, Klemp and Weisdorf (2016) found that increases in sibship size caused reductions in adult literacy. Using aggregated regional data for mid-nineteenth century Prussia, Becker et al. (2010) approach the QQ trade-off from a macro perspective. Their analysis strongly supports the presence of a QQ trade-off. This macro-perspective finding also holds in nineteenth century France (Murphy 2015). The presence of an important child QQ effect is also supported in Galor and Klemp (2013). Galor and Klemp (2013)’s findings show that the process of natural selection appears to favor those with lower levels of fecundity in historical Quebec.

Given the importance of the budget constraint in the child QQ trade-off, the income-fertility relationship also holds relevance. There are a number of research papers that add context to the discussion of the economic determinants of historical fertility. Clark and Hamilton (2006) found a substantial fertility-income gradient in pre-industrial England. This finding was echoed in work by Boberg-Fazlic et al. (2011) who, like Clark and Hamilton (2006), found a positive income-fertility relationship in pre-1800 England. Cinnirella et al. (2013) showed how this mechanism operates in the short run, as families in pre-industrial England responded to short-term economic stress by increasing birth spacing, thus lowering fertility.

In summary, there is a substantial literature on the child QQ trade-off. However, this paper is unique as it estimates the child QQ effect using a large sample of individual-level data in a historical context. Furthermore, this paper employs a series of econometric estimation procedures that are designed to alleviate the concern of endogeneity and thus does not rely on exclusion-restriction-based IV methods, which have been questioned by others in this literature.

3 Data and context

Our study uses the individual-level returns from the 1911 Census of Ireland. The National Archives of Ireland provide full and unrestricted access to these returns. A key feature of these data is that, as in Great Britain, all married women were asked how many children they had given birth to and also how many of these children had survived. The Registrar General was responsible for collecting the census, with the Irish police force serving as enumerators. By 1911, the Irish police force had experience enumerating previous decennial censuses, alongside other enumeration duties like collecting agricultural statistics (Guinnane et al. 2001). Completion of the census form was a simple procedure. Given the high level of literacy in Ireland at the time, it is reasonable to expect that most households contained at least one member capable of completing the enumeration form, although the use of these data entails some caveats. For example, whilst married women had little incentive to lie about child fertility or mortality, these data, like most self-reports, may in some cases contain inaccuracies.

Lee (1969) illustrated how the Old Age Pensions Act of 1908 caused intentional age misreporting in 1911. Consequently, the age distribution for later age cohorts is skewed. However, this age misreporting only occurred in older age cohorts (a large number of individuals claimed to be 73) and therefore is of little relevance to our research question. Guinnane et al. (2001) used these data to examine fertility behavior in Dublin city. As part of their analysis, Guinnane et al. (2001) examined age misreporting by linking individual records in 1911 with earlier records in 1901. The results of this exercise indicated that limited age exaggeration occurred amongst women claiming to be under 50 in 1911. Furthermore, this analysis did not reveal any substantial misreporting biases after the socioeconomic stratification of individuals. Apart from the aforementioned age misreporting, historical demographers regard these data as sufficiently accurate for research (Watterson 1988).

One previous study used the individual returns from the Irish census in their entirety (Fernihough et al. 2015), although research has been conducted on samples.Footnote 2 For example, Guinnane (1997) used a sample that linked the 1901 and 1911 censuses to look at family formation, the age of leaving home, and other demographic issues. Ó Gráda (2006) profiled demographic aspects of Dublin’s Jewish community.

This paper uses school attendance as a measure of human capital. Educational attainment was an important determinant of social advancement in early twentieth century Ireland (Daly 1982). Entry into most clerical professions was contingent on school qualifications. Ireland’s education system was comparatively quite advanced by 1911. Following the Irish Education Act of 1892, school fees for the majority of national/primary schools were abolished, while the same act also introduced compulsory school attendance for all children between the ages of six and fourteen (Buachalla 1988). However, mandatory attendance could be circumvented for children aged twelve and older, provided they had found a source of regular paid employment (Patterson 1985).

The latter part of the nineteenth and early twentieth century saw a growth in secondary school attendance in Ireland. The introduction of state-organized exams in 1879 did much to foster this growth because a successful exam certificate was a valuable qualification for those wishing to join either the civil service or army (Coolahan 1981). A combination of both monetary and opportunity costs restricted the poorest from graduating to secondary level, although there were many exceptions. Religious bodies, particularly the Catholic Christian Brothers, built a substantial network of secondary schools enabling a large number of children from the poorer families to attain secondary-level education. There appeared to be significant returns to education in 1911 Ireland. However, this additional schooling should have implications for a family’s budget constraint, as suggested by the child QQ model.

Fig. 1 Marital fertility, child-married woman ratio. a County-level resolution. b DED-level resolution

Figure 1 shows the spatial distribution of marital fertility in 1911 Ireland at two levels of resolution. Like Becker et al. (2010) we use the child-married woman ratio to measure marital fertility, although these results are almost identical when we use child-woman ratios. This variable is defined as the number of children aged 0–4 divided by the number of married women aged 15–49. Plot (a) illustrates the spatial distribution of marital fertility aggregated at the county (32 national subdivisions) level. An east-west marital-fertility difference is apparent. This difference is underlined by the urban-rural split. The eastern seaboard contains large urban populations in the cities of Dublin and Belfast (nestled between counties Antrim and Down). The west of Ireland was very rural and hosted little economic activity outside the traditional agricultural sector in 1911. Guinnane (1997) showed that whilst rural fertility did fall in response to the Great Irish Famine (1845–1852), this largely occurred through celibacy rather than within marriage. Nevertheless, Fig. 1 demonstrates that there must have been fertility control within marriage too, as we find large between-county marital fertility differences, as in Ó Gráda (1991).
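The child-married woman ratio defined above can be computed from individual records along the following lines. This is a minimal sketch: the record layout (dictionaries with `age`, `sex`, and `married` keys) and the toy records are hypothetical and do not reflect the actual census file format.

```python
def child_married_woman_ratio(records):
    """Children aged 0-4 divided by married women aged 15-49 in one area."""
    children = sum(1 for r in records if 0 <= r["age"] <= 4)
    married_women = sum(
        1
        for r in records
        if r["sex"] == "F" and r["married"] and 15 <= r["age"] <= 49
    )
    if married_women == 0:
        return None  # ratio is undefined for areas with no married women 15-49
    return children / married_women


# Toy records for a single hypothetical area.
toy_area = [
    {"age": 2, "sex": "M", "married": False},
    {"age": 4, "sex": "F", "married": False},
    {"age": 0, "sex": "F", "married": False},
    {"age": 5, "sex": "M", "married": False},   # too old to count as a child 0-4
    {"age": 30, "sex": "F", "married": True},
    {"age": 49, "sex": "F", "married": True},
    {"age": 52, "sex": "F", "married": True},   # outside the 15-49 age range
    {"age": 25, "sex": "F", "married": False},  # unmarried, so not counted
    {"age": 35, "sex": "M", "married": True},
]
ratio = child_married_woman_ratio(toy_area)
print(ratio)
```

Applying the same function to records grouped by county or by DED would give the two resolutions compared in Fig. 1.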

Plot (b) on the right-hand side of Fig. 1 shows marital fertility again, although aggregated at a much higher resolution. The regions here are the smaller district electoral divisions (DEDs). There are 3655 DEDs in these data. Overall, the marital fertility pattern is similar to the county-level data shown in plot (a): the areas with the highest levels of marital fertility are in the rural west and the lowest in the more urbanized east. However, a large degree of within-county heterogeneity is also evident. For example, large counties like Cork (in the south) have a mix of both high and low levels of marital fertility.

Fig. 2 School attendance for children aged 14–16 years. a County-level resolution. b DED-level resolution

In this paper we use the individual returns to infer school attendance. The population was surveyed on occupation, and those attending school were typically enumerated using a description of “Scholar”. School attendance was effectively compulsory up to the age of 14, so we focus on school attendance amongst children between 14 and 16 years of age. Figure 2 repeats the analysis displayed in Fig. 1 with school attendance. Unlike Fig. 1, the east-west dichotomy is not as evident. School attendance appears to be lower in the northern counties of plot (a). This possibly reflects labor market conditions, as the more industrial northern counties would have had more opportunities for full-time employment outside seasonal agricultural employment. The opportunity costs associated with education were not constant across Irish counties.

The difference between plots (a) and (b) in Fig. 2 is also striking. The importance of within-county heterogeneity was highlighted in Fig. 1, although it appears that this heterogeneity is even more prevalent in Fig. 2. Within counties some DEDs have high levels of school attendance whilst others do not. This potentially reflects the local supply of schooling.

Fig. 3 Marital fertility and school attendance scatterplots. a County-level resolution. b DED-level resolution

Figure 3 illustrates the bivariate relationship between aggregated marital fertility and school attendance at both the county and DED level. Plot (a) shows the relationship at the county-level resolution. Contrary to the child QQ model, the correlation is positive in this plot. However, this relationship is reversed in plot (b) as we find a negative conditional relationship connecting fertility and education. This conditional relationship accounts for county-level fixed effects which may simultaneously affect fertility and education decisions.

The difference between plots (a) and (b) in Fig. 3 demonstrates the potential for researchers to commit an ecological fallacy when trying to infer individuals’ behavior from aggregated data. For example, fertility in a county could be low and school attendance high, so the ecological inference here would be that families who send their children to school have fewer children. However, this assumes that a family from a county where fertility is low will also have low fertility, which might not be the case. The DED-level maps highlight a considerable degree of within-county heterogeneity. Therefore, it is perfectly plausible for the QQ relationship to hold at the individual level, but not the macro level. An example of this occurs in U.S. politics, where on average, wealthier people vote Republican, but wealthier states vote Democratic.
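The sign reversal between aggregated and disaggregated data can be reproduced in a stylized simulation. The sketch below assumes a hypothetical data-generating process in which a county-wide factor raises both fertility and school attendance, while within each county the two are negatively related; removing county means plays the role of the county-level fixed effects. The 32 counties follow the text, but the number of DEDs per county and all other parameters are arbitrary.

```python
import random
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_deds(n_counties=32, deds_per_county=100, seed=1):
    """Generate (county, school attendance, fertility) rows for toy DEDs."""
    rng = random.Random(seed)
    rows = []
    for county in range(n_counties):
        county_effect = rng.gauss(0.0, 2.0)  # shifts both variables together
        for _ in range(deds_per_county):
            local = rng.gauss(0.0, 1.0)
            school = county_effect + local
            # Within a county, more schooling goes with lower fertility.
            fertility = county_effect - 0.5 * local + rng.gauss(0.0, 0.5)
            rows.append((county, school, fertility))
    return rows


def group_by_county(rows):
    groups = defaultdict(list)
    for county, school, fertility in rows:
        groups[county].append((school, fertility))
    return groups


def county_level_slope(rows):
    """Regress county-mean fertility on county-mean school attendance."""
    groups = group_by_county(rows).values()
    school_means = [mean([s for s, _ in g]) for g in groups]
    fert_means = [mean([f for _, f in g]) for g in groups]
    return ols_slope(school_means, fert_means)


def within_county_slope(rows):
    """Regress fertility on schooling after removing county means."""
    x_dm, y_dm = [], []
    for g in group_by_county(rows).values():
        ms, mf = mean([s for s, _ in g]), mean([f for _, f in g])
        x_dm.extend(s - ms for s, _ in g)
        y_dm.extend(f - mf for _, f in g)
    return ols_slope(x_dm, y_dm)


rows = simulate_deds()
between = county_level_slope(rows)
within = within_county_slope(rows)
print(round(between, 2), round(within, 2))
```

In this design the county-level slope is positive even though every county has a negative within-county relationship, which is the pattern that an ecological inference would miss.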

Previously, Brown and Guinnane (2007) illustrated the importance of using disaggregated data in their study of the fertility transition in nineteenth century Germany, and their findings appear to resonate here. Thus the ecological distinction motivates our use of individual-level data. These census data are cross-sectional, without retrospective information on either completed family size or education. However, since these data contain the entire population, the number of observations is large enough to make accurate inferences using this cross-section. Specifically, we evaluate whether school attendance is an important determinant of family size. School attendance covers students at both primary and secondary level, as it was not uncommon for students to repeat their latter-primary years in order to obtain a more impressive school certificate (Parkes 2010, p. 50). While we cannot observe the standard or level of schooling, it is a reasonable approximation to assume that school attendance in these age groups is an accurate indicator of family-level investment in education.

The observation unit in this analysis is an individual between 14 and 16 years of age. The education variable is the aforementioned school attendance indicator. The census surveyed each married woman’s fertility, asking for information on the number of births and surviving children. We have the entire census returns at our disposal and thus we match each 14–16 year old to the fertility information provided by their mother. The child QQ theory implies that greater parental investment in the human capital of their children causes a decrease in family size. Lacking information on completed family size or education, this paper uses the census cross-section to take a snapshot of the Irish population in 1911. The dependent variable is fertility and the regressor of interest is a dummy variable indicating whether or not the observation is reported as a scholar.
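The matching step described above might look like the following sketch. It is not the procedure used to build the analysis sample: the field names (`household_id`, `relation`, `occupation`, `children_born`, `children_surviving`), the rule of linking a child to the wife recorded in the same household, and the toy records are all hypothetical.

```python
def link_children_to_mothers(records):
    """Attach the mother's reported fertility to each child aged 14-16."""
    # Index the married woman (recorded as "wife") in each household.
    mothers = {r["household_id"]: r for r in records if r["relation"] == "wife"}
    linked = []
    for r in records:
        if r["relation"] in ("son", "daughter") and 14 <= r["age"] <= 16:
            mother = mothers.get(r["household_id"])
            if mother is None:
                continue  # no maternal link available, so the child is dropped
            linked.append({
                "scholar": 1 if r["occupation"].strip().lower() == "scholar" else 0,
                "births": mother["children_born"],
                "surviving": mother["children_surviving"],
            })
    return linked


def mean_fertility_gap(linked, key="births"):
    """Mean fertility of scholars' families minus that of non-scholars' families."""
    in_school = [row[key] for row in linked if row["scholar"] == 1]
    out_of_school = [row[key] for row in linked if row["scholar"] == 0]
    return sum(in_school) / len(in_school) - sum(out_of_school) / len(out_of_school)


# Toy households: H3 has no wife recorded, so its child cannot be linked.
toy_records = [
    {"household_id": "H1", "relation": "wife", "age": 40, "occupation": "",
     "children_born": 4, "children_surviving": 3},
    {"household_id": "H1", "relation": "son", "age": 15, "occupation": "Scholar"},
    {"household_id": "H1", "relation": "daughter", "age": 10, "occupation": "Scholar"},
    {"household_id": "H2", "relation": "wife", "age": 45, "occupation": "",
     "children_born": 8, "children_surviving": 6},
    {"household_id": "H2", "relation": "son", "age": 14, "occupation": "Farm labourer"},
    {"household_id": "H2", "relation": "daughter", "age": 16, "occupation": "Scholar"},
    {"household_id": "H3", "relation": "head", "age": 50, "occupation": "Farmer"},
    {"household_id": "H3", "relation": "daughter", "age": 15, "occupation": "Scholar"},
]
linked = link_children_to_mothers(toy_records)
gap = mean_fertility_gap(linked)
print(len(linked), gap)
```

With a single dummy regressor and no controls, the OLS coefficient on the scholar indicator equals this difference in group means, so `mean_fertility_gap` corresponds to the simplest version of the regression described in the text.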

In this model fertility is a function of schooling. However, the model is not temporally inconsistent because fertility is determined in the past and school attendance is recorded in the present. It is important to view the child’s school attendance as an accumulation of human capital investments. For example, for a child to remain in school at age 14 they are highly likely to have been in school for several years previously. Thus their presence in schooling provides a suitable indicator for the household’s investment in child quality over a long time frame. In the context of the child QQ model we see this as being a suitable variable for \(e_i\)—the time-invariant measure of education for an individual i.

The data we use contain a number of features which need to be processed prior to our econometric analysis. These census data represent one cross-section of the population at one specific point in time. Thus, we do not have information on completed schooling or fertility. A child in school at age 14 may have left a month after completing the census or remained in school for another 2 years. This is not information we have access to. The family-level fertility we observe relates only to one day in 1911. A mother reporting low fertility may go on to have more children than stated in the census. Also excluded are the children enumerated in residences where the mother is absent (through death or travel) and children with dead fathers, as the census only surveyed currently married women, not widows. We also omit a small number of families who live in multiple family households (as the family connections are too difficult to untangle) and children for whom the data are suspected to be inaccurate.Footnote 3

Fig. 4 Analysis sample comparisons. a Sample size comparisons. b School attendance and age

Table 1 Summary statistics

The left-hand panel in Fig. 4 illustrates the difference in sample size between the full census and the observations used in our analysis. Most of the observations trimmed from the full census are individuals for whom we are unable to establish a maternal family link.Footnote 4 The obvious concern with using a trimmed population sample is that this sample is unrepresentative of the population. However, as shown in the right-hand panel, the percentage of children in school in our analysis sample is almost identical to that in the full census. Both follow the same declining trajectory between the ages of 14 and 16. We do not have any data on the family-level fertility of trimmed observations, although these data are likely to be unrepresentative of fertility choice in the child QQ model anyway because they are from families for whom the completed family size choice has most likely been curtailed (due to the absence of a parent).

Table 1 contains the summary statistics for our data. We use two measures of fertility in this paper: net and gross. Net fertility is the logged number of surviving children reported by the mother in the family, whereas gross fertility is the logged total number of births. Our preferred outcome measure is net fertility, but we will also use gross fertility to demonstrate that our results are robust. We use logged measures so as to remove skewness from the distribution, but the results of the following analysis are the same regardless of the unit of measurement. The average number of surviving children for each family is \( \exp (1.771) \approx 5.877\). Roughly half of the children in the analysis sample are declared scholars in the census, which tallies well with the right-hand panel in Fig. 4. The remaining rows in Table 1 show the demographic and socioeconomic control variables used in this study. We are able to use information on the gender of each child, their parents’ literacy, whether they had servants in the house (an indicator for wealth), religious affiliation, and parental labor force information.

4 Empirical models

If the child QQ model holds, we expect within an economy each household’s investment in education to be negatively correlated with their level of fertility. In our data, where individuals are aged between 14 and 16, those in school should be from households with lower levels of fertility. Consider the potential outcomes framework where \(n_{i}(e)\) indicates the household fertility level \(n_i\), which depends on a school “treatment” variable \(e \in \{0,1\}\). The causal effect of an individual being in school (\(e=1\)) on fertility is thus:

$$\begin{aligned} \tau _i = n_{i}(e=1) - n_{i}(e=0), \end{aligned}$$
(8)

the difference in potential outcomes. In this application we are interested in estimating the expected value of these potential outcomes or the average treatment effect (ATE):Footnote 5

$$\begin{aligned} \tau _{ATE} = E[\tau _i] = E[n_{i}(e=1) - n_{i}(e=0)]. \end{aligned}$$
(9)

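In practice, we cannot perform the comparison in Eq. (9) directly because, for each individual, we only observe one of the two potential outcomes: the one corresponding to \(e_{i} = 0\) or the one corresponding to \(e_i=1\).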

In our application we estimate the ATE (\(\tau _{ATE}=\tau \)) using the following system of equations:

$$\begin{aligned} n_{i} =&\mathbf {X}_{\mathbf{i}} \varvec{\beta } + \tau e_{i} + v_{i},\end{aligned}$$
(10)
$$\begin{aligned} e^*_{i} =&\mathbf {X}_{\mathbf{i}} \varvec{\gamma } + u_{i}, \end{aligned}$$
(11)
$$\begin{aligned} e_{i} =&{\left\{ \begin{array}{ll} 1 &{} \quad \text {if } e^*_{i}>0\\ 0&{} \quad \text {otherwise} \end{array}\right. }, \end{aligned}$$
(12)

where \(n_i\) and \(e_i\) are as before, \(\mathbf {X}_{\mathbf{i}}\) is a vector of observable covariates, and \(v_i\) and \(u_i\) are two error terms. The conditional independence assumption (CIA) stipulates that the error terms are uncorrelated: \(E(v_i u_i) = 0\). The CIA implies that if one were to estimate Eq. (10) via OLS, the estimated parameter \(\tau \) would not be subject to endogeneity bias. In the previous section we discussed potential sources of endogeneity in the QQ model and in practice we suspect that these error terms are related: \(E(v_i u_i) \ne 0\). Consequently, we assume that these errors originate from the following bivariate normal distribution:

$$\begin{aligned} \begin{pmatrix} v_i \\ u_i \end{pmatrix} \sim {\mathcal {N}} \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma _{v}^2 &{} \rho _{vu} \\ \cdot &{} 1 \end{pmatrix} \right) , \end{aligned}$$
(13)

where \(\rho _{vu}\ne 0\) indicates endogeneity bias. The normalized inverse probability weighted (IPW) estimator of Hirano and Imbens (2001) offers an alternative to OLS regression. The advantage of propensity-score based estimators is that they are less reliant on the linear functional form. The IPW estimate of the ATE is given by the following formula:

$$\begin{aligned} {\hat{\tau }}_{IPW, ATE} = \left[ \frac{\sum _{i=1}^N n_is_i/{\hat{P}}(\mathbf {X}_{\mathbf{i}} )}{\sum _{i=1}^N s_i/{\hat{P}}(\mathbf {X}_{\mathbf{i}} )} \right] - \left[ \frac{\sum _{i=1}^N (n_i(1-s_i))/(1-{\hat{P}}(\mathbf {X}_{\mathbf{i}} ))}{\sum _{i=1}^N (1-s_i)/(1-{\hat{P}}(\mathbf {X}_{\mathbf{i}}) )} \right] , \end{aligned}$$
(14)

where \(s_i\) denotes the treatment indicator (the schooling dummy \(e_i\) in our notation) and \({\hat{P}}(\mathbf {X}_{\mathbf{i}} )\) represents the propensity score estimates obtained, in this application, from the predicted values of a probit regression fitted using the latent model expressed in Eqs. (11) and (12). Like OLS, the IPW estimator works under the assumption that the CIA holds.
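As an illustration of Eq. (14), the following minimal sketch computes the normalized IPW estimate from three equal-length lists. The function name `ipw_ate` and the toy inputs are assumptions made for this example; the propensity scores are taken as given rather than estimated from a probit.

```python
def ipw_ate(outcome, treated, pscore):
    """Normalized IPW estimate of the ATE, following the form of Eq. (14).

    outcome: fertility values n_i
    treated: 0/1 treatment indicators s_i
    pscore:  estimated propensity scores, strictly between 0 and 1
    """
    rows = list(zip(outcome, treated, pscore))
    # Weighted mean of outcomes among the treated (weights 1 / P).
    num1 = sum(n * s / p for n, s, p in rows)
    den1 = sum(s / p for _, s, p in rows)
    # Weighted mean of outcomes among the untreated (weights 1 / (1 - P)).
    num0 = sum(n * (1 - s) / (1 - p) for n, s, p in rows)
    den0 = sum((1 - s) / (1 - p) for _, s, p in rows)
    return num1 / den1 - num0 / den0

# Toy data (hypothetical): with a constant propensity score the estimate
# reduces to a simple difference in group means.
print(ipw_ate([2.0, 1.0, 3.0, 2.0], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))
```

Dividing by the sum of the weights in each group is what makes the estimator "normalized": the weights within the treated and untreated groups each sum to one.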

Black and Smith (2004) recommend a trimmed sample version of the IPW. Black and Smith (2004) derive a bias-minimizing propensity score for the ATT: \(P(\mathbf {X}_{\mathbf{i}})=0.5\). Based on this, the authors recommend a propensity based estimator with a thick support region where researchers only use data points for which: \(P({\mathbf {X}}_i) \in (0.33,0.67)\). Millimet and Tchernis (2013) advance this methodology by deriving the bias-minimizing propensity score for the ATE. Unlike the ATT, the bias-minimizing propensity score of the ATE is not fixed at \(P({\mathbf {X}}_i) =0.5\). Instead it depends on the nature of the selection into the endogenous treatment (in our application schooling \(e_i\)). This bias is expressed:

$$\begin{aligned} B_{ATE}[P(\mathbf {X}_{\mathbf{i}})] = -\left( \lambda _{0} + [1-P(\mathbf {X}_{\mathbf{i}})](\lambda _{1}-\lambda _{0}) \right) \left( \frac{\phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })}{\Phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })[1-\Phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })]} \right) , \end{aligned}$$
(15)

where \(\phi \) and \(\Phi \) represent the probability density and cumulative distribution functions of the standard normal distribution respectively, whilst the selection parameters \(\lambda _0\) and \(\lambda _1\) can be estimated from the following Heckman BVN selection model:

$$\begin{aligned} n_{i} =&\mathbf {X}_{\mathbf{i}} \varvec{\beta } + \tau e_{i} + \lambda _{0}(1-e_{i}) \left[ \frac{\phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })}{1-\Phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })} \right] + \lambda _{1}(e_{i}) \left[ \frac{-\phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })}{\Phi (\mathbf {X}_{\mathbf{i}} \varvec{\gamma })} \right] + v_{i}, \end{aligned}$$
(16)

which in practice we estimate by replacing \(\varvec{\gamma }\) with sample estimates from a first-stage probit model. Note that the parameter \(\tau \) in the regression model in Eq. (16) is the BVN estimate of the ATE parameter. If we replace the \(\lambda _0\), \(\lambda _1\), and \(\varvec{\gamma }\) parameters in Eq. (15) with their sample analogues, we can find the bias-minimizing propensity score (\(P^*\)), by performing a grid search.
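The grid search can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the homoskedastic probit relation \(P = \Phi (\mathbf {X}\varvec{\gamma })\), so that the bias in Eq. (15) can be written as a function of \(P\) alone, and it assumes that "bias-minimizing" means minimizing the absolute value of that bias. The function names, the grid, and the \(\lambda \) values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf, cdf and inverse cdf

def ate_bias(p, lam0, lam1):
    """Bias of the ATE at propensity score p, following Eq. (15).

    Assumes P = Phi(X * gamma), so the probit index is Phi^{-1}(p).
    """
    index = STD_NORMAL.inv_cdf(p)
    ratio = STD_NORMAL.pdf(index) / (p * (1.0 - p))
    return -(lam0 + (1.0 - p) * (lam1 - lam0)) * ratio

def bias_minimizing_pscore(lam0, lam1, grid_size=99):
    """Grid search for the p in (0, 1) with the smallest absolute bias."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return min(grid, key=lambda p: abs(ate_bias(p, lam0, lam1)))

# Hypothetical selection parameters, chosen only for illustration.
print(bias_minimizing_pscore(lam0=-0.1, lam1=0.3))
```

When \(\lambda _0\) and \(\lambda _1\) have opposite signs the first factor of Eq. (15) can vanish inside the unit interval, which is why \(P^*\) need not equal 0.5 for the ATE.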

Once \(P^*\) has been estimated this allows us to estimate a minimum biased (MB) version of the IPW:

$$\begin{aligned} {\hat{\tau }}_{MB, ATE} = \left[ \frac{\sum _{i \in \Omega } n_is_i/{\hat{P}}(\mathbf {X}_{\mathbf{i}} )}{\sum _{i \in \Omega } s_i/{\hat{P}}(\mathbf {X}_{\mathbf{i}} )} \right] - \left[ \frac{\sum _{i \in \Omega } (n_i(1-s_i))/(1-{\hat{P}}(\mathbf {X}_{\mathbf{i}} ))}{\sum _{i \in \Omega } (1-s_i)/(1-{\hat{P}}(\mathbf {X}_{\mathbf{i}}) )} \right] , \end{aligned}$$
(17)

where the sample \(\Omega \) is a trimmed version of the full sample consisting of data points in some neighborhood around \(P^*\): \(({\underline{P}}, {\overline{P}})\). Observations are removed from this sample based on their distance from \(P^*\), i.e. if \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\) is either below \({\underline{P}}\) or above \({\overline{P}}\). Observations further away from \(P^*\) are more likely to be discarded in the analysis. In essence, the MB estimator trades bias for variance. Deciding how much bias to trade for variance requires the researcher to make a subjective decision about the “support region”. If the support region is too wide, the sample size will remain large but this will limit the extent to which the bias is reduced. If the support region is too narrow, many observations will be discarded from the analysis, so the ATE will be estimated less precisely, with larger standard errors. Millimet and Tchernis (2013) recommend that \({\underline{P}}\) and \({\overline{P}}\) are chosen on the following basis:

$$\begin{aligned}&{\underline{P}} = \text {max}\{0.02, P^*-\kappa _{\theta }\}\end{aligned}$$
(18)
$$\begin{aligned}&{\overline{P}} = \text {min}\{0.98, P^*+\kappa _{\theta }\} \end{aligned}$$
(19)

where \(\kappa _{\theta }\) is the smallest value such that at least \((\theta \times 100~\%)\) of the treatment and control groups are in the trimmed sample. The lower one is willing to set \(\theta \), the lower the bias of the MB estimate of the ATE, albeit at the expense of a higher sampling variance. Millimet and Tchernis (2013) recommend setting \(\theta \in \{0.05, 0.25\}\), and comparing, advice we adhere to in this paper.
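The choice of \(\kappa _{\theta }\) and the bounds in Eqs. (18) and (19) can be sketched as below. This is one possible reading of the rule, under stated assumptions: observations exactly on the boundary are treated as retained, the required share is computed separately for each group before clipping to 0.02 and 0.98, and both groups are non-empty. The function name `trimming_bounds` and the toy data are hypothetical.

```python
import math

def trimming_bounds(pscore, treated, p_star, theta):
    """Return (P_lower, P_upper) following the form of Eqs. (18)-(19)."""
    def kappa_for(group):
        # Distances from P* for one group, smallest first.
        dist = sorted(abs(p - p_star)
                      for p, s in zip(pscore, treated) if s == group)
        # Number of observations needed to reach a share of theta.
        needed = max(1, math.ceil(theta * len(dist)))
        return dist[needed - 1]

    # kappa must be large enough for both the treated and the controls.
    kappa = max(kappa_for(1), kappa_for(0))
    return max(0.02, p_star - kappa), min(0.98, p_star + kappa)

# Toy propensity scores (hypothetical), four treated and four controls.
scores = [0.5, 0.625, 0.75, 0.875, 0.125, 0.25, 0.375, 0.5]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
print(trimming_bounds(scores, groups, p_star=0.5, theta=0.5))
```

The MB estimate in Eq. (17) then applies the IPW formula only to observations whose estimated propensity score lies within the returned bounds.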

Given that Eq. (15) allows us to estimate the endogeneity bias, we can use this information to create a series of bias corrected (BC) estimators. For example:

$$\begin{aligned} {\hat{\tau }}_{IPW-BC, ATE} = {\hat{\tau }}_{IPW, ATE} - \sum \widehat{B_{ATE}[{\hat{P}}(\mathbf {X_i})]} \end{aligned}$$
(20)

where \(\widehat{B_{ATE}[P^*]}\) is the sample estimate of the bias as in Eq. (15). This procedure can be also used to correct the MB estimators.

The final estimator we utilize is a parametric version of the IV estimator proposed in Klein and Vella (2009). This estimator exploits heteroskedasticity in the school treatment Eqs. (11) and (12) to estimate the causal impact of staying in school on fertility in Eq. (10). The rationale of this estimator is based on a wider literature that uses heteroskedasticity for identification of endogenous regressor models (Klein and Vella 2010; Lewbel 2010; Hogan and Rigobon 2002). We modify the aforementioned probit model used to estimate \(\varvec{\gamma }\), and instead estimate the following heteroskedastic probit model:

$$\begin{aligned} \text {Pr}(e_{i}=1 |\mathbf {X}_{\mathbf{i}} ) = \Phi \left( \frac{\mathbf {X}_{\mathbf{i}} \varvec{\gamma }}{\exp (\mathbf {X}_{\mathbf{i}} \varvec{\delta })}\right) \end{aligned}$$
(21)

as in Greene (2008, pp. 788–789). The maximum likelihood predictions from Eq. (21), \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\), can then be used as a valid IV for \(e_{i}\) in Eq. (10), which we estimate using two-stage least squares.
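To make the role of the variance term in Eq. (21) concrete, the sketch below evaluates the heteroskedastic probit probability for a single observation given coefficient vectors. It does not estimate the model by maximum likelihood; the function name `het_probit_prob` and the coefficient values are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

def het_probit_prob(x, gamma, delta):
    """Pr(e = 1 | x) = Phi(x'gamma / exp(x'delta)), as in Eq. (21)."""
    mean_index = sum(xj * gj for xj, gj in zip(x, gamma))
    log_scale = sum(xj * dj for xj, dj in zip(x, delta))
    return NormalDist().cdf(mean_index / exp(log_scale))

# With delta = 0 the model collapses to an ordinary probit.
print(het_probit_prob([1.0, 0.5], gamma=[0.2, 0.6], delta=[0.0, 0.0]))
# A non-zero delta rescales the index differently for each observation.
print(het_probit_prob([1.0, 0.5], gamma=[0.2, 0.6], delta=[0.0, log(2.0)]))
```

Because the scaling factor varies with the covariates, the scaled index is not a linear function of the covariates, which is the feature the estimator relies on for identification.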

Given the potential efficacy of the Klein and Vella (2009) estimator it is worth highlighting how it identifies the ATE parameter without an exclusion restriction. We have already seen, in Eq. (16), how the ATE can be estimated from Heckman’s BVN model without an exclusion restriction. This is known as “identification by functional form” because it relies on the bivariate normal functional form being correct to achieve identification. This functional form-based identification also works in the Klein and Vella (2009) estimator when there is no heteroskedasticity in the education equation, i.e. \(\exp (\mathbf {X}_{\mathbf{i}} \varvec{\delta })=1\). The probit’s nonlinear CDF means that the predicted values for education \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\), will be linearly independent of \(\mathbf {X}_{\mathbf{i}}\) and thus a valid IV for \(e_{i}\). However, this assumption is very reliant on there being enough variation in the tails of the predicted \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\) because the probabilities from \(\Phi (\mathbf {X}_{\mathbf{i}} \varvec{{\hat{\gamma }}})\) will be approximately linear around the center of this distribution.

The Klein and Vella (2009) estimator bypasses the aforementioned functional form assumption relying on variation in the tails. Instead it uses heteroskedasticity to achieve identification, which in econometric terms means that \(Z_i \equiv [\mathbf {X}_{\mathbf{i}} \varvec{\gamma }/\exp (\mathbf {X}_{\mathbf{i}} \varvec{\delta })]\) is linearly independent of \(\mathbf {X}_{\mathbf{i}}\). The nonlinearity generated by the normal CDF \(\Phi (\cdot )\) is no longer a prerequisite for identification.

In practical terms, this methodology depends on three assumptions. Firstly, there must be sufficient heteroskedasticity in Eq. (21). If there is little or no heteroskedasticity \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\) will be a weak IV for \(e_{i}\), causing all the usual problems associated with weak IVs. Fortunately, this is a testable assumption. Secondly, we must assume that all of the variables in \(\mathbf {X}_{\mathbf{i}}\) are exogenous. This is not a testable assumption; however, one must also make this assumption when estimating an IV regression model with an exclusion restriction. Thirdly, we must also assume that the degree of endogeneity \(\rho _{vu}\) does not (conditionally) covary with any of the regressors in \(\mathbf {X}_{\mathbf{i}}\). Like the previous assumption, this is not one we can test but one that is already inherent in conventional IV strategies. In our application this assumption appears plausible because our preferred model specification includes a large set of control variables.

The econometric approaches outlined above are very appealing in situations where researchers do not have a valid exclusion restriction. Furthermore, these approaches could be used in addition to the conventional IV approach as a robustness check when the instrument’s validity is questionable (in a manner similar to over-identification procedures such as the Sargan test). These estimators may also appeal to applied empirical growth researchers even when the exclusion restriction is valid. Firstly, the instruments could be weak and thus the researchers may face a weak-IV problem. This problem has been acknowledged by Bazzi and Clemens (2013) who found that weak (and invalid) instruments are commonly used in the growth literature. Weak IVs subject the causal effect parameter to greater standard errors and this uncertainty may make a researcher wary of exclusively using the conventional IV approach. Another well-known issue with the conventional IV approach is that it only identifies the local average treatment effect (LATE) (Heckman and Urzúa 2010). For historical growth studies with heterogeneous treatment effects this means that we might be estimating a local effect that differs substantially from the true parameter of interest (Deaton 2010). In other words, the IV methodology only provides us with an effect estimate that applies to a subsection of the population and if there are heterogeneous effects this means that our LATE differs from the ATE (which is typically the more important parameter of interest). Similarly, research that relies on exogenous variation created by historical events to instrument for contemporary endogenous factors (such as institutions) in growth regressions typically overestimates effect sizes because these instruments ignore persistence channels (Casey and Klemp 2016). The approach we follow allows us to avoid making any exclusion restriction assumptions that lack credibility.

5 Empirical Monte Carlo analysis

Table 2 lists all of the estimators described in Sect. 4. In textbook Monte Carlo analysis researchers simulate a variety of data generating processes (DGPs) and examine how effective certain econometric procedures are under different conditions. The drawback of this approach is that the entire DGP must be specified by the researchers. The Empirical Monte Carlo method, as developed in Huber et al. (2013), does not require researchers to simulate the full DGP for a model. Instead, it uses real data, in our case the census data, to simulate realistic “placebo treatments”, while leaving all of the other variables unchanged. This approach is particularly useful in our application because we use estimators that rely on distributional assumptions in order to achieve identification.

Table 2 Estimators employed

The Empirical Monte Carlo approach works by simulating placebo treatments amongst the untreated. In our application we focus on the sample of children who were not in school in 1911. In Sect. 6 we discuss the model specification, but for now assume that we are using this paper’s preferred specification of control variables: \(\mathbf {X}_{\mathbf{i}}\). As outlined in Millimet and Tchernis (2013), the Empirical Monte Carlo approach uses the following algorithm:

  1. Using the full data sample, estimate the probability of being in school \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\) via the heteroskedastic probit model shown in Eq. (21).

  2. With the sub-sample consisting of \(e_{i}=0\), designate placebo treatments as \({\tilde{e}}_i\):

    $$\begin{aligned} {\tilde{e}}_{i} = I\left( \frac{\mathbf {X}_{\mathbf{i}} \varvec{{\hat{\gamma }}}}{\exp (\mathbf {X}_{\mathbf{i}} \varvec{{\hat{\delta }}})}+ 0.3 + \zeta _{i} >0 \right) \end{aligned}$$
    (22)

    where \(\zeta _{i} \sim {\mathcal {N}}(0,1)\) is a random draw from a standard normal distribution. No relationship exists between the fertility outcome variable \(n_{i}\) and the placebo treatment \({\tilde{e}}_{i}\), so we expect the estimate of the ATE to be zero: \(\tau _{ATE}=0\). The error term and the outcome are uncorrelated \(E[n_{i}\zeta _{i}]=0\), so the CIA holds.

  3. Apply the estimators listed in Table 2 and save the estimates.

  4. Repeat steps (1) to (3).

Given the algorithm above, the best estimator will be the one consistently closest to estimating an ATE of zero. We can also use the Empirical Monte Carlo approach to assess the performance of estimators when the CIA does not hold: \(E[n_{i}\zeta _{i}]\ne 0\). To do this we draw the random error term in step 2 (\(\zeta _{i}\)) from: \(\zeta _{i} \sim {\mathcal {N}}(-0.04{\tilde{n}}_{i},1)\), where \({\tilde{n}}_{i} = (n_{i}-\mu _{n})/\sigma _{n}\), and \(\mu _{n}\) and \(\sigma _n\) are the mean and standard deviation of the fertility variable for the \(e_i=0\) sample. This setup creates an endogeneity bias of around \(-\)0.03, a value similar to the OLS estimate of the QQ effect as we will see in Sect. 6.
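The placebo assignment in step 2, with and without the CIA, can be sketched as follows. This is an illustrative sketch only: the function name `placebo_treatments` and its arguments are hypothetical, the fitted index \(\mathbf {X}\varvec{{\hat{\gamma }}}/\exp (\mathbf {X}\varvec{{\hat{\delta }}})\) is taken as an input rather than estimated, and the population standard deviation is assumed for \(\sigma _n\) since the text does not specify which variant is used.

```python
import random
from statistics import mean, pstdev

def placebo_treatments(index, fertility, cia_holds=True,
                       shift=0.3, slope=-0.04, rng=None):
    """Assign placebo treatments to untreated observations, as in Eq. (22).

    index:     fitted heteroskedastic probit index for each untreated child
    fertility: fertility outcome n_i for the same children
    cia_holds: if False, the error mean depends on standardized fertility
    """
    rng = rng or random.Random()
    mu, sigma = mean(fertility), pstdev(fertility)
    placebo = []
    for z, n in zip(index, fertility):
        # Mean of the error draw: 0 under the CIA, slope * n_tilde otherwise.
        centre = 0.0 if cia_holds else slope * (n - mu) / sigma
        zeta = rng.gauss(centre, 1.0)
        placebo.append(1 if z + shift + zeta > 0 else 0)
    return placebo

# Toy inputs (hypothetical) with a fixed seed for reproducibility.
demo_index = [-0.5, 0.0, 0.4, -1.2, 0.8]
demo_fertility = [1.6, 1.9, 1.4, 2.1, 1.8]
print(placebo_treatments(demo_index, demo_fertility, cia_holds=False,
                         rng=random.Random(1911)))
```

Each Monte Carlo repetition would redraw the placebo vector, apply every estimator to the untreated sub-sample with the placebo as the treatment, and store the resulting ATE estimates.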

Table 3 Empirical Monte Carlo results, root mean square errors
Fig. 5 Empirical Monte Carlo boxplots

Table 3 details the root mean square errors (RMSEs) for our Empirical Monte Carlo analysis. We performed 250 repetitions of this placebo simulation procedure under two conditions: when the CIA holds and when it fails. Figure 5 provides a visual representation of these results. When the CIA holds, OLS has the lowest RMSE. This is followed by the IPW estimator. We can see this represented in Fig. 5, as the ATE estimates are centered around the null and show relatively little dispersion. The RMSE for the other estimators is larger. Looking at the top panel in Fig. 5 we can see how the MB and KV estimators trade off bias for variance, as these estimators are centered around \(\tau _{ATE}=0\), but with a greater amount of dispersion compared to OLS and IPW. The bias corrected and Heckman estimators (that rely exclusively on functional form assumptions) do not perform well when the CIA holds. This is either because the bivariate normality assumption is violated or there is insufficient nonlinearity in \({\hat{P}}(\mathbf {X}_{\mathbf{i}})\). Figure 5 depicts how these estimators are both biased and imprecise. Given the nature of these data, this bias is positive which goes against the child QQ model.
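Since the true placebo effect is zero by construction, the RMSE of an estimator can be computed directly from its saved estimates, and it decomposes into squared bias plus variance. The sketch below illustrates this with hypothetical estimates; the function names are assumptions for the example.

```python
from math import sqrt
from statistics import mean, pvariance

def rmse(estimates, truth=0.0):
    """Root mean square error of the estimates around the true effect."""
    return sqrt(mean((est - truth) ** 2 for est in estimates))

def bias_and_variance(estimates, truth=0.0):
    """Return (bias, variance) across Monte Carlo repetitions."""
    return mean(estimates) - truth, pvariance(estimates)

# Hypothetical estimates from two repetitions, for illustration only.
draws = [-0.04, -0.02]
bias, variance = bias_and_variance(draws)
print(rmse(draws), sqrt(bias ** 2 + variance))
```

An estimator that is tightly clustered around the wrong value has a small variance but a large bias, whereas a noisy estimator centered on zero has the opposite profile; the RMSE penalizes both.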

The second row in Table 3 repeats the Empirical Monte Carlo simulation analysis under the condition that the CIA fails. The magnitude of this bias is specified to replicate the OLS estimates we obtain in Sect. 6. In other words, we simulate the case in which the true effect is zero but endogeneity creates a negative bias of \(\tau _{OLS,ATE} \approx -0.03\). The consequence of the CIA violation is evident in both Table 3 and Fig. 5. When the CIA fails, OLS is no longer the estimator with the lowest RMSE; the KV estimator is now the most accurate. As expected, the OLS and IPW estimators perform worse because they suffer from endogeneity bias. The bottom panel in Fig. 5 shows how both the OLS and IPW estimators are “precisely wrong” in the sense that they consistently estimate an ATE less than \(\tau =0\) with little variation. Table 3 demonstrates how the MB estimators are less biased, so have a lower RMSE, but the bias reduction comes at the expense of higher sampling variance, as seen in Fig. 5. This figure also illustrates how the minimum biased estimators are not unbiased, although a comparison between them and the OLS and IPW estimators is instructive as to the direction of this bias.

In this Empirical Monte Carlo setup the KV estimator is superior to the alternatives when the CIA fails. Table 3 shows how this estimator has the lowest RMSE whilst the bottom panel of Fig. 5 illustrates how, on average, the KV method exhibits considerably less bias than the alternate econometric procedures. As before, the BC and Heckman BVN methodologies have the least desirable properties. Much like when the CIA holds, when the CIA fails these methodologies still produce biased results. The bias associated with these simulations is positive. Given that we are using simulated placebo treatments on our actual data, this finding has implications for our empirical analysis. We expect the QQ effect to be positively biased when measured via Heckman BVN or the bias-corrected methods.

6 Results

The Empirical Monte Carlo results in Sect. 5 indicate the accuracy of the estimators outlined in Sect. 4 for the purpose of our empirical research question. Recall, we would like to estimate the parameter \(\tau \) in the following linear model:

$$\begin{aligned} n_{i} =&\mathbf {X}_{\mathbf{i}} \varvec{\beta } + \tau e_{i} + v_{i} \end{aligned}$$
(23)

where \(\tau \) is the ATE of a family letting a child remain in school on net fertility (number of surviving children as reported by the mother, \(n_i\)). A comparison of the estimated ATE for the full selection of estimation methodologies will be presented later. First, we use some standard OLS regressions with different control variables to estimate \(\tau \). These results are presented in Table 4. This table includes the coefficient estimates alongside their standard errors. The standard errors are (conservatively) clustered at the DED level to account for geographical/spatial autocorrelation.

Table 4 Net fertility regressions, OLS

The first column of Table 4 shows the relationship that connects the variable “Scholar” (whether the child is in school) to the net fertility in their family. In this model we include the following control variables: age dummy variables to capture the difference between 15 and 16 year old children compared to their 14 year old counterparts, a dummy variable to account for gender differences, and parental, both mother and father, age variables. This regression was estimated using 133,811 observations. The coefficient of the Scholar variable is \(-0.011\) indicating that the choice for families to let their children stay in school beyond age 13 results in a 1.1 % drop in fertility.

Column (2) of Table 4 repeats the regression from column (1) adding in a county-level fixed effect. The counties are the 32 administrative districts in Ireland as shown in the left-hand panels of Figs. 2 and 3. This is a potentially important control variable because it takes into account a number of geographical factors which may be simultaneously correlated with both school attendance and fertility, as we have previously demonstrated in Fig. 4. Including county fixed effects indicates a larger QQ effect than is shown in the first column. The results for this model imply that school attendance causes a 3.6 % fall in fertility for this cohort.

A comparison of columns (1) and (2) highlights the importance of geographical controls. The results displayed in columns (3) and (4) show what happens when we include fixed effects that relate to smaller geographical units. In column (3) we include DED fixed effects, where the DED is a smaller geographic unit within counties. In column (4) we include street-level fixed effects, so the QQ effect found in this specification relies on within-street variation, a comparison of neighbors. This is appealing because street-level fixed effects account for all the unobserved heterogeneity, socioeconomic and cultural, that varies at the street level. For example, we would expect there to be less variation in wealth at a street level rather than at the wider county level. Interestingly, the inclusion of either DED or street-level fixed effects does not diminish the size of the QQ coefficient.

Column (5) contains the estimated coefficients and standard errors for a model specification that includes a more comprehensive set of control variables. These additional control variables account for nonlinearity in the parent’s age, religious affiliation (with Roman Catholic as the omitted category), dummy variables for Dublin and Belfast cities (in addition to county fixed effects), latitude and longitude co-ordinates, the number of domestic servants present (a sign of wealth), parental literacy indicators, and parental labor force attributes. Once again, the inclusion of additional geographic and socioeconomic controls has little impact on the QQ coefficient. The coefficient in column (5) implies that school attendance creates a 3.3 % drop in fertility. Finally, in column (6) we repeat the model specification from the previous column but include street-level fixed effects alongside fixed effects that account for the full series of HISCO occupational codes associated with parental employment (van Leeuwen et al. 2002). However, the estimated QQ effect remains unchanged.

Table 4 consistently finds a QQ effect in the region of \(-0.035\). However, as discussed in Sect. 2, there are a number of sources of endogeneity which may render these OLS estimates invalid. In Sect. 4 we motivated the use of alternative models that estimate ATEs in the presence of endogeneity. Table 5 displays these results. Each column of Table 5 contains the average treatment effect estimate and empirical confidence intervals (ECI) obtained via cluster bootstrap for the 9 estimators discussed in Sect. 4 and labeled in Table 2. Note that we include all control variables listed in column (5) of Table 4 in the model design matrix \(\mathbf {X}_{\mathbf{i}}\). We use this comprehensive model specification as Millimet and Tchernis (2013) recommend that the model is over- rather than under-specified. However, our results are robust to the exclusion of multiple variables in \(\mathbf {X}_{\mathbf{i}}\).Footnote 6
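A cluster bootstrap of the kind referred to above can be sketched as follows. The details here are assumptions for illustration: whole clusters are resampled with replacement, the interval is taken from simple percentiles of the bootstrap draws, and the clustering unit, number of repetitions, function name, and toy data are all hypothetical.

```python
import random
from statistics import mean

def cluster_bootstrap_ci(clusters, estimator, reps=500, level=0.95, rng=None):
    """Percentile interval from resampling whole clusters with replacement.

    clusters:  dict mapping a cluster id to a list of observations
    estimator: function taking a list of observations, returning a number
    """
    rng = rng or random.Random()
    ids = list(clusters)
    draws = []
    for _ in range(reps):
        sample = []
        # Draw as many clusters as in the original data, with replacement.
        for cid in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cid])
        draws.append(estimator(sample))
    draws.sort()
    lower = draws[int((1 - level) / 2 * (reps - 1))]
    upper = draws[int((1 + level) / 2 * (reps - 1))]
    return lower, upper

# Toy clusters of outcome values (hypothetical); the estimator is a mean.
toy = {"ded_1": [1.5, 1.7], "ded_2": [1.9], "ded_3": [1.6, 1.8, 2.0]}
print(cluster_bootstrap_ci(toy, mean, rng=random.Random(42)))
```

Resampling clusters rather than individuals keeps any within-cluster dependence intact in each bootstrap sample, in the same spirit as clustering the OLS standard errors.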

Table 5 Effect of school attendance on net fertility, average treatment effects

As we would expect, the OLS estimate of the ATE is the same as is shown in Table 4: \(-0.033\). In our Empirical Monte Carlo analysis the IPW method yielded estimates that were almost identical to the OLS values. This pattern is repeated in Table 5, although IPW is estimated with slightly greater precision as it has narrower ECIs. The Empirical Monte Carlo analysis showed that the MB estimators tend towards the true ATE as \(\theta \) is set to smaller values. Therefore, we can interpret the MB-0.25 and MB-0.05 estimates, \(-0.053\) and \(-0.077\) respectively, as indicating the presence of positive endogeneity bias in both OLS and IPW. Hence, the MB estimates suggest that the QQ effect is larger than we have previously estimated. Whilst MB estimators reduce bias at the expense of a higher ATE parameter variance, the 95 % empirical confidence intervals shown in columns (3) and (4) are sufficiently far from the null to show that their ATE achieves statistical significance at conventional p-value levels.

The KV estimator performed best when the CIA fails in our Empirical Monte Carlo analysis. Our previous results indicate that the CIA does not hold in this application, so we expect the KV estimator to be the most informative. The QQ effect measured by the KV model here is large: \(-0.271\), which implies that the choice for families to let their children stay in school beyond age 13 causes a substantial 27.1 % decrease in fertility.Footnote 7 This finding is consistent with the MB results which suggested that the OLS and IPW estimates are biased upwards. This bias could be due to measurement error or omitted variable bias since we would expect simultaneity to negatively bias QQ estimates.
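Because the outcome is logged fertility, the text reads coefficients as approximate percentage changes. For small coefficients this approximation is close, but for larger ones the exact conversion differs somewhat. The following sketch, assuming natural logarithms as implied by the \(\exp (\cdot )\) back-transformation in Sect. 3, shows the exact conversion; the function name is hypothetical.

```python
from math import exp

def pct_change_from_log_points(coef):
    """Exact percentage change implied by a coefficient on a dummy
    regressor when the outcome is in natural logs."""
    return 100.0 * (exp(coef) - 1.0)

# Coefficients quoted in the text: column (1) OLS and the KV estimate.
for coef in (-0.011, -0.271):
    print(coef, round(pct_change_from_log_points(coef), 1))
```

Under this conversion the OLS coefficient of \(-0.011\) still corresponds to roughly a 1.1 % fall, while the KV coefficient of \(-0.271\) corresponds to a fall of a little under 24 %, which remains a large effect.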

The remaining columns of Table 5 display the results of the bias corrected (IPW-BC, MB-0.25-BC, and MB-0.05-BC) and Heckman BVN estimators. These estimators rely heavily on the joint normality assumption to achieve identification. Our Empirical Monte Carlo analysis showed how we would expect a violation of this assumption to yield positively biased ATEs. Considering that the Empirical Monte Carlo analysis uses our actual data, it has important implications for our application. The BC and BVN estimates, ranging from \(-0.078\) to \(-0.102\), indicate a larger QQ effect. The Empirical Monte Carlo analysis showed that these estimators produced a bias of around 0.1. Applying this bias to the BC and BVN estimates here suggests a QQ effect in the region of \(-0.2\), a value similar in magnitude to that produced by the KV estimator.

Table 6 Effect of school attendance on gross fertility, average treatment effects

The body of evidence presented in Table 5 points towards a large QQ effect. Our analysis proceeds by examining the robustness of this finding. One concern with the results shown in Table 5 is that they use “net fertility” as the outcome variable. This net fertility variable corresponds to the number of surviving infants in the household, and one might be concerned that differential infant mortality has an offsetting effect. Table 6 addresses this concern by using “gross fertility” as the outcome. This variable represents the number of children to whom the mother has given birth, regardless of whether they survived or not. We also include in \(\mathbf {X}_{\mathbf{i}}\) a variable that counts the number of children who died, although the results are almost identical if this additional control is omitted. The results in Table 6 are very similar to those in Table 5.

These census data only capture a cross section of the country’s population at one point in time. Their ability to inform us of life-course demography is somewhat limited. For example, we do not know what the completed family size is for each family. Similarly, there may be an element of survivor bias in the sample of mothers aged over 40. Additionally, the effect of schooling on fertility might be conflated with birth order issues, a concern expressed in Black et al. (2005). To counter these potential issues, we repeat the analysis shown in Table 5 for a trimmed sample consisting of children whose mothers are between 35 and 40 years of age and who are also the eldest in the family. This sample will largely consist of first-born children of the age cohort of mothers less affected by survivor bias. The drawback of this approach is that we are now working with a much smaller sample size of 14,177. The results of this additional analysis are reported in Table 7 below.

Table 7 Effect of school attendance on net fertility, average treatment effects, trimmed sample

The results in Table 7 once again indicate the existence of a substantial QQ effect. The estimated QQ effect here is, for the most part, slightly larger than was estimated in the previous tables. The OLS and IPW ATE estimates are over twice as large as their counterparts in the previous tables. This indicates that there may have been some bias due to sample composition issues and/or a conflation with potential birth order effects. The magnitude of the KV ATE estimate does not change to reflect this, a result that signals this estimator’s robustness. Figure 6 provides a graphical illustration of the model results shown in Tables 5, 6, and 7.

Fig. 6 ATEs from Tables 5, 6, and 7. The points represent the ATE estimates and the error bars represent the 95 % empirical confidence intervals obtained using 250 cluster-bootstrap repetitions
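The empirical confidence intervals referred to in the caption can be sketched as follows. This is a minimal illustration, assuming a percentile interval computed from whole-cluster resampling; the clustering unit, the simulated data, and the function names `cluster_bootstrap_ci` and `diff_in_means` are hypothetical stand-ins, and a simple difference in means takes the place of the ATE estimators used in the paper.

```python
import numpy as np

def cluster_bootstrap_ci(y, d, cluster, estimator, reps=250, level=0.95, seed=0):
    """Percentile confidence interval from resampling whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    # Row positions belonging to each cluster, computed once.
    rows = {c: np.flatnonzero(cluster == c) for c in ids}
    draws = np.empty(reps)
    for r in range(reps):
        sampled = rng.choice(ids, size=ids.size, replace=True)
        idx = np.concatenate([rows[c] for c in sampled])
        draws[r] = estimator(y[idx], d[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def diff_in_means(y, d):
    """Placeholder estimator: treated mean minus untreated mean."""
    return y[d == 1].mean() - y[d == 0].mean()

# Simulated data (hypothetical): 400 clusters of 5 observations with a shared shock.
rng = np.random.default_rng(1)
n_clusters, size = 400, 5
cluster = np.repeat(np.arange(n_clusters), size)
shock = np.repeat(rng.normal(scale=0.3, size=n_clusters), size)
d = rng.integers(0, 2, size=cluster.size)
y = -0.05 * d + shock + rng.normal(scale=0.5, size=cluster.size)

point = diff_in_means(y, d)
lo, hi = cluster_bootstrap_ci(y, d, cluster, diff_in_means)
print(f"estimate={point:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Resampling clusters rather than individual rows keeps correlated observations together, so the interval reflects within-cluster dependence.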

Table 8 Effect of school attendance on net fertility, average treatment effects by SES grouping

The body of evidence presented above unambiguously supports the presence of a QQ effect in 1911 Ireland. However, this evidence does not allow us to explore how this QQ effect might differ across the socioeconomic spectrum. In the following we examine the potential for heterogeneity in the QQ effect by re-running our analysis on three sub-samples stratified by socioeconomic status (SES) categories. We use the father’s occupation to split these data. Fernihough et al. (2015) matched the 1911 Irish census occupations to HISCO codes and then to the HISCAM index of occupational social association (Lambert et al. 2013). The HISCAM index is a continuous variable that ranges from 28 (the lowest socioeconomic position) to 99. We first define the Low SES group as children whose father has an occupation scoring below 58 on the HISCAM scale. This subsample consists of 65,845 children with fathers typically employed as laborers, agricultural laborers, animal-drawn vehicle drivers, and in other occupations typically involving unskilled manual labor. Early twentieth century Ireland was primarily an agricultural society and this is reflected in the composition of the fathers’ occupations. The fathers of 57,035 observations were general farmers (although these data do not contain information on the size and quality of the family’s farm). This broad category is assigned a HISCAM score of 58 and forms our “Mid SES” group. Where the father’s HISCAM score is above 58 we designate the observations as “High SES.” The High SES group is composed of 9,784 individuals living in households where the father worked in a role such as teaching or clerical administration or was the proprietor of his own business.
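The three-way split described above amounts to a simple threshold rule on the father's HISCAM score. A minimal sketch, assuming the score is available as a number for each child; the function name `ses_group` and the example scores are hypothetical and are not taken from the census data:

```python
from collections import Counter

FARMER_SCORE = 58  # HISCAM score assigned to general farmers, as stated in the text

def ses_group(hiscam):
    """Classify a father's HISCAM score into the three SES groups."""
    if hiscam is None:
        return None  # occupation could not be scored
    if hiscam < FARMER_SCORE:
        return "Low SES"
    if hiscam == FARMER_SCORE:
        return "Mid SES"
    return "High SES"

# Hypothetical scores for a handful of children.
example_scores = [40, 58, 75, 58, 52, None]
groups = [ses_group(s) for s in example_scores]
print(Counter(groups))
```

Because the Mid SES group is defined by equality with a single score, the rule assumes that general farmers are coded with exactly that value.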

The results from our analysis of the aforementioned sub-samples are displayed in Table 8. Overall, these results are somewhat mixed. There appears to be a substantial QQ effect amongst the Low SES grouping, as the majority of these models estimate an effect ranging approximately from \(-4\) to \(-6\) %. An exception is the KV estimator, which estimates a QQ effect of \(-1.1~\%\), albeit with wide ECIs. However, it should be noted that the identifying power generated by heteroskedasticity in the first stage is weaker here in comparison to other models (see Footnote 8). The QQ effect appears to be muted amongst the sons and daughters of general farmers, the Mid SES grouping, where the results are inconclusive. The conventional OLS and IPW estimators suggest a small positive QQ effect, whereas the MB estimators indicate that the OLS and IPW estimates are biased upwards, and the bias-corrected estimators all find a negative QQ effect. The results for the High SES group are unambiguous, as a strong negative relationship is found in this subsample by all of the estimators.

Table 8 does not indicate the presence of a socioeconomic gradient in QQ effects. Instead, it provides evidence suggesting a weak QQ effect amongst families who operated as general farmers and a much stronger effect outside this large cohort. A speculative interpretation of Table 8 is that land-owning farming families faced different price parameters with respect to fertility and schooling because the children of these farmers could be employed in seasonal work that did not interfere with school attendance, whilst families outside this social class, especially those in urban areas, perhaps found such opportunities less feasible. The estimators in this paper trade off variance and bias, so once we start reducing the sample size, as in the Table 8 subsamples, the causal effect parameter estimates exhibit greater uncertainty. A consequence of this uncertainty is the wider confidence intervals displayed in Table 8, and these results must be treated with a degree of caution.

7 Conclusions

The emergence of unified growth theory has made it imperative to understand the historical relationship linking fertility and human capital. However, there is an absence of research examining whether the child QQ trade-off existed during the demographic transition.

To evaluate the child QQ trade-off in a historical society we used the complete census records from Ireland in 1911. If QQ theory holds, we expect those children who remain in school past the compulsory leaving age to be from families with lower fertility. This study highlights the importance of using disaggregated individual-level data. There is potential for researchers to commit an ecological fallacy when using macro/aggregated data to infer individual-level behavioral relationships. Our analysis finds a positive relationship connecting fertility and education at an aggregated level; however, this relationship is reversed (as implied by the QQ model) once we focus on disaggregated units.
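The sign reversal between aggregated and disaggregated data can be reproduced in a small simulation. The sketch below uses entirely hypothetical data: a regional factor raises both schooling and fertility, whilst within each region families with more schooling have lower fertility. The variable names and parameter values are illustrative assumptions and are not estimates from the census.

```python
import numpy as np

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

rng = np.random.default_rng(42)
n_regions, n_per = 30, 500
region = np.repeat(np.arange(n_regions), n_per)

# Regional factor that pushes schooling and fertility in the same direction.
region_effect = rng.normal(size=n_regions)
# Family-level variation, which pushes them in opposite directions.
u = rng.normal(size=region.size)

schooling = region_effect[region] + u
fertility = 2.0 * region_effect[region] - 0.5 * u + rng.normal(scale=0.5, size=region.size)

# Aggregated analysis: regress regional mean fertility on regional mean schooling.
s_bar = np.array([schooling[region == r].mean() for r in range(n_regions)])
f_bar = np.array([fertility[region == r].mean() for r in range(n_regions)])
agg_slope = slope(s_bar, f_bar)

# Disaggregated analysis: individual data with regional means removed.
within_slope = slope(schooling - s_bar[region], fertility - f_bar[region])

print(f"aggregate slope={agg_slope:.2f}, within-region slope={within_slope:.2f}")
```

Inferring the family-level relationship from the aggregate slope alone would give the wrong sign in this simulated setting.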

At the individual level we find a negative relationship between school attendance and fertility. The strength of this relationship increases as control variables are included in the model specification, suggesting the presence of a positive endogeneity bias. To formally account for this bias we use a series of estimators designed to estimate causal effects without exclusion restrictions. An Empirical Monte Carlo exercise that simulates placebo treatments reveals that one estimator, a parametric version of the Klein and Vella (2009) IV procedure, performs very well in our application. In place of an exclusion restriction, the only additional assumption this IV methodology makes is the presence of heteroskedasticity in the first-stage equation, an assumption that is comfortably satisfied here.

Our preferred model specification, that of Klein and Vella (2009), estimates a child QQ effect of \(-0.266\). In our context, this means that families chose to dedicate resources to their child’s education by reducing their fertility by about 27 %. This substantial effect supports the use of the child QQ model to aid our understanding of the simultaneous decline in fertility, increase in human capital, and subsequent economic growth of Western economies in the late nineteenth and early twentieth centuries.