1 Introduction

The ageing of the population, the consequences of the Great Recession, and more recently the COVID-19 pandemic put the sustainability of public finances at risk. Governments are looking for cost-effective ways to increase labour supply. To this end, policymakers need good empirical information on how financial incentives affect the labour supply of different groups and the associated changes in public expenditures and revenues. Policymakers receive information from two different sources when considering the effectiveness of tax-benefit reforms. On the one hand, quasi-experimental studies report on the effects of specific reforms (‘natural experiments’) implemented in the past. Advantages of this approach are that it relies on minimal assumptions and has a transparent source of identification (Angrist and Pischke 2009; Heckman 2010). However, disadvantages include the absence of an underlying economic model, the limited external validity of the treatment effects, and the limited scope for doing counterfactual policy analysis.Footnote 1 On the other hand, policymakers get simulation results from analyses with structural models. The strong points of this approach are the use of an explicit economic model, the possibility to predict treatment effects in external environments, and the wide scope for conducting counterfactual policy analysis (Keane 2010; Heckman 2010). However, structural models rely on more assumptions and the source of identification is not always transparent. Given these strengths and weaknesses, the two approaches complement each other.

In this study, we combine structural models with quasi-experimental studies to evaluate the effectiveness of tax-benefit reforms in the Netherlands. Specifically, we estimate structural models for labour supply of a large number of subgroups, compare the simulated treatment effects of policy reforms in the structural models with the results from three quasi-experimental studies, and then use the structural model to study counterfactual policy reforms that feature prominently in the political debate in the Netherlands and abroad.

We estimate a structural discrete-choice model for labour supply, building on a large body of literature, e.g. Aaberge et al. (1995, 1999), Van Soest (1995), Keane and Moffitt (1998), Brewer et al. (2006), and Bargain et al. (2014). Discrete-choice models have the advantage of being able to take into account all the complexities in the budget set that result from the tax-benefit system, such as kinks and non-convexities. Furthermore, the estimation of discrete-choice models does not require ex ante imposition of quasi-concavity of preferences.Footnote 2 We use an exceptionally large and rich administrative household data set that was constructed specifically for this project by Statistics Netherlands: the Labour Market Panel (Arbeidsmarktpanel). Our sample consists of more than 840,000 observations. The size of this data set allows us to precisely estimate preferences over income, leisure, and formal childcare (for parents with young children) for a large number of subgroups.

Next, we use the estimated structural model to simulate a number of key reforms implemented in the past and compare the simulated treatment effects with quasi-experimental studies on the same reforms. In particular, we compare the simulated treatment effects of the 2005–2009 reform of childcare subsidies and in-work benefits for households with young children with the estimated treatment effects presented in Bettendorf et al. (2015). Furthermore, we compare the simulated treatment effects of the 2002 reform of the in-work benefit for single parents with the estimated treatment effects presented in Bettendorf et al. (2014). Finally, we compare the simulated intensive margin (hours worked per employed person) elasticities of the structural model with the estimated intensive margin elasticities presented in Bosch and van der Klaauw (2012) and Bosch and Jongen (2013), who use the 2001 tax reform that substantially reduced marginal tax rates. Part of the variation used in the quasi-experimental studies is also used in the estimation of the structural model, but part of the variation also comes from outside the data period used to estimate the structural model.Footnote 3

Our main findings are the following. First, we uncover large differences in labour supply elasticities between demographic groups and decision margins. Indeed, there are large differences in labour supply elasticities by age of the youngest child. We also find large differences in the relative importance of the extensive margin (participation) and the intensive margin (hours per employed person), with the intensive margin response still substantial for women with young children but small for most other groups. Second, we find that the structural model gives a good prediction of the treatment effects estimated in quasi-experimental studies on past reforms. Therefore, we feel confident in simulating counterfactual tax-benefit reforms using the structural model. Third, we find that reducing marginal tax rates is not an effective way to promote labour supply. In-work benefits targeted at low-wage earners appear to be more effective, and policies targeted at working mothers with young children generate the largest labour supply response and also cause a noticeable drop in gender inequality in wage income. However, policies targeted at working mothers with young children generate little additional revenue for the government because that group already receives substantial subsidies when working. Using the structural model we also simulate some large counterfactual tax reforms that feature prominently in the political debate. We find that proposals for a move towards a flat tax system, a basic income, or joint taxation are not effective and come with a steep rise in gender inequality. Indeed, an efficient tax system accounts for the large heterogeneity in behavioural responses we uncover and hence cannot be too simple.

This study contributes to the existing literature in a number of ways. The size of our data set allows us to estimate preferences separately for subgroups that earlier studies did not consider (previous studies typically only focussed on couples, e.g., Van Soest and Das 2001; Van Soest et al. 2002) or had to pool due to the limited number of observations (previous studies typically pooled households with and without children, e.g., Bargain et al. 2014). We consider a large number of household types, do not pool the data over the various household types, and uncover much more heterogeneity in the behavioural responses than previous studies. We also exploit detailed administrative data on the use and price of formal childcare, information that is not readily available in most other labour supply studies (e.g. Blundell et al. 2000; Van Soest and Das 2001; Blundell and Shephard 2012; Bargain et al. 2014). Furthermore, previous studies had to rely on one cross-section of data or on a few repeated cross-sections from a period when there was hardly any change in the tax system (e.g. Van Soest and Das 2001; Van Soest et al. 2002; Bargain et al. 2014). Hence, identification in those studies comes only from cross-sectional differences in financial incentives due to non-linearities in the tax-benefit system. In contrast, we use several years of data, which includes a major reform of childcare subsidies and in-work benefits for working parents. These reforms strengthen our identification by generating large exogenous variation in budget constraints. Also, we compare the predictions of the structural model with results from a number of quasi-experimental studies on key past reforms and thus contribute to the growing body of literature that evaluates the performance of structural models by comparing simulated policy responses with the results from (quasi-)experimental studies (e.g. Attanasio et al. 2011; Bargain and Doorley 2017; Brewer et al. 2006; Cai et al. 2008; Todd and Wolpin 2006; Geyer et al. 2015; Hansen and Liu 2015; Pronzato 2015; Thoresen and Vatto 2015). In addition, because our structural model is fully integrated with a detailed tax-benefit calculator, we are able to study the effectiveness of various reform proposals, taking into account the budgetary effects of the behavioural responses to the reform. Indeed, the integrated model allows us to go beyond back-of-the-envelope calculations on the effectiveness of different reform proposals (as in, e.g. Blau 2003; Lokshin 2004).

The outline of the paper is as follows. Section 2 describes the Dutch labour market and policy environment. Section 3 outlines the structural model and empirical strategy. Section 4 describes the data set used to estimate the structural model and Section 5 presents the estimation results and corresponding labour supply elasticities. Section 6 then compares the simulated treatment effects of the structural model on a number of key past reforms with the estimated treatment effects from quasi-experimental studies. Next, Section 7 studies the effectiveness of a number of counterfactual tax-benefit reforms. Section 8 discusses our findings and concludes. Supplementary material is given in an Appendix.

2 The Dutch labour market and tax-benefit system

Over the past decades, the Netherlands, like many other developed countries, witnessed a substantial shift in the household composition of the population, see Fig. 1. In particular, the share of couples with children has declined, whereas the share of couples without children has increased. Furthermore, the share of singles, both with and without children, has increased. Hence, empirical knowledge of the behavioural responses of singles and single parents is becoming increasingly relevant.Footnote 4

Fig. 1 Shares of individuals per household type in the Netherlands. Source: Statistics Netherlands (statline.cbs.nl). Individuals 15–64 years of age

Another important development has been the rise in the employment rate of women. Figure 2 shows the employment rate for men and women in the Netherlands over time, along with the development in the employment rate for a number of other countries. The employment rate for men in the Netherlands has increased somewhat over the 1975–2014 period, but with a noticeable dip in the 1980s due to the recession and the increased use of early retirement and disability benefits. Since the 1980s, early retirement benefits have been cut down substantially and have become more actuarially fair, and access to disability schemes has become more difficult. As a result, the employment rate of men has rebounded to pre-1980 levels. In 2014, the employment rate of men in the Netherlands was one of the highest of all OECD countries. The rise in the employment rate of women has been nothing short of spectacular, from just 30% in 1975 to 70% at the start of the Great Recession. Indeed, by 2014, the Netherlands had one of the highest employment rates for women in the OECD. An age-period-cohort analysis by Euwals et al. (2011) indicates that cohort effects are an important driver behind this increase, in particular for women with young children. The steep rise in the employment rate of women has important implications for the behavioural responses to changes in financial incentives. We estimate much lower labour supply elasticities for women in couples than previous studies for the Netherlands (e.g. Van Soest 1995; Van Soest and Das 2001) that used data of the 20th century.Footnote 5

Fig. 2 Employment rates in selected countries. Source: OECD (2016). Individuals 15–64 years of age

Whereas the rise in the employment rate of women in the Netherlands has been spectacular, their hours worked remained remarkably stable over the past decades, see Fig. 3. If anything, hours worked per week by employed women decreased slightly, and a noticeable gap of 5 to 10 h per week remains with their peers in other European countries. The changes in hours worked per week by employed men were also limited over the past decades, with a slightly downward trend. Dutch men, too, work on average a few hours per week less than their peers in other European countries. Below, we study how changes in tax-benefit policies affect both the participation and the hours-worked-per-week decisions.

Fig. 3 Average usual weekly hours worked on the main job in selected countries. Source: OECD (2016). Individuals 15–64 years of age. a Men. b Women

Considering the Dutch tax-benefit system, the Netherlands has an individual tax system, with specific tax credits and subsidies targeted at certain groups. However, income-dependent income support is based on household rather than individual income. The financial incentives implicit in the tax-benefit system are illustrated in the appendix to De Boer and Jongen (2020), where we present so-called effective marginal tax rates (EMTRs) and participation tax rates (PTRs) for different subgroups.Footnote 6 The analysis shows that EMTRs and PTRs are particularly high for singles and single parents, and particularly low for secondary earners. Below, we consider how changes in the EMTRs and PTRs following tax-benefit reforms affect the participation and hours-worked-per-week decisions of different groups on the labour market.

3 Structural model

We develop a structural model, where households are assumed to maximise a unitary household utility function. The most elaborate specification, for couples with young children, is outlined below. In this household, both partners choose their labour supply as well as their hours of formal childcare.Footnote 7,Footnote 8 The utility functions for the other household types (defined below) are a special case of this utility function.

The systematic part of household utility, Us, depends on disposable income y, hours worked by the male hm, hours worked by the female hf, and hours of formal childcare cc. We ignore saving and borrowing, and hence consumption equals disposable income. For the functional form of Us, we use the translog specification:

$$ \begin{array}{@{}rcl@{}} U^{s}(\nu)&=&\nu^{\prime}\textbf{A}\nu + \textbf{b}^{\prime}\nu + \textbf{d}^{\prime}\boldsymbol{1}[\mu>0], \\ \nu&=&(\log(y), \log(1-h_{m}/T), \log(1-h_{f}/T), \log(cc)), \\ \mu&=&(h_{m}, h_{f}, cc), \end{array} $$
(1)

with A being a symmetric matrix of quadratic coefficients and b being a vector of linear coefficients corresponding to the vector of the variables contained in ν. Note that we allow childcare to enter directly in the utility function (next to entering via disposable income). The hours worked variables hm and hf in the vector ν have been transformed into indicators of leisure utilisation, representing the fraction of weekly time endowment T which is spent on activities unrelated to work. The vector d captures fixed costs of work, for the male and the female separately, and fixed costs of using formal childcare, for the household as a whole.
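As an illustration of how Eq. (1) can be evaluated for a single alternative, the following is a minimal sketch and not the estimation code used in this study. The function name `translog_utility`, the weekly time endowment of 168 h, and the use of log(1 + cc) to keep the childcare term defined when no formal childcare is used are all assumptions made for this sketch only.

```python
import math

def translog_utility(y, h_m, h_f, cc, A, b, d, T=168.0):
    """Systematic utility U^s = nu'A nu + b'nu + d'1[mu > 0], as in Eq. (1).

    A is a 4x4 symmetric matrix (list of lists), b has 4 entries, d has 3.
    T = 168 and log(1 + cc) are assumptions of this sketch, not of the paper.
    """
    nu = [
        math.log(y),              # log disposable income
        math.log(1 - h_m / T),    # log leisure share of the male
        math.log(1 - h_f / T),    # log leisure share of the female
        math.log(1 + cc),         # childcare term, shifted so cc = 0 is allowed
    ]
    mu = [h_m, h_f, cc]
    quadratic = sum(nu[i] * A[i][j] * nu[j] for i in range(4) for j in range(4))
    linear = sum(b_k * nu_k for b_k, nu_k in zip(b, nu))
    # Fixed costs apply only when the corresponding choice is strictly positive.
    fixed = sum(d_k for d_k, m_k in zip(d, mu) if m_k > 0)
    return quadratic + linear + fixed
```

With A set to zero, b placing weight only on income, and a negative first element of d, the function returns log income minus the fixed cost of work for a working male, which makes the role of the indicator term easy to verify by hand.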

For some household types, the full translog specification resulted in a significant share (> 5%) of households with negative marginal utility of income in the observed choices. This is not consistent with utility maximisation and drives down the labour supply elasticities to implausible values. For these household types, we dropped the interaction terms between income and leisure, which resulted in a low share of households with negative marginal utility of income (< 5%). For some households, we also obtained an ‘inverted’ pattern for the marginal utility of income, with a negative (log) linear term and a positive (log) quadratic term. This results in implausible (positive) income effects, and for these households, we dropped the quadratic term in income. Finally, for certain other household types, the translog specification was still not flexible enough. In particular, in some cases, we do not capture the distribution of hours worked at the top very well, and we introduce a third-order term for (log) leisure, which then improves the fit at the top.Footnote 9

We allow for preference variation through observed individual and household characteristics x2, x3, and x4 in parameters b2, b3, and b4:

$$ \begin{array}{@{}rcl@{}} &&\mathbf{b}=(b_{1},b_{2},b_{3},b_{4}), \\ &&b_{1}=\beta_{1},\quad b_{2}=\mathbf{x}_{2}^{\prime}\beta_{2}+\psi_{2},\quad b_{3}=\mathbf{x}_{3}^{\prime}\beta_{3}+\psi_{3},\quad b_{4}=\mathbf{x}_{4}^{\prime}\beta_{4}+\psi_{4}, \end{array} $$
(2)

which are the linear utility terms in leisure of the male, leisure of the female, and hours of formal childcare, respectively. The same variation is also allowed for the fixed costs parameters d. We further allow for unobserved preference heterogeneity in the preference parameters for leisure (ψ2 and ψ3, for the male and female, respectively) and formal childcare (ψ4).Footnote 10 We do not allow for observed and unobserved preference heterogeneity in the coefficient b1 of income, because it is hard to identify this preference heterogeneity separate from the preference heterogeneity in leisure and childcare.Footnote 11

Disposable household income is given by:

$$ y=w_{m} h_{m} + w_{f} h_{f} - T(w_{m},h_{m},w_{f},h_{f};q)-TC(p_{cc},cc;q)+S(p_{cc},cc,y_{t};q), $$
(3)

where wm and wf denote the gross hourly wage for the male and the female,Footnote 12 T(.) denotes taxes and employees’ premiums, q denotes individual and household characteristics, TC(.) is the total cost of formal childcare, with pcc denoting its price per hour, and S(.) is the childcare subsidy, which depends on the hourly price of formal childcare, the hours of formal childcare, taxable income yt, and household characteristics (like the ages of the childrenFootnote 13).

For workers, we observe gross hourly wages which are used to compute the work-related part of income for each alternative in the choice set.Footnote 14 For non-workers, we simulate wages using estimates from a model that accounts for selection (Heckman 1979)Footnote 15, and taking multiple draws from the estimated wage error distribution, see the appendix to De Boer and Jongen (2020). Similarly, for households that use formal childcare, we use the observed hourly prices of formal childcare, and for non-users, we simulate hourly prices using estimates from a model that accounts for selection and taking multiple draws from the estimated gross hourly price error distribution, see the appendix to De Boer and Jongen (2020).

For our empirical specification, we use a discrete-choice model. Households choose their preferred combination of hours of work and hours of formal childcare from a finite set of alternatives \(j \in \left \{ {1,...,J} \right \}\). Next to the systematic part Us(νj), the utility function contains alternative-specific stochastic terms εj:

$$ U(\nu_{j}) = U^{s}(\nu_{j}) + \varepsilon_{j}. $$
(4)

These stochastic terms are assumed to be independent and identically distributed across alternatives, and to be drawn from a type 1 extreme value distribution. This leads to a multinomial logit specification (McFadden 1978).
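For reference, under this distributional assumption the probability that a household chooses alternative k, conditional on the systematic utilities, takes the standard closed form of the multinomial logit model. The shorthand Pr(k) is notation introduced here for illustration only:

$$ \Pr(k) = \frac{\exp(U^{s}(\nu_{k}))}{\sum\limits_{j=1}^{J} \exp(U^{s}(\nu_{j}))}. $$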

Random preference heterogeneity, along with the draws from the estimated wage and price equations for non-workers and non-users of formal childcare, respectively, complicates the estimation of the likelihood. We use R = 50 (independent) draws from the wage distribution for non-working men and women, the price distribution for non-users of formal childcare, and the random terms for unobserved heterogeneity. We use simulated maximum likelihood, where the likelihood is given by:

$$ L=\prod\limits_{i=1}^{N} {\frac{1}{R}\sum\limits_{r=1}^{R} {\left( \frac{\exp (U_{k}^{i}(w_{i,m,r},w_{i,f,r},p_{i,cc,r},\psi_{i,2,r},\psi_{i,3,r},\psi_{i,4,r}))}{\sum\limits_{j=1}^{J} {\exp (U_{j}^{i}(w_{i,m,r},w_{i,f,r},p_{i,cc,r},\psi_{i,2,r},\psi_{i,3,r},\psi_{i,4,r}))} } \right)} }^{D_{ki}}, $$
(5)

with Dki being an indicator function taking the value 1 for the observed choice for household i, and zero otherwise.Footnote 16
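The structure of the simulated likelihood can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the estimation routine used here: the function name `simulated_log_likelihood` and the nested-list layout of the inputs are hypothetical, and the utilities are assumed to have been computed beforehand for every household, draw, and alternative. The sketch returns the logarithm of the likelihood, which is the quantity one would typically maximise in practice.

```python
import math

def simulated_log_likelihood(utilities, chosen):
    """Log of the simulated likelihood for a logit model with R draws.

    utilities[i][r][j]: systematic utility of household i, draw r, alternative j
                        (each draw reflects one set of simulated wages, prices,
                        and unobserved preference terms).
    chosen[i]:          index k of the alternative household i is observed in.
    """
    loglik = 0.0
    for u_i, k in zip(utilities, chosen):
        prob_sum = 0.0
        for u_r in u_i:
            m = max(u_r)  # subtract the maximum for numerical stability
            denom = sum(math.exp(u - m) for u in u_r)
            prob_sum += math.exp(u_r[k] - m) / denom
        # Average the logit probability of the observed choice over the draws.
        loglik += math.log(prob_sum / len(u_i))
    return loglik
```

Averaging the choice probability over draws before taking the logarithm mirrors the ordering of the sum and product in the likelihood expression; taking logs inside the average would estimate a different object.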

4 Data structural model

We use data from the Labour Market Panel (LMP) of Statistics Netherlands (2012). This data set was constructed specifically for the empirical analysis presented here. The LMP is a large administrative household panel data set over the period 1999–2009, containing a rich set of individual and household characteristics, including gender, month and year of birth, the level of education and ethnicity of all adult household members, the ages of the children, and place of residence. In addition, the LMP also contains administrative data on gross income from different sources (e.g. wages, profits, benefits) and on hours worked. Finally, the LMP contains administrative data on the use and gross hourly price of formal childcare for each child in formal childcare.Footnote 17 Because data on childcare is only available from 2006 onwards, we restrict the sample to the 2006–2009 period.

In the empirical analysis, we model the labour supply decision for employed people, those on welfare benefits, and those without personal income.Footnote 18 We make a number of additional selections. We exclude people under 18 years of age (most of them are in education), and those over 63 years of age (we do not model the retirement decision). Furthermore, we do not model the labour supply decision of students, people on retirement or disability benefits, and the self-employed. We do not model their labour supply decision because reliable information is not available on their hours worked or because we are unable to determine their budget constraint. Furthermore, same-sex households are also excluded, as are households for which characteristics about individual members or the household are missing.Footnote 19 In the end, we use 840,348 observations in the estimations.

We estimate structural discrete-choice models for the following 15 household types: childless singles; single parents with a youngest child aged 0–3, 4–11, 12–17, or 18 years of age or older, respectively; adult children living with their parent(s);Footnote 20 couples without children where we model the labour supply decision of both partners; couples without children where we only model the labour supply decision of the man (because the woman is a student, on disability or retirement benefits or self-employed, see above); couples without children where we only model the labour supply of the woman (because the man is a student, on disability or retirement benefits or self-employed); couples where we model the labour supply of both partners and that have a youngest child aged 0–3, 4–11, 12–17, or 18 years of age or older, respectively; couples with children where we only model the labour supply of the man (because the woman is a student, on disability or retirement benefits or self-employed); and couples with children where we only model the labour supply of the woman (because the man is a student, on disability or retirement benefits or self-employed).

We discretise the data for the discrete-choice model in the following way. Adults can choose from 6 labour supply options: working 0, 1, 2, 3, 4, or 5 days per week, each day equalling 8 h.Footnote 21 For childcare, we allow for 0, 1, 2, and 3 daysFootnote 22, with data showing a typical childcare day to equal 10 h,Footnote 23 and a typical out-of-school-care day to equal 5 h.Footnote 24 Couples with a youngest child aged 0 to 3 or 4 to 11 have the largest choice set: 6 ⋅ 6 ⋅ 4 = 144 alternatives.
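The size of the largest choice set can be checked by enumerating the alternatives directly. The sketch below is illustrative only; the names `choice_set`, `WORK_DAYS`, and `CHILDCARE_DAYS` are hypothetical, and the hours per childcare day are passed in as a parameter because they differ between daycare and out-of-school care.

```python
from itertools import product

WORK_DAYS = range(6)        # 0 to 5 working days per week, per adult
CHILDCARE_DAYS = range(4)   # 0 to 3 days of formal childcare per week
HOURS_PER_WORK_DAY = 8

def choice_set(childcare_hours_per_day):
    """Return all (h_m, h_f, cc) combinations in weekly hours for a couple."""
    return [
        (days_m * HOURS_PER_WORK_DAY,
         days_f * HOURS_PER_WORK_DAY,
         days_cc * childcare_hours_per_day)
        for days_m, days_f, days_cc in product(WORK_DAYS, WORK_DAYS, CHILDCARE_DAYS)
    ]

alternatives = choice_set(childcare_hours_per_day=10)
print(len(alternatives))  # 6 * 6 * 4 alternatives
```

Household types with fewer decision makers or without formal childcare correspond to dropping one or more of the three dimensions from the product.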

To determine disposable household income in each discrete option, we use the advanced tax-benefit calculator MIMOSI (Koot et al. 2016).Footnote 25 MIMOSI is the official tax-benefit calculator of the Dutch Government for the (non-behavioural) analysis of the impact of reform proposals on the disposable income distribution and the government budget. MIMOSI takes into account all (nationalFootnote 26) taxes, social security premiums, and income-independent subsidies and tax credits. MIMOSI also calculates the childcare subsidy in each option.Footnote 27 In accordance with the law, we further ensure that household disposable income (excluding childcare costs and childcare subsidies) cannot drop below the welfare level.Footnote 28 For each discrete option, we also calculate the net transfer from the household to the government (positive or negative). This allows for an accurate calculation of the net budgetary costs of the reforms we simulate.

5 Estimation results structural model

For each household type, we allow preferences for leisure to depend on age and fixed costs of work to depend on the level of education (in three classes) and ethnicity (in three classes). Furthermore, for households with a youngest child aged 0–3 or 4–11, we allow preferences for the use of formal childcare to depend on the level of education and ethnicity, and whether or not their residential location was in a large city (> 150,000 inhabitants). The preference parameters are not discussed here, because it is often a combination of preference parameters that drives behavioural responses (the estimated preferences are available on request).Footnote 29 Furthermore, there is no analytical solution for the labour supply elasticity in discrete-choice models. Therefore, following the literature (e.g. Bargain et al., 2014), we simulate these elasticities by increasing gross wages by 10%. We present the total elasticity (% change in total hours worked over the % change in the gross wage rate), and the decomposition into the extensive margin elasticity (% change in the participation rate over the % change in the gross wage rate) and the intensive margin elasticity (% change in hours worked per employed person over the % change in the gross wage rate).
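The decomposition of the total elasticity can be sketched as follows. This is a minimal illustration that assumes the participation rate and the average hours per employed person have already been simulated before and after the wage increase; the function name `elasticities` and its arguments are hypothetical.

```python
def elasticities(part_base, hours_base, part_new, hours_new, wage_change=0.10):
    """Total, extensive, and intensive margin elasticities.

    part_*:  participation rate before and after the wage increase.
    hours_*: average weekly hours per employed person before and after.
    Total hours per person are participation times hours per employed person.
    """
    def pct_change(new, old):
        return (new - old) / old

    total_base = part_base * hours_base
    total_new = part_new * hours_new
    return {
        "total": pct_change(total_new, total_base) / wage_change,
        "extensive": pct_change(part_new, part_base) / wage_change,
        "intensive": pct_change(hours_new, hours_base) / wage_change,
    }
```

Because total hours are the product of the two margins, the total elasticity equals the sum of the extensive and intensive margin elasticities only up to a small interaction term, which is negligible for changes of this size.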

Figure 4 gives the simulated labour supply elasticities for couples where both partners can adjust their labour supply. We find small, positive labour supply elasticities for men. Labour supply elasticities are higher for women, on the extensive margin but also on the intensive margin. Furthermore, labour supply elasticities for women in couples are particularly high when they have a young child.Footnote 30

Fig. 4 Couples where both partners have a labour supply choice. a Men. b Women

Figure 5 shows that the labour supply elasticity is relatively low for childless singles.Footnote 31 The labour supply elasticity is much higher for single parents with a youngest child of 0–3 years of age. The labour supply elasticity of single parents whose youngest child is over the age of 3 is lower, though still higher than that of childless singles. Also note that the differences among single parents are primarily driven by differences in the extensive margin elasticity. The intensive margin response for single parents is quite small.Footnote 32

Fig. 5 Households where only one person has a labour supply choice, and adult children. a Singles and single parents. b Couples where only one person has a labour supply choice, and adult children living at home

Figure 5 also gives the labour supply elasticities for couples where only one partner can choose his or her labour supply (because the other partner is a student, on disability or retirement benefits or self-employed). For these groups, we pool couples with children of all ages. Most men in these couples are working, and typically also full-time (descriptive statistics are available on request). Hence, there is little upward potential in terms of total hours worked, and they have a relatively low labour supply elasticity. For women in these couples, there is more upward potential in total hours worked, and they have a higher labour supply elasticity. Finally, adult children living with their parents generally have a high participation rate, resulting in a relatively low labour supply elasticity.Footnote 33

6 Comparison predictions structural model with results from quasi-experimental studies

Next, we consider whether the structural model can predict the effects of past reforms. There is a growing body of literature that compares the simulated policy responses in structural models with the results from (quasi-)experimental studies.Footnote 34 We present three such comparisons: (i) one for couples with a youngest child of 0–3 and 4–11 years of age, (ii) one for single parents with a youngest child of 12–15 years of age, and (iii) one for the intensive margin responses for a number of household types. For the comparison, we have re-estimated the structural model using data from 2006 only, so that the parameter estimates are only estimated on data in the absence of the policy change of the first reform we consider.Footnote 35

Table 1 shows our structural model results for couples with a youngest child of 0–3 and 4–11 years of age. Bettendorf et al. (2015) use differences-in-differences (DD) to analyse the employment effects of a combination of reforms during the period 2005–2009 targeted at households with children of 0–11 years of age. These reforms contained three elements: (1) an increase in childcare subsidies (column Childcare subsidies), (2) an increase in the in-work benefit for secondary earners with children of 0–11 years of age (column Income-depend.combi.credit), and (3) the abolition of the in-work benefit for both primary and secondary earners with children of 0–11 years of age (column Combi.credit).Footnote 36 Bettendorf et al. (2015) use data for the 1995–2009 period from the Labour Force Survey. They present estimation results for women in couples with a youngest child of 0–11 years of age. To make the comparison with the DD as clean as possible, we use the same sample as Bettendorf et al. (2015) to estimate the treatment effects by the subgroups with a youngest child of 0–3 or 4–11 years of age, respectively. The results are given in Table 1, along with the simulation results from the structural model. The results for the structural model are consistent with the results from the DD analysis for women with children. The estimated effects on the participation rate of men with children are also in line with the predictions from the structural model. The only coefficient of the DD analysis which differs somewhat from the prediction of the structural model is the intensive margin response by men, for which the DD analysis suggests a larger negative, though not statistically significant, response than the structural model.

Table 1 Comparison with results quasi-experimental study: couples with young children

Table 2 shows the evaluation of the predictions made by our structural model for single parents with a youngest child of 12–15 years of age. Bettendorf et al. (2014) use DD and regression discontinuity (RD) to study the impact of an in-work benefit targeted at single parents. In 2002, the children’s age of eligibility was increased, and the target group of the in-work benefit was extended from single parents with a youngest child of 0–11 years of age to single parents with a youngest child of 0–15 years of age. They find a small effect on labour supply, in both the DD and RD analyses, that is not significantly different from zero. Table 2 shows the effect of abolishing the in-work benefit targeted at single parents. We simulate the effect on the participation rate of single parents with a youngest child of 0–3, 4–11, and 12–15 years of age.Footnote 37 The effect on the participation rate is quite sizeable, at –1.1 percentage points, for single parents with a youngest child of 0–3 years of age, whereas it drops to –0.6 percentage points for single parents with a youngest child of 4–11 years of age, and to 0.0 percentage points for single parents with a youngest child of 12–15 years of age. Indeed, single parents with a youngest child of 12–15 years of age are already less responsive to financial incentives than single parents with a youngest child of 0–3 years of age. More importantly, again, the structural model predicts treatment effects that are in line with the results from the quasi-experimental study.

Table 2 Comparison with results quasi-experimental study: single parents

Finally, we compare the intensive-margin responses in the structural model with a quasi-experimental study on intensive-margin responses. Figures 4 and 5 show that intensive-margin responses are typically rather small, and much smaller than extensive-margin responses. We compare the intensive-margin elasticities with results from the DD analysis in Bosch and Jongen (2013). They use the 2001 tax reform, which generated large heterogeneous variation in marginal tax rates. For men in couples, they find very low intensive-margin elasticities with a point estimate of 0.00 (s.e. 0.01), in line with the structural model. For women in couples, they find larger intensive-margin elasticities, with a point estimate of 0.15 (s.e. 0.06).Footnote 38 This is in line with the structural-model results on the response at the intensive margin for women with young children, and somewhat higher than for the other groups of women in couples. For singles and single parents, Bosch and Jongen (2013) also find somewhat higher intensive-margin elasticities than the structural model, of 0.15 to 0.20. We should note, though, that the comparison is somewhat complicated because we compare gross wage elasticities of the structural modelFootnote 39 to the net wage elasticities of the DD. The latter are typically higher (Bargain et al. 2014).

7 Simulating tax-benefit reforms

Overall, the structural model predicts the treatment effects of past reforms rather well. We now exploit the strength of the structural model by simulating counterfactual tax-benefit reforms. We first consider changes in a selected set of policy parameters, motivated by recent reforms in the Netherlands, to illustrate which policies are more or less cost-effective in terms of stimulating labour supply. Subsequently, we consider a number of major tax reforms that feature prominently in the policy debate in the Netherlands, for example in the election proposals of Dutch political parties, and abroad. For the policy simulations, too, we use the preferences estimated on data from 2006 only.Footnote 40,Footnote 41

7.1 Changes in selected tax-benefit policies

Marginal tax rates

We first consider the effects of changes in marginal tax rates. More specifically, we consider the effects of decreasing the tax rate of the first, second, third, and fourth (open) tax bracket, so that, in each simulation, tax revenues decrease by 1.5 billion euros before behavioural responses. Table 3 gives the results in columns (1)–(4), respectively.Footnote 42

Table 3 Changes in selected tax-benefit policies

We report the effects on hours worked per week and on the participation rate. Hours worked per week includes the zeros for the non-employed. The participation rate is the number of persons employed over the total number of employed and not employed.Footnote 43 We also calculate the effect on labour productivity per hour worked, which is obtained by subtracting the change in hours worked from the change in labour costs, where the change in labour costs, in turn, is an approximation for the change in output. Furthermore, we also calculate the so-called knock-on effects for the government budget. These are the net budgetary savings due to behavioural responses, expressed as a percentage of the ex-ante (before behavioural responses) budgetary ‘shock’.Footnote 44 This is particularly relevant for simulations that increase the participation of (potential) secondary earners, who pay little in taxes and typically get substantial subsidies when they start working. An alternative strategy would be to simulate policy reforms that are budgetary neutral after taking into account behavioural responses. However, this is rather time consuming and does not change the relative effectiveness of the different policies. Finally, we report the effect on household income inequality (before behavioural changes), using the Gini coefficient.Footnote 45
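The outcome measures defined above can be written down compactly. The following is a minimal sketch, assuming that changes are expressed in percent and that the budgetary shock is measured as a cost to the government; all function names are hypothetical and the Gini formula is the standard one for a sorted sample, not necessarily the exact implementation used for Table 3.

```python
def mean_hours(hours):
    """Average weekly hours, including zeros for the non-employed."""
    return sum(hours) / len(hours)


def participation_rate(hours):
    """Persons employed over the total of employed and not employed."""
    employed = sum(1 for h in hours if h > 0)
    return employed / len(hours)


def productivity_change(pct_change_labour_costs, pct_change_hours):
    """Change in labour productivity per hour: the change in labour costs
    (a proxy for output) minus the change in hours worked."""
    return pct_change_labour_costs - pct_change_hours


def knock_on_effect(ex_ante_cost, ex_post_cost):
    """Net budgetary savings due to behavioural responses, as a percentage
    of the ex-ante budgetary shock (assumed sign convention: costs > 0)."""
    return 100.0 * (ex_ante_cost - ex_post_cost) / abs(ex_ante_cost)


def gini(incomes):
    """Gini coefficient of a list of non-negative incomes."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * total) - (n + 1) / n
```

For example, under this sign convention a tax cut that costs 1.5 billion euros ex ante and 1.2 billion euros after behavioural responses has a knock-on effect of 20%.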

Column (1) gives the results for the decrease in the tax rate in the first tax bracket. Overall, hours worked and the participation rate hardly change. However, this is the net result of some groups decreasing and other groups increasing their labour supply. In particular, there is a modest decrease in hours worked by men in couples, due to an income effect (for them, the first tax bracket is typically inframarginal), and a modest increase in hours worked by women in couples, for whom the substitution effect dominates the income effect. Income inequality, as measured by the Gini coefficient, decreases.

Column (2) gives the effect of lowering the tax rate in the second tax bracket. The effect on overall labour supply is positive but modest. Men in couples now also work a bit more due to the substitution effect, while the effect on women in couples is larger than under reform (1). When comparing the effects on total hours worked per week with those on the participation rate, most of the response comes from the intensive margin rather than the extensive margin. Income inequality increases somewhat, as the lowest incomes do not benefit from a lower second tax bracket rate.

Column (3) then considers the effects of a decrease in the third tax bracket rate. The increase in overall labour supply in hours is somewhat smaller than in reform (2), because of the smaller effect on hours worked by women. Indeed, although for some of these women the third tax bracket is the relevant marginal tax bracket, their own income effect and the income effect from a higher income of their partner dominate.Footnote 46 For single parents and singles, we find a positive effect on labour supply in hours, because they do not have an income effect from a partner and the substitution effect of the lower marginal tax rate dominates. Income inequality increases more than under reform (2).

Finally, column (4) gives the effects of lowering the tax rate in the fourth (open) tax bracket. This has only a small positive effect on overall hours worked, and the effect on labour supply in persons is negative (due to the ‘added worker effect’). Although the increase in hours worked is much smaller under reform (4) than under reform (3), labour productivity increases more due to a composition effect: workers with income in the fourth tax bracket are more productive. Also, because high-income individuals pay a relatively large amount of taxes, the knock-on effect for the government budget is higher than for reforms (1)–(3). Lowering the top rate leads to the biggest increase in inequality of reforms (1)–(4).

Participation tax rates

Next, we consider policy reforms targeted more at the ‘participation tax rate’, the effective tax on the transition from non-employment to employment. Specifically, we consider lowering the participation tax rate through a ‘carrot’ or a ‘stick’.
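A common way to make the participation tax rate operational is as the share of gross earnings that is lost to higher taxes and withdrawn benefits when moving into work. The sketch below uses this standard definition as an assumption; the function name and the household figures are hypothetical and only illustrate how a ‘stick’ (lower out-of-work income) and a ‘carrot’ (higher in-work income) both lower the rate.

```python
def participation_tax_rate(gross_earnings, net_income_in_work,
                           net_income_out_of_work):
    """Share of gross earnings taxed away (through taxes paid and benefits
    lost) on the transition from non-employment to employment."""
    if gross_earnings <= 0:
        raise ValueError("gross_earnings must be positive")
    gain_from_work = net_income_in_work - net_income_out_of_work
    return 1.0 - gain_from_work / gross_earnings


# Hypothetical household, amounts in euros per year (illustration only).
baseline = participation_tax_rate(20_000, 16_000, 12_000)
stick = participation_tax_rate(20_000, 16_000, 10_000)   # lower welfare benefit
carrot = participation_tax_rate(20_000, 18_000, 12_000)  # higher in-work credit
print(baseline, stick, carrot)
```

Both instruments raise the financial gain from working by the same amount in this example, but they differ in who bears the cost: the stick lowers the income of those out of work, whereas the carrot raises the income of those in work.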

Column (5) gives the simulated effects of a reduction in welfare benefits by 14% (the stick), for a total amount of 500 million euros.Footnote 47 This leads to a substantial increase in overall labour supply, both in total hours worked and in persons, of +0.7% and +0.6%, respectively. The effects are much larger than those of the reforms considered before, because welfare benefits operate on the extensive margin. The response is particularly large for single parents; 32% of single parents are on welfare benefits in the base.Footnote 48 The knock-on effects for the government are very high, because there is a sizeable reduction in the expenditures on welfare benefits due to behavioural responses. On the downside, this simulation causes a steep rise in income inequality.

In column (6), we use the ‘carrot’ instead, and consider an increase in the (general) in-work tax credit for a total amount of 1.5 billion euros, targeting the increase at low-income workers.Footnote 49 This also has a larger effect on total hours worked than reducing tax bracket rates because it is targeted at the extensive margin. Also on the upside, income inequality decreases, as the reform targets low-income workers. Indeed, in this simulation, there is an increase in hours worked as well as a decrease in income inequality (see also Saez 2002 for the potential welfare gains from in-work tax credits for low-income workers).Footnote 50 Furthermore, as most of the increase in participation is by women, this reform also causes a noticeable reduction in gender inequality. Indeed, Table 5 in the Appendix gives the change in average hours worked by men minus the average hours worked by women in couples for all the reforms in Table 3.Footnote 51 We find that the inequality in hours drops, because women are more responsive to the change in the in-work tax credit. However, on the downside, there is a sizeable drop in average productivity, due to a change in the workforce composition, and the knock-on effect is close to zero.

Subsidies for households with young children

Finally, we consider a number of reforms targeted at households with young children.Footnote 52 This group is of particular interest because there are many policies targeted at this group, and because mothers with young children appear particularly responsive to changes in financial incentives. In these simulations, we use a smaller increase in government expenditures than before, because these reforms target only a subgroup of the working age population.

In column (7), we increase the income-dependent part of the in-work tax credit for secondary earners and single parents with a youngest child of up to 12 years of age.Footnote 53 This leads to a substantial increase in the number of hours worked given the budgetary impulse, because it targets the groups with the highest labour supply elasticity. Since the response is mostly by women, this reform also causes a substantial decline in gender inequality (see also Table 5 in the Appendix). However, the knock-on effects are still limited, as secondary earners and working single parents with a young child already receive large subsidies in the base.

In column (8), we increase childcare subsidies. In particular, we consider a proportional decrease of 38% in the parental contribution that results after deducting the subsidy from the full hourly price. Again, there is a substantial increase in hours worked. The effect on total hours worked is somewhat larger than for reform (7), and the reform also reduces gender inequality in hours worked more (again, see Table 5 in the Appendix). However, the childcare reform also leads to substantial substitution of formal care for informal care. As a result, the knock-on effect for the government budget is negative, making this reform less cost-effective than reform (7).

Moving from carrots to sticks, in column (9), we decrease the income-dependent child benefit for parents with young children.Footnote 54 This also leads to a substantial increase in hours worked and in the participation rate, and a decrease in gender inequality, in particular for couples with young children (Table 5 in the Appendix).Footnote 55 However, again, the downside of this ‘stick’ reform is that it increases income inequality, as we take benefits away from low-income households.

7.2 Major tax reforms

After considering changes in single policy instruments, we now consider a number of major tax reforms that feature prominently in the current policy debate, in the Netherlands and abroad. Specifically, we consider the introduction of a flat tax system, a basic income system, and a move towards joint rather than individual taxation.Footnote 56 The simulation results are given in Table 4.

Table 4 Major tax reforms

In column (1), we change the four tax bracket rates in the baseline, 36.5%, 42%, 42%, and 52%, respectively, to a flat tax rate of 39.7%. This scenario is budgetary neutral before behavioural changes. We see that this flat tax increases overall labour supply. Men in couples and singles without children increase their labour supply, because the flat tax rate is lower than their initial marginal tax bracket rate. For women in couples, there are two opposing effects. First, women paying taxes in the first tax bracket now face a higher marginal tax rate and some women withdraw from the labour force. Second, women paying taxes in the second and higher tax brackets now face a lower marginal tax rate, triggering a positive response at the intensive margin. For women in couples with children aged 0–17, the first effect dominates, whereas for women in other couples the second effect dominates. Finally, single parents increase their labour supply, which is caused by the sharp drop in the net welfare benefit due to the increase in the first bracket rate. The downside of this flat tax proposal is a substantial increase in income inequality. Furthermore, gender inequality in hours worked in couples increases (see Table 6 in the Appendix).
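The two opposing effects for women in couples follow directly from comparing bracket rates with the flat rate. The sketch below illustrates this, using the baseline rates and the flat rate from the text; the bracket thresholds are hypothetical placeholders (the text does not report them), and tax credits and benefits are ignored, so this is not the tax-benefit calculator used for Table 4.

```python
BASELINE_RATES = (0.365, 0.42, 0.42, 0.52)  # bracket rates from the text
FLAT_RATE = 0.397                           # flat rate from the text
# Hypothetical lower bounds of the four brackets, in euros (assumption).
HYPOTHETICAL_THRESHOLDS = (0.0, 17_000.0, 31_000.0, 53_000.0)


def marginal_rate(income, thresholds=HYPOTHETICAL_THRESHOLDS,
                  rates=BASELINE_RATES):
    """Statutory rate of the bracket in which taxable income falls."""
    rate = rates[0]
    for lower, bracket_rate in zip(thresholds, rates):
        if income >= lower:
            rate = bracket_rate
    return rate


def bracket_tax(income, thresholds=HYPOTHETICAL_THRESHOLDS,
                rates=BASELINE_RATES):
    """Tax liability under the bracket schedule, before any tax credits."""
    uppers = list(thresholds[1:]) + [float("inf")]
    tax = 0.0
    for lower, upper, bracket_rate in zip(thresholds, uppers, rates):
        if income > lower:
            tax += bracket_rate * (min(income, upper) - lower)
    return tax


for income in (10_000.0, 25_000.0, 60_000.0):
    change = FLAT_RATE - marginal_rate(income)
    print(f"income {income:>8.0f}: marginal rate changes by {100 * change:+.1f} pp")
```

Anyone in the first bracket sees the marginal rate rise by 3.2 percentage points (39.7% versus 36.5%), while taxpayers in the higher brackets see it fall by 2.3 or 12.3 percentage points, whatever the actual thresholds are.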

In column (2), we consider a flat tax system that is budgetary neutral and ‘Gini neutral’. Specifically, we introduce a lump-sum subsidy for all adults of 1,950 euros and finance this with a flat tax rate of 45.3%. We find that, to arrive at the same income inequality as in the baseline, the flat tax reduces labour supply.Footnote 57 Indeed, the flat tax increases marginal and participation tax rates at the lower end of the income distribution, which is more responsive to tax changes in terms of hours worked than the upper end of the income distribution.Footnote 58 Gender inequality in hours worked in couples increases further (Table 6).

Column (3) shows that generic income support via the introduction of an unconditional basic income has a strong adverse effect on labour force participation. We simulate a basic income of 50% of the net welfare benefit level. All adults qualify for this basic income. For adults receiving social benefits (e.g. welfare, unemployment, disability or retirement benefits) we reduce the benefit level, so that together with the basic income their disposable income does not change. We finance the basic income scenario by abolishing the general in-work tax credit for all workers, which is in line with the idea that this type of income support is unconditional, and by introducing a flat tax rate of 56.6%.Footnote 59 The flat tax rate of 56.6% implies a considerable increase in all marginal tax rates, in particular for adults in the lower tax brackets. Furthermore, the introduction of a basic income increases income for non-working partners. Labour supply decreases on both the extensive and the intensive margin, and by 5.6% in total.Footnote 60 There is a dramatic increase in gender inequality in terms of hours worked (Table 6). However, on the upside, income inequality decreases, by almost 8%.

Finally, column (4) simulates a move from individual to joint taxation, as in the tax systems of, e.g. France, Germany, and the USA.Footnote 61 We simulate joint taxation by taking the sum of taxable income from both partners and then assigning half of the total taxable household income to each partner. We finance this scenario by increasing marginal tax rates in all four tax brackets by 1.9 percentage points. Total labour supply decreases by 2.2%. Most women in couples are secondary earners and face a relatively low marginal tax rate under the tax system in the baseline. Joint taxation means that the marginal tax rate increases for secondary earners and they reduce their labour supply. The effective marginal tax rate for primary earners decreases, and they increase their labour supply, but to a much lesser extent. Furthermore, income inequality increases, and there is again a rather dramatic increase in gender inequality in terms of hours worked (Table 6). Hence, this scenario scores unfavourably in terms of hours worked, income inequality and gender inequality.Footnote 62
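The mechanism behind the opposite responses of primary and secondary earners can be illustrated with a small example of income splitting. The sketch below uses the baseline bracket rates and the 1.9 percentage point financing shift from the text; the bracket thresholds and the couple's incomes are hypothetical, and tax credits are ignored, so the numbers only illustrate the direction of the change.

```python
RATES = (0.365, 0.42, 0.42, 0.52)  # baseline bracket rates from the text
FINANCING_SHIFT = 0.019            # +1.9 percentage points, from the text
# Hypothetical lower bounds of the four brackets, in euros (assumption).
THRESHOLDS = (0.0, 17_000.0, 31_000.0, 53_000.0)


def marginal_rate(taxable_income, shift=0.0):
    """Statutory bracket rate at a given taxable income, plus a uniform shift."""
    rate = RATES[0]
    for lower, bracket_rate in zip(THRESHOLDS, RATES):
        if taxable_income >= lower:
            rate = bracket_rate
    return rate + shift


def marginal_rates_individual(primary, secondary):
    """Each partner is taxed on his or her own taxable income."""
    return marginal_rate(primary), marginal_rate(secondary)


def marginal_rates_joint(primary, secondary):
    """Income splitting: each partner is taxed on half of household income,
    so both face the marginal rate that applies at that half."""
    half = (primary + secondary) / 2.0
    rate = marginal_rate(half, FINANCING_SHIFT)
    return rate, rate


# Hypothetical couple: primary earner 60,000 euros, secondary earner 10,000 euros.
print(marginal_rates_individual(60_000.0, 10_000.0))
print(marginal_rates_joint(60_000.0, 10_000.0))
```

In this example, the secondary earner's marginal rate rises from the first-bracket rate to the rate that applies at half of household income, while the primary earner's rate falls from the top rate to that same rate.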

8 Discussion and conclusion

In this paper, we used both structural models and quasi-experimental studies to study the effectiveness of tax-benefit reforms. Using a very large and rich data set, we estimate structural discrete-choice models for a large number of household types. We uncover large differences in the labour supply responses between various demographic groups, mostly related to the age of the youngest child. We also find that the decision of whether or not to participate is more responsive to financial incentives than the hours-per-week decision, although the hours-per-week decision is still non-negligible for women in couples with children. We used the structural model to simulate a number of key reforms from the past, and compared the predictions of the structural model with the outcomes of quasi-experimental studies on the same reforms. We find that the structural model predicts the estimated treatment effects from the quasi-experimental studies rather well.

We then conduct a counterfactual policy analysis with the structural model, and study the effectiveness of potential tax-benefit reforms in stimulating labour supply. We find that reducing marginal tax rates is not an effective way to promote labour supply. In-work benefits targeted at low-wage earners are more effective. Policies targeted at working mothers with young children generate the largest labour supply response and reduce gender inequality, but generate little additional revenue for the government. With the structural model we also simulate some major tax reforms that feature prominently in the current policy debate. We find that proposals for a move to a flat tax system, a basic income system or a system with joint taxation are not effective in stimulating labour supply, and cause a steep rise in gender inequality. Indeed, an efficient tax system accounts for the large heterogeneity in responses between different demographic groups (e.g. primary vs. secondary earners and with vs. without young children) and different decision margins (e.g. extensive vs. intensive margin), and therefore cannot be too simple.

Although we believe that our analysis makes a number of improvements over previous studies on the effectiveness of tax-benefit reforms, it still has a number of limitations. We ignore involuntary unemployment (and a potential difference between preferred and actual working hours). However, estimating a double-hurdle model (Cragg 1971), we find that accounting for involuntary unemployment makes little difference in the employment responses to changes in financial incentives (De Boer 2018).Footnote 63 Furthermore, we ignore responses to marginal (and participation) tax rates other than labour supply. Part of the modern literature on public finance looks at a broader range of behavioural responses, by considering the so-called elasticity of taxable income, see Saez et al. (2012) for an overview. Indeed, a recent study by Jongen and Stoel (2019) for the Netherlands shows that the elasticity of taxable income may be higher than the labour supply elasticity, suggesting larger distortions from tax rates than those found by looking solely at labour supply. We further ignore the life cycle. A number of studies have shown that accounting for life-cycle effects can be important for the analysis of tax-benefit reform (e.g. Imai and Keane 2004, 2011; Blundell et al. 2016). This would be an interesting direction for future research. However, the data set we used does not include data on, for example, consumption or savings, which makes it difficult to estimate a life-cycle model, and it should be noted that there is often a trade-off in modelling different parts of economic behaviour, due to the numerical complexities that arise.Footnote 64 Finally, we assume that all people are fully aware of their budget constraint. However, recent work by Chetty et al. (2009) shows that information, or the lack thereof, can play an important role in the behavioural responses to financial incentives. This too seems an interesting direction for future research.Footnote 65