1 Introduction

Gender disparities in outcomes favouring men are generally larger within developing countries than in the developed world (Jayachandran 2015). Similar disparities also often exist in attitudes towards gender equality (see, for instance, Asadullah and Wahhaj 2019; Borrell-Porta et al. 2019). Such attitudes, including attitudes supporting domestic violence and child marriage, can have important effects on behaviours and play a prominent role in explaining observed gender disparities in outcomes. For instance, Dhar et al. (2016) find that parents’ discriminatory attitudes reduce daughters’ aspirations to pursue schooling beyond secondary school in India. Maertens (2013) finds that parents’ perceptions of the ideal age at marriage have an adverse impact on daughters’ schooling. Despite the important role that attitudes play in explaining behaviours and outcomes, much of the existing empirical literature relies on direct survey questions to elicit attitudes; such questions are likely to suffer from measurement error, leading to biased estimates. Our study addresses this limitation by making use of list experiments, which enable us to elicit attitudes regarding gender roles and behaviours more accurately.

One aspect of life in which women face a particular disadvantage is as victims of violence. Women in South Asia face various forms of violence throughout their lifetimes, from early childhood to old age (Solotaroff and Pande 2014). Surprisingly, some of the global hotspots for child marriage and violence against women are also places that have seen considerable progress in poverty reduction and women’s economic participation. One such setting is Bangladesh, where labor force participation among women aged 15 and above increased from 23% in 1990 to 33% in 2017.Footnote 1 While the incidence of child marriage has fallen in recent decades, in 2014 approximately 38% of Bangladeshi women had been married by age 15 and 74.7% by age 18. Support for intimate partner violence is relatively high among both young and old women in Bangladesh and has remained unchanged in recent decades despite improvements in female labor force participation (see Appendix 1 for more details). In 2014, 28.3% of Bangladeshi women agreed that a husband is justified in beating his wife for one of the following reasons: if the wife burns the food, argues with him, goes out without telling him, neglects the children or refuses to have sexual intercourse with him.Footnote 2 Finally, 22.4% of ever-married Bangladeshi women report being a victim of physical or sexual violence by their husband or partner in the last 12 months.Footnote 3 These kinds of disparities, which disproportionately put women at a disadvantage within the household, are likely to have serious implications for women’s welfare.

The objective of this paper is to measure attitudes towards intimate partner violence and child marriage among adolescent girls in rural Bangladesh. We do this using standard survey questions (direct questions) as well as methods designed to elicit responses to sensitive survey questions (list experiments; see “Section 2” for a description and examples). We find that the method of measurement matters when eliciting responses to sensitive questions. Standard direct survey questions under-estimate support for socially harmful practices: few adolescent girls in the study accept the practice of intimate partner violence (5%) or child marriage (2%) when asked directly. List experiments reveal substantially higher support for both intimate partner violence (30%) and child marriage (24%). Adolescent girls with lower levels of education under-report their support for child marriage by 16 percentage points compared with adolescent girls with higher education. We also find that girls randomly exposed to village-level adolescent clubs set up by the Bangladeshi non-governmental organization BRAC, which educated them on marital rights and laws, under-report their support for intimate partner violence in comparison with non-exposed adolescent girls. To our knowledge, ours is the first study to use list experiment methods to elicit attitudes towards child marriage as well as domestic violence, and it has greater external validity than similar empirical investigations.

The rest of the paper is organised as follows. “Section 2” provides a literature review and lists the contributions this paper makes to the literature. “Section 3” describes the background of our study and the list experiments that generated the data we use in this paper. “Section 4” lays out the empirical analysis that we perform on the data, “Section 5” discusses our findings, and “Section 6” concludes.

2 Literature review and contribution

An important concern when eliciting gender attitudes in surveys relates to measurement error.Footnote 4 Suppose a survey respondent is asked whether they consider domestic violence to be acceptable. It is very likely that they would either choose not to respond (leading to systematic item non-response) or state that they do not consider domestic violence acceptable (misreporting, which might arise from social desirability concerns). In either case, the resulting measurement error will lead to biased estimates when investigating the relationship between gender attitudes and other outcome variables of interest (Bound et al. 2001).

Different strategies have been employed to deal with measurement error when eliciting responses to sensitive questions. One is to rely on administrative data rather than self-reports, although in developing countries such data are usually neither systematically collected nor well registered.Footnote 5 Another is to use intensive qualitative fieldwork, as done by Blattman et al. (2016), in which local researchers spend several days with a random sub-sample of survey respondents after a survey has taken place. They then obtain verbal confirmation of sensitive behaviours, allowing a validation technique to be employed to examine the nature of measurement error in survey responses. Alternatively, one may use quantitative survey methods to examine measurement error in responses to sensitive questions, such as randomized response techniques, endorsement experiments or list experiments.Footnote 6

In this paper, we use a list experiment, also known as the item count or unmatched count technique. In a list experiment, survey respondents are asked how many items they agree with on a list, which (randomly) either includes or excludes a sensitive item (Miller 1984; Imai 2011). We use list experiments to deal with measurement error in elicited gender attitudes in Bangladesh.

Several recent empirical studies make use of list experiments. Karlan and Zinman (2012) use a list experiment to indirectly elicit how borrowers from microfinance institutions (MFIs) in Peru and the Philippines use their loan proceeds. Comparing the results from the list experiment with responses to direct survey questions, they find that direct elicitation under-reports the non-enterprise use of loan proceeds by MFI borrowers. List experiments have also been employed to elicit truthful responses about sexual behaviours, such as condom use, number of partners and unfaithfulness, in Uganda (Jamison et al. 2013), Colombia (Chong et al. 2013) and Côte d’Ivoire (Chuang et al. 2019); harmful traditional practices against women in Ethiopia (De Cao et al. 2017; De Cao and Lutz 2018; Gibson et al. 2018); and anti-gay sentiment in the USA (Coffman et al. 2016).

A recent study examining gender attitudes (specifically those related to female genital mutilation/cutting) in Ethiopia finds under-reporting in direct attitude questions of 10 percentage points (De Cao and Lutz 2018). This study also provides suggestive evidence that under-reporting is more pronounced among uneducated women and among women who were targeted by a non-government organization (NGO) intervention to strengthen the health system as well as sexual and reproductive health knowledge.

A few recent studies have also used list experiments to examine measurement error in domestic violence reporting and prevalence. Peterman et al. (2017) use a list experiment combined with an unconditional cash transfer given to female caregivers of children younger than 5 in rural Zambia. They find that 15% of the women had experienced physical intimate partner violence in the last 12 months, and they find no effect of the cash transfer on intimate partner violence 4 years after the program. Since direct questions were not asked, it is not possible to examine the direction or magnitude of the measurement error in such questions. Joseph et al. (2017) use a list experiment in Kerala, India, and find that under-reporting of domestic violence exceeds 9 percentage points, while being negligible for physical harassment on buses. They analyse the list experiment using differences in means across sub-groups of the population. Unlike Peterman et al. (2017) and Joseph et al. (2017), Aguero and Frisancho (2018) follow WHO guidelines and protocol to ask female respondents direct questions on violence (comparable to the widely used domestic violence questions in the Demographic and Health Surveys) and compare these with a list experiment eliciting experiences of physical and sexual intimate partner violence. Their sample consists of female clients of a microcredit organisation operating in urban areas of Lima, Peru. Aguero and Frisancho (2018) find that more educated women systematically under-report violence, but that there is no under-reporting by women with less education. They also describe a low-cost solution to correct for bias in settings where the dependent variable (for instance, intimate partner violence, as in Aguero and Frisancho 2018) suffers from non-classical measurement error, the independent variables are measured without error, and endogeneity is present. This solution involves using estimates generated from list experiments carried out alongside other survey instruments.

Our work contributes to this literature in several important ways. It is the first study to use a list experiment to elicit attitudes towards child marriage, while also being among the first to develop a list experiment for domestic violence, alongside Peterman et al. (2017), Joseph et al. (2017) and Aguero and Frisancho (2018). Since our sample covers a third of all districts of Bangladesh, it has greater external validity than many of the other empirical investigations in this area (e.g., urban Lima in Aguero and Frisancho (2018)). Ours is also the first study that makes use of an RCT (a non-formal education intervention) to analyse how support for domestic violence and child marriage changes with the intervention while also using a list experiment. We analyse our list experiments using regression techniques that allow us to investigate how the probability of supporting the sensitive item varies as a function of respondents’ characteristics (as in Coffman et al. 2016; Aguero and Frisancho 2018; De Cao and Lutz 2018), improving on earlier papers that only compute differences in means across sub-groups of the population (e.g., Karlan and Zinman 2012; Joseph et al. 2017). Finally, we discuss the validity of our list experiments in relation to recent criticisms raised by Chuang et al. (2019).

3 Data and study design

3.1 Survey design

Data on the list experiments eliciting attitudes towards domestic violence and child marriage used in this study were collected in February 2017 as part of an end-line survey to evaluate the Adolescent Development Programme (ADP), a randomized controlled trial (RCT) intervention implemented by BRAC, the largest NGO in Bangladesh. The ADP intervention introduced village-level random variation in adolescent girls’ exposure to non-formal education on marital rights and laws. The program design of the intervention is described in detail in Appendix 2. The baseline survey design considered 27 BRAC branch offices located in the 19 poorest districtsFootnote 7 where BRAC was about to implement and scale up the ADP scheme by the end of 2012. Of a total of 216 villages in the sample under the 27 BRAC branches, half were assigned to the program and the remaining half served as non-program villages. Randomisation was done at the village level with BRAC branches considered as clusters, i.e., it is a clustered RCT. Within the catchment area of each sample village, 20 adolescents (ages 11–16), of whom 15 were girls and 5 were boys, were interviewed. A total of 4320 adolescents (3240 females) ages 11–16 years were interviewed across all villages as part of the baseline survey in June 2012.Footnote 8 The same adolescents were interviewed in the end-line survey in February 2017, with 2732 (or 63% of the baseline respondents) successfully re-interviewed, of whom 2020 are female.Footnote 9 Appendix Table 8 compares characteristics across the sample of 3240 adolescent girls and the sub-sample of 2020 who completed the end-line survey. An important concern relates to potentially selective attrition of subjects from baseline to end-line. However, observable baseline characteristics of adolescent girls (age, religion and education, as well as their mothers’ age, education and empowerment measures) for the complete sample and for the sub-sample successfully re-interviewed at end-line are very similar, making selective attrition unlikely in this setting.

3.2 List experiments to elicit gender attitudes

The list experiment question used to elicit attitudes towards domestic violence included the following items: (1) If the father is too busy with outside work, this has a negative impact on children’s education; (2) it is not acceptable to use contraceptives to avoid pregnancy; (3) in a marriage both husband and wife should decide on how many children to have; (4) a wife can be hit, slapped, kicked or physically hurt by the husband under any circumstances.Footnote 10

The list experiment question used to elicit attitudes towards child marriage included the following items: (1) It is important for girls to attend school; (2) birth of a girl brings as much happiness to a family as birth of a boy does; (3) literate mothers can take care of their children better than illiterate mothers; (4) a girl should be married off before 18.

We randomly divided our respondents into two groups, A and B, which acted as either control or treatment for the first or second list experiment. This allowed us to reduce bias in the answers, given that each respondent was asked only one list experiment containing a sensitive item. In both list experiments, the sensitive item is the last item.Footnote 11 For each list experiment, the control group was asked the list experiment with only items (1)–(3). We carefully selected our non-sensitive items after discussions with BRAC; although items (1)–(3) in both list experiments may seem sensitive too, in the local setting they fit as non-sensitive items given the illegality of the sensitive one.Footnote 12 Moreover, recent research shows that non-sensitive items more closely related to the sensitive one perform better because they make the sensitive item less salient (Chuang et al. 2019). A minimal sketch of this cross-over assignment is given below.
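The following sketch illustrates the cross-over design on simulated data. It assumes, purely for illustration (the paper does not state which group played which role), that group A received the four-item (treatment) list for domestic violence and the three-item (control) list for child marriage, with group B receiving the reverse:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 2020  # adolescent girls in the estimation sample

# Assign each respondent to group A or B at random. Which group served as
# treatment for which list experiment is an assumption made for illustration.
group = rng.choice(["A", "B"], size=n)
treat_dv = (group == "A").astype(int)  # 1 = saw the DV list incl. the sensitive item
treat_cm = (group == "B").astype(int)  # 1 = saw the CM list incl. the sensitive item
# Each girl therefore faces exactly one sensitive item across the two lists.
```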

In early January 2017, BRAC researchers piloted the list experiment questions in the Karail slums in Dhaka district, with 10 female adolescents participating in the pilot. The primary objective was to verify the adequacy of the list experiment statements and to assess the appropriateness of interviewees using stones (marbles) to indicate their responses. Stones were used to avoid numeracy-related bias in responses (as in De Cao and Lutz 2018). The survey work was conducted by a team of 50 enumerators who received an intensive week-long training at the BRAC head office, directly supervised by study team members as well as field management trainers from BRAC’s Research and Evaluation Division (RED). The majority of enumerators (38 out of 50) were female, keeping in mind the study population (75% of adolescent respondents were female). In total, the enumerators were organized into 15 teams so that each team in a sample site had enough female enumerators available to interview female adolescents.

The list experiment questions were asked on the last page of a long questionnaire (about 40 pages). Direct questions phrased in the same way as the sensitive items in the list experiments were asked of all respondents, at around page 20 of the questionnaire. We have no reason to believe that respondents were cognisant of this design or that the format influenced the list experiment results, as so many different issues were dealt with during the interview. However, when we analyse the direct questions, we focus on the sample corresponding to the list experiment control group. See also “Section 4.4” for further discussion of the validity of our list experiments.

3.3 Estimation sample

Given that the targets of the NGO intervention were primarily adolescent girls, we restrict our estimation sample to adolescent girls who responded to the list experiment questions in the end-line survey. This gives us an estimation sample of 2020 adolescent girls. Table 1 reports descriptive statistics for this sample. Half of the sample was exposed to the ADP program.Footnote 13 Of the respondents, 42.5% had less than 9 years of schooling, i.e., had (at most) completed junior secondary education, while the rest had either secondary or tertiary education. The average age at the time of the end-line survey was 17.5 years. 28% of the adolescent girls were married by the time they completed the end-line survey, of whom 71% were married before age 18.

Table 1 Descriptive statistics

When directly asked about gender attitudes, only 2% of the adolescent girls agreed that a girl should be married off before age 18. Similarly, only 5% agreed that a wife can be hit, slapped, kicked or physically hurt by the husband under any circumstances. This is striking given the high prevalence of early marriage as well as violence against women in the study area. For example, turning to maternal characteristics, in 15% of cases respondents’ mothers reported being beaten at least once by their husband in the last 12 months. Moreover, about 46% of the girls’ mothers in the estimation sample were pregnant before age 18. Female respondents in rural Bangladesh are also subject to patriarchal social norms: 89% of the mothers of adolescent respondents reportedly practiced PurdahFootnote 14 when they went out.

4 Empirical strategy and results

4.1 Empirical strategy

In a standard list experiment design, a sample of N respondents is randomly divided into two groups: control and treatment. Each respondent in the control group (\( T_i = 0 \), where i indexes the individual) receives a list of J non-sensitive, yes/no items and is asked to report the total number of items he/she agrees with. The same applies to each respondent in the treatment group (\( T_i = 1 \)), except that the list is extended by one item to include the sensitive item (J + 1 items). Let \( Z_{ij}^{\ast} \) denote respondent i’s truthful answer to the jth item, j = 1, …, J + 1 (Imai 2011), and let \( Z_{ij}(T_i) \) denote the potential answer to the jth item under treatment status \( T_i \), equal to one if the item is answered affirmatively and zero otherwise. The econometrician only observes \( Y_i = Y_i(T_i) \), where \( Y_i(0) = \sum_{j=1}^{J} Z_{ij}(0) \) and \( Y_i(1) = \sum_{j=1}^{J+1} Z_{ij}(1) \).

A list experiment is valid (Imai 2011; Blair and Imai 2012) if: (a) the randomisation is sound, meaning that for each respondent \( \{\{Z_{ij}(0), Z_{ij}(1)\}_{j=1}^{J}, Z_{i,J+1}(1)\} \perp T_i \); (b) there are no design effects, meaning that the inclusion of the sensitive item does not change the sum of affirmative answers to the non-sensitive items (\( \sum_{j=1}^{J} Z_{ij}(0) = \sum_{j=1}^{J} Z_{ij}(1) \)); and (c) there are no liars, meaning that the respondent replies truthfully to the sensitive item (\( Z_{i,J+1}(1) = Z_{i,J+1}^{\ast} \)). Violations of assumption (c) are known as ceiling and floor effects. Ceiling effects occur when a respondent in the treatment group gives the answer \( Y_i = J \) even though he/she would truthfully have replied \( Y_i = J + 1 \). Floor effects occur when a respondent in the treatment group answers \( Y_i = 0 \) even though he/she would truthfully have replied \( Y_i = 1 \).

If the list experiment satisfies (a), (b) and (c), then support for the sensitive item can be obtained by simply using a difference-in-means estimator:

$$ \hat{\rho} = \frac{1}{N_1}\sum_{i=1}^{N} T_i Y_i - \frac{1}{N_0}\sum_{i=1}^{N} \left(1 - T_i\right) Y_i $$
(1)

where \( N_1 = \sum_{i=1}^{N} T_i \) is the treatment group size and \( N_0 = N - N_1 \) the control group size. To investigate how preferences over the sensitive item vary with respondents’ characteristics, a multivariate regression model can be used.Footnote 15 In particular, the following equation can be estimated:

$$ Y_i = X_i^{\top}\gamma + T_i X_i^{\top}\delta + \varepsilon_i $$
(2)

where \( X_i \) is a vector of respondent characteristics (including a constant) and \( (\gamma, \delta) \) are the parameters to estimate. We can estimate \( (\gamma, \delta) \) using ordinary least squares (OLS).
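To make the two estimators concrete, the following minimal sketch implements Eqs. (1) and (2) on simulated data. All variable names (y, t, adp, age, primary, married) and the data-generating parameters are our own illustrative assumptions, not the study’s data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data standing in for the survey (names and parameters assumed).
rng = np.random.default_rng(0)
n = 2020
df = pd.DataFrame({
    "t": rng.integers(0, 2, n),        # 1 = long list incl. the sensitive item
    "adp": rng.integers(0, 2, n),      # exposed to the ADP intervention
    "age": np.round(rng.normal(17.5, 1.5, n)),
    "primary": rng.integers(0, 2, n),  # at most primary education
    "married": rng.integers(0, 2, n),
})
count3 = rng.binomial(3, 0.6, n)       # affirmative answers to the 3 control items
support = rng.binomial(1, 0.3, n)      # latent support, ~30% as estimated in the paper
df["y"] = count3 + df["t"] * support   # only the long list adds the sensitive item

# Eq. (1): difference-in-means estimator of support for the sensitive item.
rho_hat = df.loc[df["t"] == 1, "y"].mean() - df.loc[df["t"] == 0, "y"].mean()
print(f"difference-in-means estimate: {rho_hat:.3f}")

# Eq. (2): regress the count on X and T*X; the interaction coefficients
# (delta) show how support varies with respondent characteristics.
X = sm.add_constant(df[["adp", "age", "primary", "married"]])
TX = X.mul(df["t"], axis=0).add_prefix("t_x_")
fit = sm.OLS(df["y"], pd.concat([X, TX], axis=1)).fit(cov_type="HC1")
print(fit.params.filter(like="t_x_"))  # t_x_const is the baseline support level
```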

4.2 Estimation results

In Table 2, we present the distribution of responses to our two list experiments (LE). The proportion of girls in favour of domestic violence (DV) and child marriage (CM), computed using the difference-in-means estimator, is 30% (SE = 0.028) and 24% (SE = 0.026), respectively.Footnote 16

Table 2 Distribution of responses to the list experiments

Table 3 reports the results of the linear regression model in Eq. (2). The first four columns report results where the outcome is the list experiment count for domestic violence (LE DV), while the remaining four refer to the list experiment count for child marriage (LE CM). Columns (1) and (5) report regressions where the list experiment outcomes are regressed only on the list experiment indicator (\( T_i \)); these correspond to the difference-in-means estimates from Eq. (1). The subsequent columns in Table 3 add the most important individual characteristics. The coefficients of interest are those interacted with the list experiment dummy (δ). Column (2) provides the effect of being exposed to ADP on LE DV and finds a surprisingly positive effect, indicating an increase in reported support for domestic violence. Column (4) also includes age, primary education and marital status, and shows that ADP-exposed adolescent girls are 11.4 percentage points (p value = 0.005) more likely to be in favour of domestic violence than girls not exposed to the ADP intervention, even after controlling for other individual characteristics. The results for the child marriage list experiment, instead, reveal an interesting effect of education. Column (8) shows that less educated girls, who have at most completed primary education, are 16.2 percentage points (p value = 0.012) more likely to support child marriage than more educated girls.

Table 3 Linear regression estimates for responses to the list experiments

We report and use robust standard errors when interpreting the results in the previous paragraph. We also compute and report p values based on the wild bootstrap with clustering at the NGO branch level (since there are only 27 NGO branches). Our results remain robust to the use of standard errors clustered at the NGO branch level.
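As a sketch of this inference procedure, the function below implements a wild cluster bootstrap p value with Rademacher weights, imposing the null, in the spirit of Cameron, Gelbach and Miller (2008). This is our own minimal implementation, not the authors’ code; the function name and arguments are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

def wild_cluster_boot_p(y, X, cluster, k, n_boot=999, seed=0):
    """Wild cluster bootstrap p value for H0: beta_k = 0 (Rademacher weights,
    null imposed). y, X are numpy arrays; `cluster` holds cluster ids;
    `k` is the column of X being tested."""
    rng = np.random.default_rng(seed)
    fit = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": cluster})
    t_obs = np.asarray(fit.tvalues)[k]

    # Restricted fit imposing the null: drop the tested regressor.
    fit0 = sm.OLS(y, np.delete(X, k, axis=1)).fit()
    yhat0, resid0 = fit0.fittedvalues, fit0.resid

    groups = np.unique(cluster)            # e.g., 27 branches in this application
    idx = np.searchsorted(groups, cluster)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=groups.size)  # one draw per cluster
        y_star = yhat0 + resid0 * w[idx]               # flip residuals by cluster
        fb = sm.OLS(y_star, X).fit(cov_type="cluster",
                                   cov_kwds={"groups": cluster})
        t_boot[b] = np.asarray(fb.tvalues)[k]
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))
```

In the application, `cluster` would hold the 27 BRAC branch identifiers and `k` would index the interaction coefficient of interest.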

4.3 Social desirability bias

In this section, we examine social desirability bias by comparing attitudes towards domestic violence and child marriage measured via a list experiment with the same attitudes measured via a standard direct survey question (DQ). Table 1 reports that only 5% and 2% of respondents support domestic violence and child marriage, respectively, when asked directly. When considering the direct question on domestic violence (DQ DV), we restrict our sample to the control group of the list experiment for domestic violence. Similarly, when considering the direct question on child marriage (DQ CM), we restrict our sample to the control group of the list experiment for child marriage. This implies that, when we compare the direct question response with the list experiment, each respondent answered the sensitive question only once. In Table 4, we estimate linear probability models, with an indicator taking the value one if a girl supports domestic violence as the dependent variable in columns (1)–(4).Footnote 17 An indicator taking the value one if a girl supports child marriage is the dependent variable in columns (5)–(8). Explanatory variables include a girl’s main characteristics (ADP exposure, age, marital status and education). While being exposed to ADP has no effect on adolescent girls’ attitudes, primary education is positively associated with the probability of supporting domestic violence, while age is negatively associated with the probability of supporting child marriage. Less educated girls (with at most primary education) are about 3 percentage points more likely to support domestic violence, while being a year older decreases the probability of supporting child marriage by 0.5 percentage points.Footnote 18

Table 4 Linear regression estimates for responses to the direct attitude questions

Next, we empirically test whether there are statistically significant differences between the estimates obtained using list experiments and those obtained using direct questions eliciting gender attitudes. This difference tells us by how much true support for domestic violence or child marriage is under-reported. The underlying assumption is that true support for domestic violence or child marriage is measured by the list experiment. A second assumption is that the measurement error in the direct questions and list experiments has the same sign. Formally, let us define \( Z_{i,J+1}(0) \) as respondent i’s potential answer to the sensitive item when asked directly (Blair and Imai 2012). Then, the social desirability bias is:

$$ S(x) = \Pr\left( Z_{i,J+1}^{\ast} = 1 \mid X_i = x \right) - \Pr\left( Z_{i,J+1}(0) = 1 \mid X_i = x \right) $$
(3)

The first term can be estimated as in Eq. (2), while the second can be estimated with a linear probability model regressing the observed value of \( Z_{i,J+1}(0) \) on \( X_i \). Given that the list experiment only allows us to identify the total number of items the respondent agrees with, but not which ones (i.e., \( Z_{i,J+1}^{\ast} \) cannot be identified), we cannot study social desirability bias at the individual level, only at the aggregate level.
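As a minimal sketch of this two-step estimation, the function below contrasts the list-experiment-based estimate with the direct-question prediction at a given covariate profile. It reuses the simulated df from the sketch in “Section 4.1”, with a hypothetical column dq for the direct-question answer; all names are our illustrative assumptions, and standard errors (e.g., via the bootstrap) are omitted:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def social_desirability_bias(df, covars, profile):
    """Estimate S(x) of Eq. (3) at covariate profile x: list-experiment-based
    support (the T*X part of Eq. (2)) minus the direct-question LPM
    prediction, both evaluated at x."""
    X = sm.add_constant(df[covars])
    TX = X.mul(df["t"], axis=0).add_prefix("t_x_")
    le_fit = sm.OLS(df["y"], pd.concat([X, TX], axis=1)).fit(cov_type="HC1")

    ctrl = df[df["t"] == 0]                       # DQ analysed on LE controls
    dq_fit = sm.OLS(ctrl["dq"], sm.add_constant(ctrl[covars])).fit(cov_type="HC1")

    x = np.r_[1.0, [profile[c] for c in covars]]  # constant first
    le_support = x @ le_fit.params.filter(like="t_x_").to_numpy()
    dq_support = x @ dq_fit.params.to_numpy()
    return le_support - dq_support                # Eq. (3)

# Example: add a hypothetical direct-question column, then evaluate the bias
# for an unmarried 17-year-old with primary education and no ADP exposure.
# df["dq"] = np.random.default_rng(1).binomial(1, 0.05, len(df))
# s = social_desirability_bias(df, ["adp", "age", "primary", "married"],
#                              {"adp": 0, "age": 17, "primary": 1, "married": 0})
```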

Table 5 reports the differences in the estimated proportion of girls answering the sensitive item in the affirmative when using the list experiment versus the direct question, by socio-demographic characteristics.Footnote 19 The direct question estimates correspond to the list experiment control group sub-samples. The first row of Table 5 shows the unconditional results; these reveal a large difference of 24 percentage points in support for domestic violence and 22 percentage points in support for child marriage. In the following rows of Table 5, we report differences in gender attitudes elicited using list experiments and direct questions by examining the estimated proportions for different groups whilst controlling for all other characteristics. All differences are highly statistically significant and lie between 15 and 30 percentage points. When questioned indirectly, girls appear much more in favour of both domestic violence and child marriage than when asked directly. Which girls under-report their support the most? By taking differences between groups (e.g., married versus non-married, and primary educated versus secondary/tertiary educated) from columns (5) and (6), we find two interesting results. First, girls exposed to the ADP intervention are more likely to under-report their support for domestic violence (by 12 percentage points) than girls not exposed to the intervention (p value = 0.044). Second, less educated girls (i.e. with primary schooling or below) are 15 percentage points more likely to under-report their support for child marriage than more educated girls (p value = 0.008).

Table 5 Social desirability bias

We also estimate the proportions for the DQ using probit models. Table 13 shows the social desirability bias when the DQ predictions and their standard errors (columns (3)–(4)) come from the probit models used in Table 12. Reassuringly, the results are very similar to those in Table 5.

4.4 Validity of the list experiments

In “Section 4.1”, we discussed the conditions for list experiments to be valid. Here, we discuss each of them in the context of the list experiments that we implemented. The balance tests for the randomisation of the list experiments are reported in Table 6. Column (5) reports the p value of the t test comparing each main variable in the control group with that in the treatment group. None of the differences is statistically significant, indicating that our list experiment randomisation is sound.

Table 6 Tests of randomisation for the list experiments

To test whether there is a violation of the no-design-effects assumption, Blair and Imai (2012) developed a statistical test. The null hypothesis of this test is that there are no design effects, and we fail to reject it.Footnote 20 This indicates that the inclusion of the sensitive item did not change responses to the non-sensitive items; a diagnostic in the same spirit is sketched below.
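The full test of Blair and Imai (2012) is implemented in their R package `list`. As a simplified illustration of the idea (our own sketch in the paper’s notation, not the full test, which also computes a Bonferroni-corrected minimum p value), one can check that the estimated proportions of each respondent type are non-negative:

```python
import numpy as np

def type_proportions(y_control, y_treat, J=3):
    """Estimated respondent-type proportions pi[(y, z)] (Blair and Imai 2012):
    share of respondents with y affirmative non-sensitive items and sensitive
    answer z. Under no design effects all estimates are non-negative;
    negative values signal that adding the sensitive item changed the
    answers to the non-sensitive items."""
    yc, yt = np.asarray(y_control), np.asarray(y_treat)
    F0 = lambda v: np.mean(yc <= v)   # empirical CDF, control group
    F1 = lambda v: np.mean(yt <= v)   # empirical CDF, treatment group
    pi = {}
    for y in range(J + 1):
        pi[(y, 1)] = F0(y) - F1(y)                          # sensitive = yes
        pi[(y, 0)] = F1(y) - (F0(y - 1) if y > 0 else 0.0)  # sensitive = no
    return pi
```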

The third requirement for a valid list experiment is the absence of ceiling or floor effects. This assumption, called no liars, cannot be statistically tested with the linear model used in this paper (Blair and Imai 2012), but we can analyse the distribution of responses to our list experiments (see Table 2 and Fig. 1). As can be seen, responses to the list experiments are well distributed, being mainly concentrated around 2 and 3. None of the respondents answered zero in either list experiment, so floor effects are expected to play a minor role. Table 2 shows that 4% and 5% of the respondents have no problem revealing their support for domestic violence and child marriage respectively, but quite a few girls replied “3” to both list experiments, particularly the one on child marriage. In Table 7, we run different regressions to analyse floor and ceiling effects. We create an outcome floor LE DV (floor LE CM) that takes the value one if LE DV (LE CM) equals one and zero otherwise, and an outcome ceiling LE DV (ceiling LE CM) that takes the value one if LE DV (LE CM) equals three and zero otherwise. We regress these outcomes on the main respondent characteristics for the list experiment control group; a sketch of this construction follows below. This allows us to see who is most likely to be at the floor or the ceiling and may thus be over- or under-reporting her support for domestic violence or child marriage by not reporting “0” or “4”. We find no statistically significant effect of any of these characteristics on the outcomes, except for primary education on ceiling effects in the child marriage list experiment. This result could indicate a ceiling effect for less educated girls. Bearing in mind this limitation, it has been shown that when there are ceiling (or floor) effects, the true support for the sensitive item is underestimated (Blair and Imai 2012).
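A minimal sketch of how these floor and ceiling indicators can be constructed and regressed on respondent characteristics, reusing the simulated df from the sketch in “Section 4.1” (column names remain our illustrative assumptions):

```python
import statsmodels.api as sm

# Control group only: counts over the three non-sensitive items.
ctrl = df[df["t"] == 0].copy()
ctrl["floor_le"] = (ctrl["y"] == 1).astype(int)    # lowest observed count (no zeros)
ctrl["ceiling_le"] = (ctrl["y"] == 3).astype(int)  # agrees with all 3 control items

X = sm.add_constant(ctrl[["adp", "age", "primary", "married"]])
for outcome in ["floor_le", "ceiling_le"]:
    fit = sm.OLS(ctrl[outcome], X).fit(cov_type="HC1")
    print(outcome, fit.pvalues.round(3).to_dict())
```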

Fig. 1 Distribution of list experiment responses

Table 7 Floor and ceiling effects

Given that our list experiments show heterogeneous effects by ADP exposure for domestic violence, and by education for child marriage, we examine the distribution of responses to the list experiments by these characteristics in Figs. 2 and 3. This is not a formal test, but the idea behind these figures is to understand whether these girls (less educated and ADP exposed) understood the mechanism behind the list experiment and manipulated their responses. For the sake of comparison, we also report the distribution of the list experiments for the more educated girls and for girls not exposed to ADP. Fig. 2 shows that responses to the list experiment on domestic violence are well distributed in the different groups, with only a small number of cases at the extremes. In Fig. 3, the distribution of responses to the list experiment on child marriage shows that some girls gave the response 3, which might indicate the presence of a ceiling effect. In this case, we might be underestimating the true support for child marriage. Tests for design effects run on the sub-samples of low educated, high educated, ADP-exposed and ADP-non-exposed adolescent girls always fail to reject the null hypothesis of no design effects (results available upon request).

Fig. 2 List experiment responses by education and ADP exposure for domestic violence

Fig. 3 List experiment responses by education and ADP exposure for child marriage

In a recent paper, Chuang et al. (2019) critically examine the usefulness of indirect survey methods such as list experiments and randomized response techniques. They implement a large number of double list experiments within a single survey taken by respondents in Côte d’Ivoire, where groups A and B acted as treatment and control for the same sensitive sexual or reproductive health behaviour; in this design, the non-sensitive items for groups A and B must differ by construction. The use of double list experiments allows the generation of two difference-in-means estimators that can be compared; to date, double list experiments had only been used to reduce the variance relative to a single list experiment (Droitcour et al. 1991; Glynn 2013). For most sensitive behaviours, Chuang et al. (2019) find statistically significant differences between the two difference-in-means estimates obtained for each list experiment; they conclude by suggesting that such comparisons (which can only be carried out with double list experiments) be used to check the internal consistency of the list experiment technique.

Since we did not implement double list experiments, we cannot carry out the tests proposed in Chuang et al. (2019). We instead asked our respondents direct questions on attitudes to compare with the indirect list experiment questions. We believe this method is better suited to our objective of examining social desirability bias, since in a double list experiment everyone is asked about the sensitive item twice, both directly and via the list experiment.Footnote 21 An important exercise in Chuang et al. (2019)’s work is the variation in the type of non-sensitive items, ranging from innocuous items to items related to the sensitive item. The authors find that non-sensitive items more closely related to the sensitive one perform better. None of the non-sensitive items in our list experiments is innocuous, which makes the sensitive item less salient and supports the validity of our design.

Tables 13 and 14 report analyses similar to Tables 3 and 4 respectively, but with additional controls. We include controls for the following measures of maternal empowerment: whether the respondent’s mother has been beaten by her husband, was married early, became pregnant early, and whether she practices purdah. None of these additional variables is statistically significant, except the mother’s purdah practice, which increases the likelihood of supporting child marriage when asked directly. Nonetheless, adding these variables does not change our main findings.

5 Discussion

Our findings show that measurement error is important when examining attitudes towards sensitive issues such as domestic violence or child marriage. Under-reporting can be quite high. We find that only 5.4% of adolescent girls support domestic violence when questioned directly, but 29.7% support domestic violence when questioned indirectly via a list experiment. Similar results are shown for child marriage, where 2.1% of the respondents think a girl should be married off by age 18 when asked a direct question, but support increases to 23.9% when asked via a list experiment.

Interestingly, we find that girls with lower education under-report their support for child marriage compared to girls with higher education. To the best of our knowledge, this is the first study to implement a list experiment to examine attitudes towards child marriage; therefore, we cannot compare this result with existing studies. There are no heterogeneous effects by education when looking at attitudes towards domestic violence. In contrast, Aguero and Frisancho (2018) use a list experiment to study domestic violence experiences in urban Lima (Peru) and find high under-reporting among the most educated respondents. This difference could be related to the different contexts, or to the fact that we aim at measuring attitudes while Aguero and Frisancho focus on behaviours. Our survey asks girls whether “a wife can be hit, slapped, kicked or physically hurt by the husband under any circumstances”. In our context, girls with at most primary education might have more to lose if they do not support domestic violence, while more educated girls might have better outside options (e.g., better jobs) and depend less on their husbands. De Cao and Lutz (2018) examine attitudes towards female genital cutting in Ethiopia and find, similarly to us, that uneducated women are less willing to share their support for the practice.

Finally, we find suggestive evidence that the social desirability bias for domestic violence is larger among adolescent girls exposed to ADP. ADP is a randomized intervention; hence, we can interpret its effect as causal, even if it is only marginally statistically significant.Footnote 22 The intervention focuses on changing traditional attitudes through non-formal training and the dissemination of information regarding sexual health, gender rights and legal provisions concerning violence against women, including child marriage. It is certainly possible that respondents in ADP-exposed areas conform to the expectations of those providing the program treatment. The ADP campaign aims at changing local customs, and this may increase social pressures around gender attitudes, resulting in a stronger incentive to give a biased answer. We provide a more detailed comparison of the ADP program with similar programs in other developing countries in Appendix 2.

6 Conclusion

Traditional “gender attitudes”, or beliefs regarding the appropriateness and/or acceptability of gender-specific roles and behaviour in society, are considered important drivers of women’s well-being. While measures of gender attitudes are now included in many representative international and national surveys, they suffer from potential measurement error, limiting their usefulness in empirical research. Using a unique data set from Bangladesh, we confirm that responses to sensitive direct questions under-estimate support for regressive social practices such as wife beating and child marriage. We find that girls with higher education are more supportive of egalitarian gender norms pertaining to child marriage. While we do not claim this to be a causal relationship, our finding is supportive of expanding access to education for young girls in developing country settings. We also find that exposure to a program that disseminated knowledge on gender empowerment led girls to hide their true support for domestic violence. This indicates that, at least in the short term, programs like the ADP might not have the desired effects on gender attitudes. We also find that different individual characteristics are associated with under-reporting of different aspects of gender attitudes: education matters for under-reporting of attitudes pertaining to child marriage, while ADP exposure matters for under-reporting of attitudes regarding domestic violence. This indicates that there are no simple prescriptions or general rules that apply across all aspects of gender attitudes. Our research suggests that survey methods matter in eliciting attitudes towards gendered violence and child marriage. The evidence presented in this paper also highlights the difficulty of permanently shifting gender attitudes exclusively through social empowerment programs, even in a setting where girls’ schooling and economic opportunities have improved considerably in recent decades.

Our results confirm the relevance of potential bias in responses to standard direct questions when the outcome of interest is sensitive. We suggest that practitioners measure each sensitive outcome using different survey methodologies to test whether there is indeed under- or over-reporting. We believe this is particularly important in the context of policy impact evaluations, where gathering complementary evidence about the effectiveness of a program or intervention is crucial when the attitudes or behaviours concerned are sensitive topics.