(Mis)Measuring Sensitive Attitudes with the List Experiment
Public Opinion Quarterly (IF 4.616) Pub Date: 2019-01-01, DOI: 10.1093/poq/nfz009
Eric Kramon, Keith Weghorst

List experiments (LEs) are an increasingly popular survey research tool for measuring sensitive attitudes and behaviors. However, there is evidence that list experiments sometimes produce unreasonable estimates. Why do list experiments “fail,” and how can the performance of the list experiment be improved? Using evidence from Kenya, we hypothesize that the length and complexity of the LE format make them costlier for respondents to complete and thus prone to comprehension and reporting errors. First, we show that list experiments encounter difficulties with simple, nonsensitive lists about food consumption and daily activities: over 40 percent of respondents provide inconsistent responses between list experiment and direct question formats. These errors are concentrated among less numerate and less educated respondents, offering evidence that the errors are driven by the complexity and difficulty of list experiments. Second, we examine list experiments measuring attitudes about political violence. The standard list experiment reveals lower rates of support for political violence compared to simply asking directly about this sensitive attitude, which we interpret as list experiment breakdown. We evaluate two modifications to the list experiment designed to reduce its complexity: private tabulation and cartoon visual aids. Both modifications greatly enhance list experiment performance, especially among respondent subgroups where the standard procedure is most problematic. The paper makes two key contributions: (1) showing that techniques such as the list experiment, which have promise for reducing response bias, can introduce different forms of error associated with question complexity and difficulty; and (2) demonstrating the effectiveness of easy-to-implement solutions to the problem. 
Survey researchers are often concerned with measuring sensitive attitudes and behaviors, including support for political violence, experience with corruption, and racial attitudes. A major challenge for studying such topics with surveys is social desirability bias: many individuals do not want to reveal socially unacceptable or potentially illegal attitudes and behaviors. Scholars have developed a number of strategies for reducing sensitivity-driven measurement error. The list experiment—or “item count technique”—is one approach that is increasingly popular in political science and related disciplines. In this paper, we evaluate two modifications to standard list experiment procedures. The first allows respondents to privately tabulate the number of items in the list that apply to them, thereby aiding accurate response while creating additional assurance of privacy. The second modification adds visual aids, intended to reduce respondent error—particularly among respondents who find the instructions and demands of a list experiment challenging. List experiments (LEs) reduce survey error by asking respondents about sensitive issues indirectly: sensitive items are embedded in a list with several nonsensitive items, and participants are asked how many items they agree with or apply to them, but not which ones (see examples in tables 3 and 4 later in this paper). This approach reduces the perceived costs and risks of answering honestly. However, enthusiasm surrounding the list experiment has drawn attention away from its potential limitations. The length and complexity of the question format make LEs prone to comprehension and reporting errors.
Importantly, such errors may be concentrated among certain population subgroups—those without experience answering complex survey questions or those most likely to hold the sensitive attitude of interest. Unfortunately, identifying the extent to which these issues bias list experimental data is challenging because survey respondents’ “true” answers to sensitive questions are usually unknown (Simpser 2017). Nonetheless, LEs often break down in obvious ways: producing estimates that are lower than the direct question, or even nonsensical ones, such as negative estimates (Holbrook and Krosnick 2010). In that light, we are motivated by two questions: Why do list experiments sometimes “fail” or break down? How can the performance of the list experiment be improved? In this paper, we examine the LE and its ability to reduce survey error in Kenya, where we sought to measure public support for political violence. First, we investigate the performance of the LE using lists of simple, nonsensitive items about food consumption and daily activities. We show that the LE encounters difficulties with these simple and nonsensitive lists: over 40 percent of respondents provide inconsistent responses in LE versus direct question formats. These “failures” are concentrated among less numerate and less educated respondents, evidence that errors are driven by LE question complexity and difficulty. Second, we turn to list experiments designed to measure attitudes about political violence. We find that the standard LE estimates lower rates of support for political violence than those obtained by asking directly.
These underestimates are most pronounced among less educated participants and those who provided inconsistent responses in the nonsensitive LEs described above, evidence that technique difficulty is driving list experiment breakdown. Finally, we evaluate two low-cost, context-appropriate modifications to the list experiment designed to reduce the complexity of the technique. The first allows for private tabulation, and the second combines private tabulation with cartoon visual aids. We find that both modifications improve list experiment performance, including among the subgroups that had difficulty with the nonsensitive LE. This paper contributes to the literature on survey response bias in two ways. First, we show that indirect techniques such as the list experiment, which have promise for reducing response bias, can introduce different forms of error associated with question complexity and difficulty. This is important because the survey literature is populated with list experiments that perform well; we highlight limitations that might not be obvious from reading this published literature because of publication bias and the “file drawer problem.” Our aim is not to suggest that all LEs are problematic, but rather to draw attention to these limitations. Our second contribution is demonstrating that relatively easy-to-implement and low-cost modifications can greatly enhance the performance of the technique, especially among populations where the standard procedure is most problematic. Modifications designed to reduce item complexity and difficulty can be adapted by applied survey researchers working in a range of contexts.

Measuring Sensitive Attitudes with the List Experiment

Attitudes toward violence are emblematic of the challenges of studying sensitive topics. Support for political violence is subject to under-reporting biases because such violence is illegal and expressing approval of it is generally socially undesirable.
Past research on violence has addressed sensitivity-driven measurement error by alleviating the perceived costs and risks of answering truthfully. Strategies include asking about violent behavior indirectly (Humphreys and Weinstein 2006), administering sensitive survey modules separately from a larger survey (Scacco 2016), anticipating or controlling for enumerator ethnicity effects (Kasara 2013; Carlson 2014; Adida et al. 2016), or one of several experimental approaches: endorsement experiments (Blair et al. 2013; Lyall, Blair, and Imai 2013), the randomized response technique (Blair, Imai, and Zhou 2015), or the list experiment. The list experiment is a promising alternative to direct questions, offering respondents greater secrecy for sensitive responses (e.g., Kuklinski, Cobb, and Gilens 1997; Corstange 2018; Gonzalez-Ocantos et al. 2011; Blair and Imai 2012; Glynn 2013). The LE presents a sensitive statement as one of several items in a list and asks respondents to identify how many list items in total apply to them. Participants are randomly assigned to either a treatment list including the sensitive item or a control list that does not. Because the lists are otherwise identical, and assignment is randomized, the difference in means between treatment and control lists can be attributed to the sensitive item. If successfully implemented, the technique yields an estimate of the prevalence of the sensitive attitude. Two assumptions must be satisfied for LE estimates to be valid: “no liars” and “no design effects” (Blair and Imai 2012). The first states that respondents “do not lie about the sensitive item” (Rosenfeld, Imai, and Shapiro 2016, 795). The second requires that adding the sensitive item to a list does not change the way respondents engage with the control items.
Lists are generally designed to avoid “floor” and “ceiling” effects, which undermine the concealment of the sensitive attitude (Glynn 2013). For a single LE, the estimated prevalence of the sensitive item is the difference in means between the treatment and control groups (e.g., Blair and Imai 2012; Streb et al. 2008). For example, if the control group mean is 2 and the treatment group mean is 2.2, the estimated prevalence in the sample is 20 percent. In the double list experiment design (DLE), which uses two sets of lists such that every respondent receives one control list and one treatment list, the estimate is the average of the two difference-in-means estimates.
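The difference-in-means logic above can be illustrated with a minimal simulation. The sketch below is ours, not the authors’ replication code; the list length, item probabilities, sample size, and 20 percent prevalence of the sensitive item are illustrative assumptions chosen to mirror the 2 vs. 2.2 example.

```python
import random

random.seed(1)

# Single list experiment: the control list has 4 nonsensitive items, each
# applying to a respondent with probability 0.5; the treatment list adds one
# sensitive item held by roughly 20% of respondents.
def simulate_response(treated, p_sensitive=0.2, n_items=4):
    count = sum(random.random() < 0.5 for _ in range(n_items))  # nonsensitive items
    if treated and random.random() < p_sensitive:
        count += 1  # the sensitive item also applies
    return count  # respondent reports only this total, never which items

n = 5000
control = [simulate_response(False) for _ in range(n)]
treatment = [simulate_response(True) for _ in range(n)]

# The difference in means between treatment and control estimates the
# prevalence of the sensitive item.
estimate = sum(treatment) / n - sum(control) / n
print(f"Estimated prevalence: {estimate:.2%}")
```

With a large sample, the estimate should fall close to the true 20 percent. In a DLE design, one would compute two such differences (one per list pairing) and average them.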
