Florent Bédécarrats, Isabelle Guérin, François Roubaud (Eds.), Randomized Control Trials in the Field of Development: A Critical Perspective. Oxford University Press, 2020, 448 p., $100.00.
Population and Development Review (IF 4.6), Pub Date: 2021-05-29, DOI: 10.1111/padr.12410
David K. Evans

Debates about the value and the ethics of randomized controlled trials (RCTs) in development economics have been active for at least the past 20 years, since a group of prominent economists began publishing the results of RCTs on a range of development issues. Debates about RCTs, both in high-income countries and in development settings, have existed for much longer, but the past two decades have seen a marked increase in the production of RCTs in low- and middle-income countries and—with them—a host of criticisms. The 2019 award of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel to Abhijit Banerjee, Esther Duflo, and Michael Kremer for their work using experiments to illuminate solutions to global poverty provided official recognition of the work, on the one hand; but on the other, it spurred further critical discussion.

A new volume, Randomized Control Trials in the Field of Development: A Critical Perspective, edited by Florent Bédécarrats, Isabelle Guérin, and François Roubaud, seeks to add to this debate with a collection of 13 studies, along with an introduction by the editors and a set of four interviews (with an Indian policymaker, an Indian government advisor, a French aid official, and a French aid researcher). The editors assemble an array of voices, mostly economists but also medical doctors, water and sanitation specialists, a biostatistician, and others.

Sometimes the volume feels like a true debate. Pritchett (Chapter 2) argues that RCTs distract from a more holistic view of national development in favor of a focus on specific targets (such as “eradicating extreme poverty”). Morduch (Chapter 3) counters that “systemic change is not always possible, and sometimes leaves parts of populations behind. Broadening access and service delivery, and expanding the provision of basic goods, remains a fundamental agenda for governments, aid agencies, and foundations.” Morduch also pushes back against the idea that RCTs drove a shift in focus away from macroeconomic growth, providing evidence that a shift towards private goods began two decades earlier. In another instance, Ravallion (Chapter 1) proposes that RCTs “get less critical scrutiny than other methods,” whereas Vivalt (Chapter 11) highlights, in a related if not direct response, that RCTs show less evidence of specification searching (i.e., dropping, adding, or transforming variables to get a statistically significant result) than other studies. Ogden (Chapter 4) provides a taxonomy of seven classes of RCT critiques, covering many of those raised in other chapters (along with others unmentioned in this volume), and offers responses to many of them.

In other places, the argument feels less balanced, as in the article-length critique (Chapter 7) of the 2015 special issue of the American Economic Journal: Applied Economics devoted to RCTs evaluating microcredit. To be fair, the editors state clearly that they invited 10 famous researchers who use RCTs (“randomistas,” the volume's preferred term) to participate in the volume, and those researchers declined. Why so? While I do not know the specific motivations, Ogden makes the argument that, while RCT proponents have grown less likely to engage in active debate with critics over the method, the RCT movement has evolved significantly, functionally responding to many of the critiques, with experiments on a wider range of topics, longer timeframes for evaluation, increased use of multiple arms to test alternative mechanisms, and more engagement with policy.

I have published the results of RCTs (as well as quasi-experimental studies and reviews of both RCTs and quasi-experimental studies), and I was tempted to assume a defensive crouch while reading this volume. Many of the critiques throughout are not exclusive to RCTs but apply just as well to quasi-experimental studies and—in some cases—to any empirical research. (Many of the chapter authors explicitly recognize this in their discussions.) As economist Pamela Jakiela put it years ago, “for some reason they keep spelling ‘study’ as R-C-T” (Jakiela 2016, quoted by Ogden in this volume). Here are some examples: misreporting of studies by the media and a failure of authors to correct it (Spears, Ban, and Cumming—Chapter 6), reporting estimates as facts in a popular book based on research (Deaton—Introduction), poorly designed questionnaires and poor reporting of study details (Bédécarrats, Guérin, and Roubaud—Chapter 7), piecemeal and unsustainable solutions with insufficient systemic considerations (interview with Gulzar Natarajan), the fact that “what works” to solve a problem may vary across contexts (Deaton—Introduction), poor choice of outcome variables, or insufficient sample size (Garchitorena et al.—Chapter 5). Yet while these problems are not unique to RCTs, neither are RCTs exempt from them. Hopefully, practitioners of other methods will likewise find inspiration to improve here.

At least two critiques highlighted in the volume do apply principally to RCTs. The first is that RCT advocates claim that RCTs sit at the top of a hierarchy of empirical methods (i.e., they represent a “gold standard”). Deaton, Ravallion, and Heckman each discuss this at length, highlighting that RCTs face their own statistical inference challenges, especially, though not only, when implementation is imperfect (which it usually is), and also that RCTs may be good at identifying the average effect of a treatment, which is often not the most policy-relevant statistic. (There is much more, but that's a taste!) While most of the quotes used to establish that RCT practitioners claim pride of place are from well-known advocates (like Banerjee and Duflo), another author cited as placing RCTs at the top of a hierarchy is econometrician Guido Imbens, who is not a practitioner of RCTs.

My impression is that much of the concern stems from the worry that “gold standard” language leads some people to believe that, as Ravallion puts it, “RCTs are not just top of the menu of approved methods, nothing else is on the menu!” The extreme version of this position clearly does not apply to the most well-known producers of RCTs. For example, although Bédécarrats, Guérin, and Roubaud define randomistas as “proponents who are convinced that RCTs are the only way to rigorously assess impact in evaluation, and that they are superior to other methodologies in all cases,” all three winners of the Nobel for their experimental work have quasi-experimental and descriptive work. (Banerjee and Duflo, together with Qian, published a quasi-experimental evaluation of road-building just last year!) Yet a form of this attitude does manifest in reviews of the literature (either stand-alone or within empirical papers) that only consider RCT evidence, implicitly or explicitly imposing the assumption that only RCTs deliver impact evidence of value. Spears, Ban, and Cumming (Chapter 6) quote relevant earlier work by Deaton and Cartwright: randomization “does not relieve us of the need to think.”

A second critique that is felt more by RCTs than by observational studies is ethical. Quasi-experimental studies have ethical issues as well—any data collection or even data use may require ethical considerations—but RCTs have the additional ethical challenge of manipulating treatment. (Again, RCTs are not unique in manipulating treatment for the purpose of evaluation, but I would propose that they do it much more commonly than most quasi-experimental approaches.) In their thought-provoking article, Abramowicz and Szafarz (Chapter 10) ask “should economists care about equipoise?” Equipoise is the principle that, in advance of the RCT, researchers should be genuinely ignorant as to whether the treatment is beneficial or not. (Or, if an RCT is testing two alternative treatments, researchers should be ignorant as to which is best.) This plays an important role in medical ethics, but development economists leave it largely undiscussed in their work. In their defense, economists may argue that many interventions that advocates support are not actually proven and that RCTs have demonstrated zero effects for interventions that intuition or anecdotal experience suggested would be effective. Yet there are interventions—cash transfers are an easy example, now that hundreds of studies have examined them across many contexts—for which it is difficult to say that the treatment group is not likely to be better off than the control group.

RCT implementers may further defend a departure from equipoise by proposing that rationing will take place anyway in cases where there are insufficient resources to benefit everyone, and that randomizing may be fairer than other allocations. But as Ravallion points out, we often do have some information about who is likely to benefit the most (e.g., the poorest!). Even Ogden, whose article offers the most robust defense of RCTs in the volume, comes up empty on this one: “On the questions of equipoise, as noted above, this remains an area where the RCT movement has yet to significantly engage as best I can tell.” Yet even this may be shifting in the wake of recent controversies around the ethics of certain RCTs. A group of prominent economists, including some whom the editors of this volume would call “randomistas,” have proposed that social science RCTs include ethical discussions, including a discussion of equipoise and, in the case of scarce resources, a rationale for why randomization was better than targeting specific groups for benefits (Asiedu et al. 2021). I suspect that norms will evolve significantly in the coming years in this regard.

The volume includes much of interest that I have not touched on in detail here. Morduch (Chapter 3) highlights how RCTs, even if one is unconvinced of their value for evaluation, are valuable for exploring new types of “economic contracts, behaviors, and institutions.” Vivalt (Chapter 11) explores how incorporating prior beliefs from policymakers can help us learn more from RCTs and other evaluations. Garchitorena et al. (Chapter 5) advocate for including faster-moving, nonrandomized implementation research in health delivery, a plea that is echoed in the interview with Indian policymaker Gulzar Natarajan at the end of the volume. On the whole, the volume delivers much of value, even if not all critiques are unique to RCTs.

A final point, raised repeatedly in the volume, is the hopefully obvious fact that RCTs cannot answer all questions and that even those questions that are well answered by an RCT are often best answered in complement with other methods. Questions about economic growth and trade policy are not amenable to randomization, and RCTs by themselves will not yield deep, thick characterizations of health systems and bureaucracies. An RCT will not reveal whether the goals of a program “were worth pursuing in the first place” (Picciotto, Chapter 9). Ultimately, as Spears, Ban, and Cumming put it in their discussion of water and sanitation evaluations (Chapter 6), “there is no gold standard other than careful, thoughtful research.” This standard leaves lots of room for RCTs and a wide range of other tools.


