Curriculum Reform in The Common Core Era: Evaluating Elementary Math Textbooks Across Six U.S. States
Journal of Policy Analysis and Management (IF 2.3). Pub Date: 2020-09-01. DOI: 10.1002/pam.22257
David Blazar, Blake Heller, Thomas J. Kane, Morgan Polikoff, Douglas O. Staiger, Scott Carrell, Dan Goldhaber, Douglas N. Harris, Rachel Hitch, Kristian L. Holden, Michal Kurlaender

Can a school or district improve student achievement simply by switching to a higher-quality textbook or curriculum? We conducted the first multi-textbook, multi-state effort to estimate textbook efficacy following widespread adoption of the Common Core State Standards (CCSS) and associated changes in the textbook market. Pooling textbook adoption and student test score data across six geographically and demographically diverse U.S. states, we found little evidence of differences in average achievement gains for schools using different math textbooks. We found some evidence of greater variation in achievement gains among schools using pre-CCSS editions, which may have been more varied in their content than post-CCSS editions because they were written for a broader set of standards. We also found greater variation among schools that had more exposure to a given text. However, these differences were small. Despite considerable interest in and attention to textbooks as a low-cost, “silver bullet” intervention for improving student outcomes, we conclude that the adoption of a new textbook or set of curriculum materials, on its own, is unlikely to achieve this goal. © 2020 by the Association for Public Policy Analysis and Management

INTRODUCTION

The choice of textbook or curriculum is an enticing lever for improving student outcomes. Few central office decisions have such far-ranging implications for the work that students and teachers do together in classrooms every day. In our own survey, we found that teachers in 94 percent of elementary schools in six geographically and demographically diverse U.S. states reported using the official district-adopted textbook or curriculum in more than half of their lessons.¹ Given such widespread usage, helping schools and districts switch from less to more effective materials offers a large potential return on investment (Kirst, 1982; Whitehurst, 2009). As Chingos and Whitehurst (2012) point out, “...whereas improving teacher quality...is challenging, expensive, and time consuming, making better choices among available instructional materials should be relatively easy, inexpensive, and quick” (p. 1).

Textbook choice has been especially salient and has gained national policy attention in recent years after many states adopted the Common Core State Standards (CCSS), which generally are considered to be more rigorous than prior state standards (Friedberg et al., 2018).² Curriculum reform is one of the primary mechanisms by which policymakers, practitioners, and researchers hypothesized that the introduction of the CCSS could improve student outcomes at scale (Carmichael et al., 2010; Porter et al., 2011). In the years since CCSS adoption, large publishing houses (e.g., Houghton Mifflin Harcourt, McGraw Hill, Pearson) have invested heavily in adapting existing textbooks and curriculum materials to new standards, and in writing new materials from scratch.

¹ Throughout the paper, we use the terms “textbook” and “curriculum” interchangeably. We recognize, though, that the physical textbook may be just one of multiple materials that make up a given curriculum. Curricula can include student and teacher editions of the textbook, formative assessment materials, manipulative sets, etc. In our survey of schools and teachers, we referred to the “primary textbook or curriculum materials” used by teachers, which could consist of “a printed textbook from a publisher, an online text, or a collection of materials assembled by the school, district, or individual teachers [but] does not include supplemental resources that individual teachers may use from time to time to supplement the curriculum materials.”

² Since 2010, many of the states that initially adopted the CCSS have since revised their standards. Yet several of the states that revised standards from the CCSS have landed on a close facsimile (Friedberg et al., 2018).
New York State spent over $35 million to develop a set of curriculum materials, Engage NY, which are now widely used across the country under this title and Eureka (Cavanaugh, 2015). Once new textbooks are written, the marginal cost to schools and districts of switching from one textbook to another is quite small. On average, elementary math textbooks cost roughly $35 per student, which represents less than 1 percent of per-pupil expenditures (Boser, Chingos, & Straus, 2015). As of 2017, over 80 percent of the schools in our sample had adopted a CCSS-edition textbook in elementary math.

Despite the potential value to districts and schools, the research literature on the efficacy of alternative textbooks or curricula is sparse. We are aware of one multi-textbook randomized trial (Agodini et al., 2010), two randomized trials assessing the effectiveness of a single textbook (Eddy et al., 2014; Jaciw et al., 2016), and a handful of non-experimental studies that rely on matching techniques to estimate textbook effects (Bhatt & Koedel, 2012; Bhatt, Koedel, & Lehmann, 2013; Koedel et al., 2017). However, most of the textbook editions or curriculum materials in common use today have never been subjected to a rigorous test of efficacy. Further, no studies have examined the sensitivity of textbook effects across time or across states. Although some textbook editions are written for local markets (e.g., California, Texas), logic suggests that a high-quality curriculum or textbook should be effective across settings, especially when the materials are written to align with a common or similar set of standards. Yet, to our knowledge, no studies have assessed this claim empirically. Of the studies cited above, most are analyses of single districts or states. Two studies (Agodini et al., 2010; Eddy et al., 2014) recruited participants across states, but schools and districts volunteered and so are not representative of those settings, let alone of U.S. states more broadly.

One reason for the weakness of the evidence base is the historic diversity in state standards and assessments. When each state had its own standards and assessments, single-state studies were relevant only for schools in a given state, and few states were sufficiently large to justify the cost of such an analysis. A second, more practical barrier has been the omission of textbook adoptions from state data collection efforts (Polikoff, 2018).
As useful as textbook adoption data would be for estimating efficacy, states have concentrated their data collection efforts on fulfilling federal accountability requirements that focus on student test score performance rather than on informing district purchasing decisions. States typically have stayed away from collecting data on curriculum adoptions in deference to local authorities (Hutt & Polikoff, 2018). We are aware of only a handful of states that regularly collect information on the textbooks used by schools.³ As a result, it has been difficult to bring states’ longitudinal data on student achievement to bear in comparing the achievement gains of similar schools using different curricula.

We designed our study as a field test of a replicable, low-cost approach to measuring curriculum efficacy. Although experiments may be the most convincing way to estimate the causal effect of textbooks, the estimated impacts might not generalize beyond the small subset of schools that are willing to have their textbooks randomly assigned. Instead, by combining publicly available administrative data on textbooks in two states (California and New Mexico) with a survey administered to a random sample of schools in four additional states (Louisiana, Maryland, New Jersey, and Washington state), we ensured that we had state-representative samples of schools and were studying the textbook editions that schools are using in the current CCSS era. By pairing textbook adoption data with student test-score data from state-administered, CCSS-aligned assessments, we also eliminated the need to collect our own assessments. Further, coordination among a large set of researchers across states, with different teams estimating the same “value-added” model with student-level data and then sharing only aggregated data, reduced the need to share sensitive data across state lines. In short, our methodology could be used to update results as textbook editions come and go.

Although our value-added methodology was the only way to examine textbook efficacy at scale, it comes with a trade-off with regard to internal validity. We estimate the association between textbook adoption and aggregate student achievement gains, and so cannot account for all factors that led schools and districts to choose different textbooks (e.g., characteristics of school leadership). However, we were able to examine a model of textbook selection based on observable school and district characteristics. These models indicate that selection varies substantially across textbooks and states, and so is unlikely to lead to systematic bias. Intuition also suggests that it is highly unlikely that high- or low-growth schools are systematically choosing the most effective textbooks when they (and we) do not know which textbooks those are. Empirically, we found that results were robust to models with different sets of school- and district-level controls, as well as to models that replaced observable characteristics with school fixed effects that account for fixed differences across schools. Unlike the prior literature, we found little evidence of substantial differences in average math achievement gains across schools using different textbooks.
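The excerpt describes but does not reproduce the estimating equation, so the sketch below shows a generic school “value-added” specification of the kind referred to above; the functional form, control sets, and notation are illustrative assumptions rather than the authors’ exact model.

    A_{ist} = f(A_{i,t-1}) + X_{it}'\beta + Z_{st}'\gamma + \sum_{k} \tau_k T_{skt} + \varepsilon_{ist}

Here A_{ist} is the CCSS-aligned test score of student i in school s in year t, f(\cdot) is a flexible function of prior achievement, X_{it} and Z_{st} are student- and school-level controls, T_{skt} indicates that school s was using textbook k in year t, and \tau_k is the average achievement-gain difference associated with textbook k relative to a reference text. Under this illustrative reading, the robustness check described above replaces Z_{st} with school fixed effects, so that \tau_k is identified from schools that changed textbooks.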

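For the cross-state coordination described above, in which each research team shares only aggregated estimates, one simple way to combine state-specific textbook effects is an inverse-variance (precision-weighted) average. The sketch below is an illustrative assumption about how such pooling could be done, not the authors’ code; the function name pool_estimates and all numbers are hypothetical.

    # Illustrative sketch (not the authors' code): combining state-specific
    # textbook-effect estimates when each team shares only aggregates.
    # The function name and all numbers below are hypothetical placeholders.
    from math import sqrt

    def pool_estimates(estimates):
        """Inverse-variance (precision-weighted) average of per-state estimates.

        `estimates` holds (effect, standard_error) pairs, one per state, for the
        same textbook measured relative to a common reference text.
        """
        weights = [1.0 / se ** 2 for _, se in estimates]
        pooled = sum(w * b for (b, _), w in zip(estimates, weights)) / sum(weights)
        pooled_se = sqrt(1.0 / sum(weights))
        return pooled, pooled_se

    # Hypothetical per-state effects (in student-level SD units) for one textbook.
    state_estimates = [(0.02, 0.03), (-0.01, 0.04), (0.00, 0.05)]
    effect, se = pool_estimates(state_estimates)
    print(f"pooled effect = {effect:.3f} SD (SE = {se:.3f})")

Precision weighting simply gives more influence to states where a textbook’s effect is estimated more precisely; other pooling rules (for example, a random-effects meta-analysis) would fit the same share-only-aggregates workflow.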