-
Examining Three Learning Progressions in Middle-school Mathematics for Formative Assessment Applied Measurement in Education (IF 1.04) Pub Date : 2021-03-08 Duy N. Pham, Craig S. Wells, Malcolm I. Bauer, E. Caroline Wylie, Scott Monroe
ABSTRACT Assessments built on a theory of learning progressions are promising formative tools to support learning and teaching. The quality and usefulness of those assessments depend, in large part, on the validity of the theory-informed inferences about student learning made from the assessment results. In this study, we introduced an approach to address an important challenge related to examining
-
A Method for Identifying Partial Test-Taking Engagement Applied Measurement in Education (IF 1.04) Pub Date : 2021-03-01 Steven Wise, Megan Kuhfeld
ABSTRACT Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not affected by disengagement. A recent study provided
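A minimal sketch of the E-M adjustment the abstract describes, assuming response times are available and per-item rapid-guessing thresholds have already been set. The threshold values, data, and function name below are illustrative, not the authors' implementation:

```python
import numpy as np

def effort_moderated_score(correct, rt, thresholds):
    """Proportion-correct E-M score: drop responses flagged as rapid
    guesses (RT below the item threshold) and score the rest.

    correct    : 0/1 item scores for one examinee
    rt         : response times in seconds, same length
    thresholds : per-item rapid-guessing thresholds in seconds
    """
    correct = np.asarray(correct, dtype=float)
    engaged = np.asarray(rt) >= np.asarray(thresholds)  # True = solution behavior
    if engaged.sum() == 0:
        return np.nan  # no engaged responses left to score
    return correct[engaged].mean()

# Example: items answered in under the 3-second threshold are excluded.
score = effort_moderated_score(
    correct=[1, 0, 1, 0, 0, 1],
    rt=[12.4, 2.1, 30.0, 8.8, 1.5, 14.2],
    thresholds=[3.0] * 6,
)
print(round(score, 3))  # 0.75, based on the 4 engaged responses
```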
-
Improving Test-Taking Effort in Low-Stakes Group-Based Educational Testing: A Meta-Analysis of Interventions Applied Measurement in Education (IF 1.04) Pub Date : 2021-03-01 Joseph Rios
ABSTRACT Four decades of research have shown that students’ low test-taking effort is a serious threat to the validity of score-based inferences from low-stakes, group-based educational assessments. This meta-analysis sought to identify effective interventions for improving students’ test-taking effort in such contexts. Included studies (a) used a treatment-control group design; (b) administered a
-
Reviewing the Test Reviews: Quality Judgments and Reviewer Agreements in the Mental Measurements Yearbook Applied Measurement in Education (IF 1.04) Pub Date : 2021-02-25 Thomas Hogan, Marissa DeStefano, Caitlin Gilby, Dana Kosman, Joshua Peri
ABSTRACT Buros’ Mental Measurements Yearbook (MMY) has provided professional reviews of commercially published psychological and educational tests for over 80 years. It serves as a kind of conscience for the testing industry. For a random sample of 50 entries in the 19th MMY (a total of 100 separate reviews) this study determined the level of qualitative judgment rendered by reviewers and the consistency
-
Think Alouds: Informing Scholarship and Broadening Partnerships through Assessment Applied Measurement in Education (IF 1.04) Pub Date : 2021-02-01 Jonathan David Bostic
ABSTRACT Think alouds are valuable tools for academicians, test developers, and practitioners as they provide a unique window into a respondent’s thinking during an assessment. The purpose of this special issue is to highlight novel ways to use think alouds as a means to gather evidence about respondents’ thinking. An intended outcome from this special issue is that readers may better understand think
-
Using Think-Alouds for Response Process Evidence of Teacher Attentiveness Applied Measurement in Education (IF 1.04) Pub Date : 2021-02-01 Ya Mo, Michele Carney, Laurie Cavey, Tatia Totorica
ABSTRACT There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop selected-response assessment items. Through analyses of
-
Formative Assessment of Computational Thinking: Cognitive and Metacognitive Processes Applied Measurement in Education (IF 1.04) Pub Date : 2021-02-01 Sarah Bonner, Peggy Chen, Kristi Jones, Brandon Milonovich
ABSTRACT We describe the use of think alouds to examine substantive processes involved in performance on a formative assessment of computational thinking (CT) designed to support self-regulated learning (SRL). Our task design model included three phases of work on a computational thinking problem: forethought, performance, and reflection. The cognitive processes of seven students who reported their
-
Gathering Response Process Data for a Problem-Solving Measure through Whole-Class Think Alouds Applied Measurement in Education (IF 1.04) Pub Date : 2021-02-01 Jonathan David Bostic, Toni A. Sondergeld, Gabriel Matney, Gregory Stone, Tiara Hicks
ABSTRACT Response process validity evidence provides a window into a respondent’s cognitive processing. The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work is performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades. Data from third-grade students were collected in a 1–1
-
Rethinking Think-Alouds: The Often-Problematic Collection of Response Process Data Applied Measurement in Education (IF 1.04) Pub Date : 2021-02-01 Jacqueline P. Leighton
ABSTRACT The objective of this paper is to comment on the think-aloud methods presented in the three papers included in this special issue. The commentary offered stems from the author’s own psychological investigations of unobservable information processes and the conditions under which the most defensible claims can be advanced. The structure of this commentary is as follows: First, the objective
-
Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model Applied Measurement in Education (IF 1.04) Pub Date : 2020-08-25 Zhonghua Zhang
ABSTRACT The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the mathematical formulas for computing the asymptotic standard
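For context, the delta method used here has a generic recipe: the asymptotic SE of a function g of estimated parameters is sqrt(∇g′ Σ ∇g), where Σ is the parameter covariance matrix. A hedged numerical sketch of that recipe; the paper derives closed-form gradients for the GRM equating coefficients, whereas this version differentiates numerically on a toy function:

```python
import numpy as np

def delta_method_se(g, theta_hat, cov, eps=1e-6):
    """Asymptotic SE of g(theta_hat) via the delta method:
    SE ~ sqrt(grad(g)' Cov grad(g)), with a central-difference gradient."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    grad = np.zeros_like(theta_hat)
    for j in range(theta_hat.size):
        step = np.zeros_like(theta_hat)
        step[j] = eps
        grad[j] = (g(theta_hat + step) - g(theta_hat - step)) / (2 * eps)
    return float(np.sqrt(grad @ cov @ grad))

# Toy example: SE of the ratio a/b given estimates and their covariance.
est = np.array([2.0, 4.0])          # (a_hat, b_hat)
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
print(delta_method_se(lambda t: t[0] / t[1], est, cov))
```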
-
Can Culture Be a Salient Predictor of Test-Taking Engagement? An Analysis of Differential Noneffortful Responding on an International College-Level Assessment of Critical Thinking Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-29 Joseph A. Rios, Hongwen Guo
ABSTRACT The objective of this study was to evaluate whether differential noneffortful responding (identified via response latencies) was present in four countries administered a low-stakes college-level critical thinking assessment. Results indicated significant differences (as large as .90 SD) between nearly all country pairings in the average number of noneffortful responses per test taker. Furthermore
-
On the Reliable Identification and Effectiveness of Computer-Based, Pop-Up Glossaries in Large-Scale Assessments Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-27 Dale J. Cohen, Alesha Ballman, Frank Rijmen, Jon Cohen
ABSTRACT Computer-based, pop-up glossaries are perhaps the most promising accommodation aimed at mitigating the influence of linguistic structure and cultural bias on the performance of English Learner (EL) students on statewide assessments. To date, there is no sufficiently reliable procedure for identifying the words that require a glossary for EL students. In the coding procedure
-
Applying a Multiple Comparison Control to IRT Item-fit Testing Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-23 Derek Sauder, Christine DeMars
ABSTRACT We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the S-X2. Previous research indicated that the S-X2 has good Type I error control and decent power, but no previous research examined familywise Type I error control. We varied percentage of misfitting items, sample size, and test length, and computed familywise
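For readers unfamiliar with familywise control, the adjustment step itself is routine once per-item p-values are in hand. A small sketch using statsmodels' Holm step-down correction on hypothetical item-fit p-values; the p-values are made up and the study's S-X2 computation is not reproduced here:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical per-item p-values from an item-fit statistic such as the S-X2.
pvals = np.array([0.001, 0.012, 0.030, 0.240, 0.450, 0.800])

# Holm step-down control of the familywise Type I error rate at alpha = .05.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for i, (p, pa, r) in enumerate(zip(pvals, p_adj, reject), start=1):
    print(f"item {i}: p = {p:.3f}, adjusted = {pa:.3f}, flag misfit = {r}")
```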
-
The Impact of Setting Scoring Expectations on Rater Scoring Rates and Accuracy Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Cathy Wendler, Nancy Glazer, Brent Bridgeman
ABSTRACT Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine if setting scoring rate expectations would encourage raters to score at a faster pace, and if so, if there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups (slow, medium, and fast) and two conditions
-
The Impact of Operational Scoring Experience and Additional Mentored Training on Raters’ Essay Scoring Accuracy Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Ikkyu Choi, Edward W. Wolfe
ABSTRACT Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources are available for empirical evidence on whether and how raters become more accurate as they gain scoring experience or what long-term effects training can have. In this study
-
Evaluating Human Scoring Using Generalizability Theory Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Yaw Bimpeh, William Pointer, Ben Alexander Smith, Liz Harrison
ABSTRACT Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we apply generalizability theory (G theory) to
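As background, the simplest G-theory design for rater data is the fully crossed persons-by-raters (p × r) study. A sketch of the standard expected-mean-square variance-component estimates and the relative G coefficient, on a made-up score matrix; the paper's design and data are more involved:

```python
import numpy as np

# Hypothetical fully crossed persons x raters (p x r) score matrix.
scores = np.array([[4, 5, 4],
                   [2, 3, 2],
                   [5, 5, 4],
                   [3, 2, 3],
                   [4, 4, 5]], dtype=float)
n_p, n_r = scores.shape

grand = scores.mean()
ms_p = n_r * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_p - 1)
ms_r = n_p * np.sum((scores.mean(axis=0) - grand) ** 2) / (n_r - 1)
resid = scores - scores.mean(axis=1, keepdims=True) \
               - scores.mean(axis=0, keepdims=True) + grand
ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

# Expected-mean-square solutions for the variance components.
var_pr = ms_pr                            # person-by-rater interaction + error
var_p = max((ms_p - ms_pr) / n_r, 0.0)    # person (true-score) variance
var_r = max((ms_r - ms_pr) / n_p, 0.0)    # rater severity variance

# Generalizability (relative) coefficient for the mean over n_r raters.
g_coef = var_p / (var_p + var_pr / n_r)
print(f"var_p={var_p:.3f} var_r={var_r:.3f} var_pr,e={var_pr:.3f} G={g_coef:.3f}")
```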
-
Understanding and Interpreting Human Scoring Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Nancy Glazer, Edward W. Wolfe
ABSTRACT This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed response scoring process. That framework identifies
-
Applying Cognitive Theory to the Human Essay Rating Process Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Bridgid Finn, Burcu Arslan, Matthew Walsh
ABSTRACT To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and score criterion. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters’ scoring accuracy on TOEFL and GRE essays. We used binomial linear mixed effect models
-
Commentary on “Using Human Raters in Constructed Response Scoring: Understanding, Predicting, and Modifying Performance” Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Walter D. Way
(2020). Commentary on “Using Human Raters in Constructed Response Scoring: Understanding, Predicting, and Modifying Performance”. Applied Measurement in Education: Vol. 33, Using Human Raters in Constructed Response Scoring: Understanding, Predicting, and Modifying Performance, pp. 255-261.
-
Why Should We Care about Human Raters? Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Edward W. Wolfe, Cathy L. W. Wendler
(2020). Why Should We Care about Human Raters? Applied Measurement in Education: Vol. 33, Using Human Raters in Constructed Response Scoring: Understanding, Predicting, and Modifying Performance, pp. 189-190.
-
Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-21 Isaac I. Bejar, Chen Li, Daniel McCaffrey
ABSTRACT We evaluate the feasibility of developing predictive models of rater behavior, that is, rater-specific models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays used by the e-rater® engine. Specifically, for each rater
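A hedged sketch of the general idea, not the authors' models: regress one rater's operational scores on essay features standing in for e-rater linguistic attributes, then flag large prediction gaps for quality assurance. All data and feature meanings below are simulated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 500
X = rng.normal(size=(n, 3))                  # stand-ins for e-rater attributes
scores = np.clip(np.round(3 + X @ np.array([0.8, 0.5, 0.3])
                          + rng.normal(0, 0.7, n)), 1, 6)

# One predictive model per rater; this simulates a single rater's history.
X_tr, X_te, y_tr, y_te = train_test_split(X, scores, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

# In QA, essays where the rater departs sharply from prediction get review.
gap = np.abs(model.predict(X_te) - y_te)
print("mean absolute deviation:", gap.mean().round(3))
```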
-
Detecting Local Dependence: A Threshold-Autoregressive Item Response Theory (TAR-IRT) Approach for Polytomous Items Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-20 Xiaodan Tang, George Karabatsos, Haiqin Chen
ABSTRACT In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses of each examinee. The TAR-IRT approach also
-
Validating Rubric Scoring Processes: An Application of an Item Response Tree Model Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-20 Aaron J. Myers, Allison J. Ames, Brian C. Leventhal, Madison A. Holzman
ABSTRACT When rating performance assessments, raters may ascribe different scores for the same performance when rubric application does not align with the intended application of the scoring criteria. Given performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters’ scoring processes and the intended scoring processes may lead
-
An IRT Mixture Model for Rating Scale Confusion Associated with Negatively Worded Items in Measures of Social-Emotional Learning Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-16 Daniel Bolt, Yang Caroline Wang, Robert H. Meyer, Libby Pier
ABSTRACT We illustrate the application of mixture IRT models to evaluate respondent confusion due to the negative wording of certain items on a social-emotional learning (SEL) assessment. Using actual student self-report ratings on four social-emotional learning scales collected from students in grades 3–12 from CORE Districts in the state of California, we also evaluate the consequences of the potential
-
Evaluating Random and Systematic Error in Student Growth Percentiles Applied Measurement in Education (IF 1.04) Pub Date : 2020-07-15 Craig S. Wells, Stephen G. Sireci
ABSTRACT Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in SGPs by simulating test scores for four grades
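For context, an SGP locates a student's current score in the conditional score distribution given prior scores, typically via quantile regression. A simplified linear sketch under invented data; operational SGPs use B-spline quantile regression, so treat this as the shape of the computation rather than the real thing:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(1)
n = 2000
prior = rng.normal(500, 50, n)                   # prior-year scale score
current = 0.8 * prior + rng.normal(100, 30, n)   # current-year scale score

X = sm.add_constant(prior)
taus = np.arange(0.01, 1.00, 0.01)

def sgp(prior_score, current_score):
    """Highest percentile whose fitted conditional quantile lies at or
    below the observed current score (linear simplification)."""
    x0 = np.array([1.0, prior_score])
    fitted = np.array([QuantReg(current, X).fit(q=t).params @ x0 for t in taus])
    below = np.nonzero(fitted <= current_score)[0]
    return int(round(taus[below[-1]] * 100)) if below.size else 1

print(sgp(prior_score=520.0, current_score=540.0))
```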
-
The Impact of Test-Taking Disengagement on Item Content Representation Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-03 Steven L. Wise
ABSTRACT In achievement testing there is typically a practical requirement that the set of items administered should be representative of some target content domain. This is accomplished by establishing test blueprints specifying the content constraints to be followed when selecting the items for a test. Sometimes, however, students give disengaged responses to some of their test items, which raises
-
The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-03 Yasmine H. El Masri, David Andrich
ABSTRACT In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item functioning (DIF) of items post hoc. This typically requires
-
Comparing Cut Scores from the Angoff Method and Two Variations of the Hofstee and Beuk Methods Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-03 Adam E. Wyse
ABSTRACT This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists’ judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage correct and pass rate questions to calculate cut scores. The second variation uses Angoff ratings to
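The Hofstee compromise referenced here has a compact form: panelists supply minimum and maximum acceptable cut scores (k_min, k_max) and failure rates (f_min, f_max), and the cut falls where the line through (k_min, f_max) and (k_max, f_min) meets the observed cumulative failure-rate curve. A numerical sketch on simulated scores; the Beuk and Angoff variations in the article are not reproduced:

```python
import numpy as np

def hofstee_cut(scores, k_min, k_max, f_min, f_max):
    """Cut score where the panel line through (k_min, f_max) and
    (k_max, f_min) meets the observed cumulative failure-rate curve."""
    scores = np.asarray(scores)
    cuts = np.arange(k_min, k_max + 1)
    observed = np.array([(scores < c).mean() for c in cuts])         # % failing at c
    panel = f_max + (f_min - f_max) * (cuts - k_min) / (k_max - k_min)
    return int(cuts[np.argmin(np.abs(observed - panel))])            # closest crossing

rng = np.random.default_rng(3)
scores = np.round(rng.normal(70, 10, 5000))
print(hofstee_cut(scores, k_min=60, k_max=75, f_min=0.05, f_max=0.30))
```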
-
Gauging Uncertainty in Test-to-Curriculum Alignment Indices Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-03 Anne Traynor, Tingxuan Li, Shuqi Zhou
ABSTRACT During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test’s items and performance objective statements. The individual experts’ ratings may then be used to compute summary indices to quantify the match between a given test and its target item domain. The magnitude
-
Rasch Model Extensions for Enhanced Formative Assessments in MOOCs Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-03 Dmitry Abbakumov, Piet Desmet, Wim Van den Noortgate
ABSTRACT Formative assessments are an important component of massive open online courses (MOOCs), online courses with open access and unlimited student participation. Accurate conclusions on students’ proficiency via formative assessments, however, face several challenges: (a) students are typically allowed to make several attempts; and (b) student performance might be affected by other variables, such as interest
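One way to picture the kind of extension described, stated as an assumption rather than the authors' parameterization: a Rasch-style model with an attempt covariate, logit P = theta − b + delta · (attempt − 1), so that success probability shifts across repeated attempts:

```python
import math

def p_correct(theta, b, attempt, delta=0.4):
    """Rasch-style success probability with a linear attempt effect:
    logit P = theta - b + delta * (attempt - 1). Illustrative only."""
    logit = theta - b + delta * (attempt - 1)
    return 1.0 / (1.0 + math.exp(-logit))

# Same learner and item: probability rises across repeated attempts.
for attempt in (1, 2, 3):
    print(attempt, round(p_correct(theta=0.0, b=0.5, attempt=attempt), 3))
```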
-
Subscore Equating and Profile Reporting Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-03 Euijin Lim, Won-Chan Lee
ABSTRACT The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in difficulty, and sample size. The results indicated
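Under the random groups design assumed here, the equipercentile workhorse maps each Form X subscore to the Form Y subscore with the same percentile rank. An unsmoothed sketch on simulated subscores; operational work would add presmoothing and compare the methods the study evaluates:

```python
import numpy as np

def equipercentile(x_scores, y_scores, x_point):
    """Unsmoothed equipercentile equating under the random groups design:
    map x_point to the Form Y score with the same percentile rank."""
    x_sorted, y_sorted = np.sort(x_scores), np.sort(y_scores)
    pr = np.searchsorted(x_sorted, x_point, side="right") / x_sorted.size
    return float(np.quantile(y_sorted, pr))

rng = np.random.default_rng(11)
form_x = rng.binomial(15, 0.60, 3000)   # 15-item subscore, Form X group
form_y = rng.binomial(15, 0.55, 3000)   # Form Y group; Y runs slightly harder
print(equipercentile(form_x, form_y, x_point=9))
```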
-
The Effectiveness and Features of Formative Assessment in US K-12 Education: A Systematic Review Applied Measurement in Education (IF 1.04) Pub Date : 2020-03-02 Hansol Lee, Huy Q. Chung, Yu Zhang, Jamal Abedi, Mark Warschauer
ABSTRACT In the present article, we present a systematical review of previous empirical studies that conducted formative assessment interventions to improve student learning. Previous meta-analysis research on the overall effects of formative assessment on student learning has been conclusive, but little has been studied on important features of formative assessment interventions and their differential
-
Investigating Repeater Effects on Small Sample Equating: Include or Exclude? Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Hongyu Diao, Lisa Keller
ABSTRACT Examinees who attempt the same test multiple times are often referred to as “repeaters.” Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a non-equivalent anchor test (NEAT) design. However, under
-
Effect of Sample Size on Common Item Equating Using the Dichotomous Rasch Model Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Thomas R. O’Neill, Justin L. Gregg, Michael R. Peabody
ABSTRACT This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated using each of these 90 samples. The deviation of
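A hedged sketch of the bookkeeping in such a resampling study: perturb a base Rasch calibration with sample-size-dependent noise (a crude stand-in for real estimation error), place it back on the base scale via the mean anchor difference, and record the deviation of the equated difficulties. Sample sizes echo a few of the study's conditions; everything else is invented:

```python
import numpy as np

rng = np.random.default_rng(5)
base_b = rng.normal(0, 1, 40)           # base-form Rasch item difficulties
anchor = np.arange(10)                  # first 10 items are common

def recalibration_deviation(sample_size):
    """Perturb the calibration, link via the mean anchor shift, and
    return the mean absolute deviation of the equated difficulties."""
    noise = rng.normal(0, 1 / np.sqrt(sample_size), base_b.size)
    new_b = base_b + noise + 0.3                       # 0.3 = arbitrary scale shift
    shift = np.mean(base_b[anchor] - new_b[anchor])    # common-item link
    return np.abs(new_b + shift - base_b).mean()

for n in (200, 100, 50, 25):
    devs = [recalibration_deviation(n) for _ in range(10)]   # 10 replications
    print(n, round(float(np.mean(devs)), 3))
```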
-
Practical Issues in Linking and Equating with Small Samples Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Michael R. Peabody
(2020). Practical Issues in Linking and Equating with Small Samples. Applied Measurement in Education: Vol. 33, Practical Issues in Linking and Equating with Small Samples, pp. 1-2.
-
A Practitioner’s Review of Design Decisions Related to Equating and Linking with Small Samples (Commentary) Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Amin Saiar
ABSTRACT A commentary on the articles presented in this issue is provided that focuses on a practitioner’s perspective on equating and linking with small samples. A review of each article’s design and outcomes is followed by the practical implications when implementing the researched study elements and the guidance offered by the authors. Inter-relationships among the variables of interest presented
-
Shade Tree Psychometrics: Tools and Strategies for Linking under Suboptimal Conditions (Commentary) Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Mark R. Raymond
(2020). Shade Tree Psychometrics: Tools and Strategies for Linking under Suboptimal Conditions (Commentary). Applied Measurement in Education: Vol. 33, Practical Issues in Linking and Equating with Small Samples, pp. 67-72.
-
Equating with Small Samples (Commentary) Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Michael J. Kolen
ABSTRACT My comments focus on general considerations in equating with small samples. I begin with a brief discussion of equating designs and consider issues associated with equating error, including its random and systematic components. I relate these issues to the development of sample size guidelines for equating and discuss the importance of using different types of research designs when investigating
-
Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Jason P. Kopp, Andrew T. Jones
ABSTRACT Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been previously investigated, but not under small sample
-
Equating with Small and Unbalanced Samples Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Joshua T. Goodman, Andrew D. Dallas, Fen Fan
ABSTRACT Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students, options are scarcer. This simulation study
-
Some Methods and Evaluation for Linking and Equating with Small Samples Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Michael R. Peabody
ABSTRACT The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required for equating precedes the discussion of the equating
-
Investigating the Classification Accuracy of Rasch and Nominal Weights Mean Equating with Very Small Samples Applied Measurement in Education (IF 1.04) Pub Date : 2020-02-18 Robert T. Furter, Andrew C. Dwyer
ABSTRACT Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal weights mean equating slightly outperformed Rasch
-
Negative Keying Effects in the Factor Structure of TIMSS 2011 Motivation Scales and Associations with Reading Achievement Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Michalis P. Michaelides
ABSTRACT The Student Background survey administered along with achievement tests in studies of the International Association for the Evaluation of Educational Achievement includes scales of student motivation, competence, and attitudes toward mathematics and science. The scales consist of positively and negatively keyed items. The current research examined the factorial structure of the 18-item motivational
-
Using Constrained Optimization to Increase the Representation of Students from Low-Income Neighborhoods Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Rebecca Zwick, Lei Ye, Steven Isham
ABSTRACT In US colleges, the scarcity of students from low-income families is a major concern. We present a novel way of boosting the percentage of qualified low-income students using constrained optimization (CO), an operations research technique. CO allows incorporation of both academic requirements and diversity goals in college admissions. The incoming class’s academic credentials are maximized
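A toy version of the CO formulation, assuming a linear academic objective and a minimum low-income share; all numbers are invented, and the model in the paper has richer constraints. SciPy's mixed-integer solver handles the binary admit/reject decisions:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(9)
n, n_admit = 200, 50
credential = rng.normal(0, 1, n)                 # academic composite to maximize
low_income = (rng.random(n) < 0.25).astype(float)

constraints = [
    LinearConstraint(np.ones(n), n_admit, n_admit),          # exact class size
    LinearConstraint(low_income, 0.20 * n_admit, np.inf),    # >= 20% low-income
]
res = milp(c=-credential,                 # milp minimizes, so negate the objective
           constraints=constraints,
           integrality=np.ones(n),        # each x_i is a binary admit/reject
           bounds=Bounds(0, 1))
admit = res.x.round().astype(bool)
print("low-income share admitted:", low_income[admit].mean().round(3))
print("mean credential of admits:", credential[admit].mean().round(3))
```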
-
Differential Item Functioning for Accommodated Students with Disabilities: Effect of Differences in Proficiency Distributions Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Sarah Quesen, Suzanne Lane
ABSTRACT This study examined the effect of similar vs. dissimilar proficiency distributions on uniform DIF detection on a statewide eighth grade mathematics assessment. Results from the similar- and dissimilar-ability reference groups with an SWD focal group were compared for four models: logistic regression, hierarchical generalized linear model (HGLM), the Wald-1 IRT-based test, and the Mantel-Haenszel
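Of the four models compared, logistic regression DIF is the easiest to sketch: test whether group membership adds predictive value for the item response beyond the matching variable. A uniform-DIF example on simulated data with deliberately dissimilar proficiency distributions; the nonuniform test would add a total-by-group interaction term:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(13)
n = 4000
group = rng.integers(0, 2, n).astype(float)     # 0 = reference, 1 = focal
total = rng.normal(0, 1, n) - 0.5 * group       # dissimilar proficiency distributions
logit = 1.2 * total - 0.4 * group               # uniform DIF against the focal group
item = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

def fit(*cols):
    X = sm.add_constant(np.column_stack(cols))
    return sm.Logit(item, X).fit(disp=0)

m0 = fit(total)                  # matching variable only
m1 = fit(total, group)           # adds group: uniform DIF test
lr = 2 * (m1.llf - m0.llf)       # likelihood-ratio chi-square, 1 df
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, 1):.4f}")
```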
-
Are Multiple-choice Items Too Fat? Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Thomas M. Haladyna, Michael C. Rodriguez, Craig Stevens
ABSTRACT The evidence is mounting regarding the guidance to employ more three-option multiple-choice items. From theoretical analyses, empirical results, and practical considerations, such items are of equal or higher quality than four- or five-option items, and more items can be administered to improve content coverage. This study looks at 58 tests, including state achievement, college readiness,
-
An Information-Based Approach to Identifying Rapid-Guessing Thresholds Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Steven L. Wise
ABSTRACT The identification of rapid guessing is important to promote the validity of achievement test scores, particularly with low-stakes tests. Effective methods for identifying rapid guesses require reliable threshold methods that are also aligned with test taker behavior. Although several common threshold methods are based on rapid guessing response accuracy or visual inspection of response time
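The paper's method is information-based; the sketch below is the simpler accuracy-based cousin, shown only to make the threshold idea concrete: scan candidate response-time thresholds and keep the largest one at which faster responses are still correct at roughly the chance rate. All data, rates, and cutoffs are simulated assumptions:

```python
import numpy as np

def accuracy_based_threshold(rt, correct, chance=0.25, tol=0.05,
                             candidates=np.arange(1, 11)):
    """Largest candidate threshold (seconds) at which responses faster than
    the threshold are correct at roughly the chance rate. Simplified sketch."""
    best = None
    for t in candidates:
        fast = rt < t
        if fast.sum() < 30:                      # too few responses to judge
            continue
        if abs(correct[fast].mean() - chance) <= tol:
            best = t
    return best

rng = np.random.default_rng(17)
n = 5000
guessing = rng.random(n) < 0.15                  # 15% rapid guesses
rt = np.where(guessing, rng.uniform(0.3, 4, n), rng.lognormal(2.5, 0.4, n))
correct = np.where(guessing, rng.random(n) < 0.25, rng.random(n) < 0.7)
print(accuracy_based_threshold(rt, correct.astype(float)))
```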
-
Application of IRT Fixed Parameter Calibration to Multiple-Group Test Data Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Seonghoon Kim, Michael J. Kolen
ABSTRACT In applications of item response theory (IRT), fixed parameter calibration (FPC) has been used to estimate the item parameters of a new test form on the existing ability scale of an item pool. The present paper presents an application of FPC to multiple examinee groups test data that are linked to the item pool via anchor items, and investigates the performance of FPC relative to an alternative
-
Effects of Item Modifications on Test Accessibility for Persistently Low-Performing Students with Disabilities Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 Dale J. Cohen, Jin Zhang, Werner Wothke
ABSTRACT Construct-irrelevant cognitive complexity of some items in the statewide grade-level assessments may impose performance barriers for students with disabilities who are ineligible for alternate assessments based on alternate achievement standards. This has spurred research into whether items can be modified to reduce complexity without affecting item construct. This study uses a generalized
-
Measuring the Reliability of Diagnostic Mastery Classifications at Multiple Levels of Reporting Applied Measurement in Education (IF 1.04) Pub Date : 2019-09-26 W. Jake Thompson, Amy K. Clark, Brooke Nash
ABSTRACT As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an operational, large-scale, diagnostic assessment system
-
Deconstructing CHC Applied Measurement in Education (IF 1.04) Pub Date : 2019-06-17 John D. Wasserman
ABSTRACT Twenty-five years after the introduction of Carroll’s (1993) Three-Stratum (3S) theory of intelligence and McGrew’s (1997) subsequent synthesis of 3S with the extended Gf-Gc/Horn-Cattell theory, the Cattell-Horn-Carroll (CHC) theory represents the prevailing framework by which the structure of human cognitive and intellectual abilities is understood. This commentary reviews, discusses, and
-
Critically Reflecting on the Origins, Evolution, and Impact of the Cattell-Horn-Carroll (CHC) Model Applied Measurement in Education (IF 1.04) Pub Date : 2019-06-17 Ryan J. McGill, Stefan C. Dombrowski
ABSTRACT The Cattell-Horn-Carroll (CHC) model presently serves as a blueprint for both test development and a taxonomy for clinical interpretation of modern tests of cognitive ability. Accordingly, the trend among test publishers has been toward creating tests that provide users with an ever-increasing array of scores that comport with CHC. However, an accumulating body of independent research on modern
-
The One and the Many: Enduring Legacies of Spearman and Thurstone on Intelligence Test Score Interpretation Applied Measurement in Education (IF 1.04) Pub Date : 2019-06-17 A. Alexander Beaujean, Nicholas F. Benson
ABSTRACT Charles Spearman and L. L. Thurstone were pioneers in the field of intelligence. They not only developed methods to assess and understand intelligence, but also developed theories about its structure and function. Methodologically, their approaches were not that distinct, but their theories of intelligence were philosophically very different – and this difference is still seen in modern approaches
-
Challenges to the Cattell-Horn-Carroll Theory: Empirical, Clinical, and Policy Implications Applied Measurement in Education (IF 1.04) Pub Date : 2019-06-17 Gary L. Canivez, Eric A. Youngstrom
ABSTRACT The Cattell-Horn-Carroll (CHC) taxonomy of cognitive abilities married John Horn and Raymond Cattell’s Extended Gf-Gc theory with John Carroll’s Three-Stratum Theory. While there are some similarities in arrangements or classifications of tasks (observed variables) within similar broad or narrow dimensions, other salient theoretical features and statistical methods used for examining and supporting
-
Empirical Considerations on Intelligence Testing and Models of Intelligence: Updates for Educational Measurement Professionals Applied Measurement in Education (IF 1.04) Pub Date : 2019-06-17 Kurt F. Geisinger
ABSTRACT This brief article introduces the topic of intelligence as highly appropriate for educational measurement professionals. It describes some of the uses of intelligence tests both historically and currently. It argues why knowledge of intelligence theory and intelligence testing is important for educational measurement professionals. The articles that follow in this special issue will provide
-
Improving the Predictive Validity of Reading Comprehension Using Response Times of Correct Item Responses Applied Measurement in Education (IF 1.04) Pub Date : 2019-03-13 Shiyang Su, Mark L. Davison
ABSTRACT Response times have often been used as ancillary information to improve parameter estimation. Under the dual processing theory, assuming reading comprehension requires an automatic process, a fast, correct response is an indicator of effective automatic processing. A skilled, automatic comprehender should be high in response accuracy and low in response times. Following this argument, several
-
Partial Credit in Answer-Until-Correct Multiple-Choice Tests Deployed in a Classroom Setting Applied Measurement in Education (IF 1.04) Pub Date : 2019-03-13 Aaron D. Slepkov, Alan T. K. Godfrey
ABSTRACT The answer-until-correct (AUC) method of multiple-choice (MC) testing involves test respondents making selections until the keyed answer is identified. Despite attendant benefits that include improved learning, broad student adoption, and facile administration of partial credit, the use of AUC methods for classroom testing has been extremely limited. This study presents scoring properties
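One common AUC partial-credit rule, shown here as an assumption since the paper's exact rubric may differ: full credit on the first selection, stepping down linearly with each additional attempt on a k-option item, i.e., score = max(0, (k − a)/(k − 1)) for attempt a:

```python
def auc_partial_credit(n_options, attempt):
    """Partial credit for answer-until-correct MC scoring: 1 on the first
    pick, stepping down linearly to 0. One common rule; rubrics vary."""
    if not 1 <= attempt <= n_options:
        raise ValueError("attempt must be between 1 and n_options")
    return max(0.0, (n_options - attempt) / (n_options - 1))

# A 4-option item: credit by the attempt on which the key was found.
for a in range(1, 5):
    print(a, round(auc_partial_credit(4, a), 3))   # 1.0, 0.667, 0.333, 0.0
```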
-
The Effects of Effort Monitoring With Proctor Notification on Test-Taking Engagement, Test Performance, and Validity Applied Measurement in Education (IF 1.04) Pub Date : 2019-03-13 Steven L. Wise, Megan R. Kuhfeld, James Soland
ABSTRACT When we administer educational achievement tests, we want to be confident that the resulting scores validly indicate what the test takers know and can do. However, if the test is perceived as low stakes by the test taker, disengaged test taking sometimes occurs, which poses a serious threat to score validity. When computer-based tests are used, disengagement can be detected through occurrences
-
Classification Consistency and Accuracy for Mixed-Format Tests Applied Measurement in Education (IF 1.04) Pub Date : 2019-03-13 Stella Y. Kim, Won-Chan Lee
ABSTRACT This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study results from real data analysis showed that the
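For readers new to these indices, a brute-force simulation makes the definitions concrete: consistency is the rate at which two parallel administrations yield the same classification, accuracy the rate at which a classification matches true status. The true-score distribution, SEM, and cut below are assumed, not the study's; the six estimation methods it compares are model-based rather than simulated:

```python
import numpy as np

rng = np.random.default_rng(21)
n, cut = 10000, 60.0
true_score = rng.normal(62, 8, n)            # assumed true-score distribution
sem = 4.0                                    # assumed standard error of measurement

form1 = true_score + rng.normal(0, sem, n)   # two parallel administrations
form2 = true_score + rng.normal(0, sem, n)

pass1, pass2, pass_true = form1 >= cut, form2 >= cut, true_score >= cut
consistency = (pass1 == pass2).mean()        # same decision on both forms
accuracy = (pass1 == pass_true).mean()       # decision matches true status
print(f"consistency = {consistency:.3f}, accuracy = {accuracy:.3f}")
```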
-
Identifying Disengaged Survey Responses: New Evidence Using Response Time Metadata Applied Measurement in Education (IF 1.04) Pub Date : 2019-03-13 James Soland, Steven L. Wise, Lingyun Gao
ABSTRACT Disengaged responding is a phenomenon that often biases observed scores from achievement tests and surveys in practically and statistically significant ways. This problem has led to the development of methods to detect and correct for disengaged responses on both achievement test and survey scores. One major disadvantage when trying to detect disengaged responses on surveys is that, unlike
Contents have been reproduced by permission of the publishers.