Anchor Point Selection: Scale Alignment Based on an Inequality Criterion Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-25 Carolin Strobl, Julia Kopf, Lucas Kohler, Timo von Oertzen, Achim Zeileis
For detecting differential item functioning (DIF) between two or more groups of test takers in the Rasch model, their item parameters need to be placed on the same scale. Typically this is done by means of choosing a set of so-called anchor items based on statistical tests or heuristics. Here the authors suggest an alternative strategy: By means of an inequality criterion from economics, the Gini Index
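The inequality criterion named here is the Gini index. The authors' actual anchoring procedure is specific to the Rasch model; as a hedged sketch, the snippet below only shows how a Gini index can be computed over, say, absolute between-group differences in item difficulty estimates (the values are invented):

```python
def gini(values):
    """Gini index of a list of non-negative values (0 = perfect equality)."""
    vals = sorted(values)
    n = len(vals)
    total = sum(vals)
    if total == 0:
        return 0.0
    # Standard formula based on the rank-weighted sum of ordered values.
    cum = sum((i + 1) * v for i, v in enumerate(vals))
    return (2.0 * cum) / (n * total) - (n + 1.0) / n

# Absolute differences between two groups' (unanchored) item difficulty
# estimates; one item with a large shift dominates the inequality.
diffs = [0.02, 0.05, 0.03, 0.90, 0.04]
g = gini(diffs)
```

A value near 0 means the differences are spread evenly across items; a value near 1 means a few items account for most of the inequality, which is the pattern suggestive of a small set of DIF items against a stable anchor.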
-
On Interim Cognitive Diagnostic Computerized Adaptive Testing in Learning Context Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-23 Chun Wang
Interim assessment occurs throughout instruction to provide feedback about what students know and have achieved. Different from the currently available cognitive diagnostic computerized adaptive testing (CD-CAT) designs, which focus on assessment at a single time point, the authors discuss several designs of interim CD-CAT that are suitable in the learning context. The interim CD-CAT differs from the
-
First-Order Learning Models With the GDINA: Estimation With the EM Algorithm and Applications Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-15 Hulya D. Yigit, Jeffrey A. Douglas
In learning environments, understanding the longitudinal path of learning is one of the main goals. Cognitive diagnostic models (CDMs) for measurement combined with a transition model for mastery may be beneficial for providing fine-grained information about students’ knowledge profiles over time. An efficient algorithm to estimate model parameters would augment the practicality of this combination
-
Bayesian Modal Estimation for the One-Parameter Logistic Ability-Based Guessing (1PL-AG) Model Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-08 Shaoyang Guo, Tong Wu, Chanjin Zheng, Yanlei Chen
The calibration of the one-parameter logistic ability-based guessing (1PL-AG) model in item response theory (IRT) with a modest sample size remains a challenge because of implausible estimates and difficulty in obtaining standard errors of estimates. This article proposes an alternative Bayesian modal estimation (BME) method, the Bayesian Expectation-Maximization-Maximization (BEMM) method, which is developed
-
Modeling Asymmetry in the Time–Distance Relation of Ordinal Personality Items Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-05 Dylan Molenaar, Sandor Rózsa, Natasa Kő
In analyzing responses and response times to personality questionnaire items, models have been proposed which include the so-called “inverted-U effect.” These models predict that response times to personality test items decrease as the latent trait value of a given person gets closer to the attractiveness of an item. Initial studies into these models have focused on dichotomous personality items, and
-
Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-04 Chen-Wei Liu
Missing not at random (MNAR) modeling for non-ignorable missing responses usually assumes that the latent variable distribution is a bivariate normal distribution. Such an assumption is rarely verified and often employed as a standard in practice. Recent studies for “complete” item responses (i.e., no missing data) have shown that ignoring the nonnormal distribution of a unidimensional latent variable
-
SPSS Syntax for Combining Results of Principal Component Analysis of Multiply Imputed Data Sets using Generalized Procrustes Analysis Applied Psychological Measurement (IF 1.326) Pub Date : 2021-02-04 Bart van Wingerde, Joost van Ginkel
Multiple imputation (Rubin, 1987) is a well-known method for handling missing data. Applying the procedure to an incomplete data set results in several plausible complete versions of the incomplete data set which are then all analyzed with the same statistical analysis. In order to obtain one overall analysis that is used for interpretation, the analysis results of these several completed data sets
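Rubin's rules, referenced above, are the standard way to pool the analysis results across the m completed data sets. The Generalized Procrustes rotation of component loadings is the paper's contribution and is not reproduced here; the sketch below only shows Rubin-style pooling of a single scalar estimate (the numbers are made up):

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their within-imputation variances
    with Rubin's rules; returns (pooled estimate, total variance)."""
    m = len(estimates)
    qbar = sum(estimates) / m              # pooled point estimate
    w = sum(variances) / m                 # within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    t = w + (1 + 1 / m) * b                # total variance
    return qbar, t

est, tvar = pool_rubin([0.50, 0.54, 0.52], [0.010, 0.012, 0.011])
```

The extra (1 + 1/m)·B term is what makes the pooled standard error honest about the uncertainty introduced by the missing data.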
-
Estimating Cognitive Diagnosis Models in Small Samples: Bayes Modal Estimation and Monotonic Constraints Applied Psychological Measurement (IF 1.326) Pub Date : 2020-12-24 Wenchao Ma, Zhehan Jiang
Despite their increasing popularity, cognitive diagnosis models have been criticized for their limited utility in small samples. In this study, the authors proposed to use Bayes modal (BM) estimation and monotonic constraints to stabilize item parameter estimation and facilitate person classification in small samples based on the generalized deterministic input noisy “and” gate (G-DINA) model. Both simulation
-
Improving Accuracy and Usage by Correctly Selecting: The Effects of Model Selection in Cognitive Diagnosis Computerized Adaptive Testing Applied Psychological Measurement (IF 1.326) Pub Date : 2020-12-14 Miguel A. Sorrel, Francisco José Abad, Pablo Nájera
Decisions on how to calibrate an item bank might have major implications in the subsequent performance of the adaptive algorithms. One of these decisions is model selection, which can become problematic in the context of cognitive diagnosis computerized adaptive testing, given the wide range of models available. This article aims to determine whether model selection indices can be used to improve the
-
Flexible Computerized Adaptive Tests to Detect Misconceptions and Estimate Ability Simultaneously Applied Psychological Measurement (IF 1.326) Pub Date : 2020-11-06 Yu Bao, Yawei Shen, Shiyu Wang, Laine Bradshaw
The Scaling Individuals and Classifying Misconceptions (SICM) model is an advanced psychometric model that can provide feedback on examinees’ misconceptions and a general ability simultaneously. These two types of feedback are represented by a discrete and a continuous latent variable, respectively, in the SICM model. The complex structure of the SICM model brings difficulties in estimating both misconception
-
Assessment of Differential Statement Functioning in Ipsative Tests With Multidimensional Forced-Choice Items Applied Psychological Measurement (IF 1.326) Pub Date : 2020-10-21 Xue-Lan Qiu, Wen-Chung Wang
Ipsative tests with multidimensional forced-choice (MFC) items have been widely used to assess career interest, values, and personality to prevent response biases. Recently, there has been a surge of interest in developing item response theory models for MFC items. In reality, a statement in an MFC item may have different utilities for different groups, which is referred to as differential statement
-
Asymptotic Standard Errors of Parameter Scale Transformation Coefficients in Test Equating Under the Nominal Response Model Applied Psychological Measurement (IF 1.326) Pub Date : 2020-10-21 Zhonghua Zhang
Researchers have developed a characteristic curve procedure to estimate the parameter scale transformation coefficients in test equating under the nominal response model. In this study, the delta method was applied to derive expressions for computing the standard errors of the estimates of the parameter scale transformation coefficients. This brief report presents the results of
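The delta method approximates Var(g(θ̂)) ≈ g′(θ̂)² · Var(θ̂) for a differentiable transformation g. The article's derivation is analytic and specific to the nominal response model; the sketch below is only a generic numerical illustration of the idea, with g, θ̂, and the variance value chosen for the example:

```python
import math

def delta_method_se(g, theta_hat, var_theta, h=1e-6):
    """Approximate the standard error of g(theta_hat) via the delta
    method, using a central finite difference for g'(theta)."""
    grad = (g(theta_hat + h) - g(theta_hat - h)) / (2 * h)
    return (grad ** 2 * var_theta) ** 0.5

# SE of exp(theta) at theta_hat = 0 with Var(theta_hat) = 0.04:
# g'(0) = 1, so the delta-method SE is sqrt(0.04) = 0.2.
se = delta_method_se(math.exp, 0.0, 0.04)
```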
-
Detecting Differential Item Functioning Using Multiple-Group Cognitive Diagnosis Models Applied Psychological Measurement (IF 1.326) Pub Date : 2020-10-21 Wenchao Ma, Ragip Terzi, Jimmy de la Torre
This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test
-
Optimal Hierarchical Learning Path Design With Reinforcement Learning Applied Psychological Measurement (IF 1.326) Pub Date : 2020-08-22 Xiao Li, Hanchen Xu, Jinming Zhang, Hua-hua Chang
E-learning systems are capable of providing more adaptive and efficient learning experiences for learners than traditional classroom settings. A key component of such systems is the learning policy: an algorithm that designs learning paths, that is, selects learning materials for learners based on information such as the learners’ current progress and skills, learning
-
The Block Item Pocket Method for Reviewable Multidimensional Computerized Adaptive Testing Applied Psychological Measurement (IF 1.326) Pub Date : 2020-08-14 Zhe Lin, Ping Chen, Tao Xin
Most computerized adaptive testing (CAT) programs do not allow item review because it can decrease estimation precision and invite aberrant manipulation strategies. In this article, a block item pocket (BIP) method, which combines the item pocket method with the successive block method to realize reviewable CAT, is proposed. A worst-case but still reasonable answering strategy and the Wainer-like manipulation
-
Making Fixed-Precision Between-Item Multidimensional Computerized Adaptive Tests Even Shorter by Reducing the Asymmetry Between Selection and Stopping Rules Applied Psychological Measurement (IF 1.326) Pub Date : 2020-07-03 Johan Braeken, Muirne C. S. Paap
Fixed-precision between-item multidimensional computerized adaptive tests (MCATs) are becoming increasingly popular. The current generation of item-selection rules used in these types of MCATs typically optimize a single-valued objective criterion for multivariate precision (e.g., Fisher information volume). In contrast, when all dimensions are of interest, the stopping rule is typically defined in
-
IRTBEMM: An R Package for Estimating IRT Models With Guessing or Slipping Parameters Applied Psychological Measurement (IF 1.326) Pub Date : 2020-06-26 Shaoyang Guo, Chanjin Zheng, Justin L. Kern
A recently released R package IRTBEMM is presented in this article. This package puts together several new estimation algorithms (Bayesian EMM, Bayesian E3M, and their maximum likelihood versions) for the Item Response Theory (IRT) models with guessing and slipping parameters (e.g., 3PL, 4PL, 1PL-G, and 1PL-AG models). IRTBEMM should be of interest to the researchers in IRT estimation and applying
-
An Exploratory Strategy to Identify and Define Sources of Differential Item Functioning Applied Psychological Measurement (IF 1.326) Pub Date : 2020-06-24 Chung-Ping Cheng, Chi-Chen Chen, Ching-Lin Shih
The sources of differential item functioning (DIF) items are usually identified through a qualitative content review by a panel of experts. However, the differential functioning for some DIF items might have been caused by reasons outside of the experts’ experiences, leading to the sources for these DIF items possibly being misidentified. Quantitative methods can help to provide useful information
-
Using Machine Learning Methods to Develop a Short Tree-Based Adaptive Classification Test: Case Study With a High-Dimensional Item Pool and Imbalanced Data Applied Psychological Measurement (IF 1.326) Pub Date : 2020-06-18 Yi Zheng, Hyunjung Cheon, Charles M. Katz
This study explores advanced techniques in machine learning to develop a short tree-based adaptive classification test based on an existing lengthy instrument. A case study was carried out for an assessment of risk for juvenile delinquency. Two unique facts of this case are (a) the items in the original instrument measure a large number of distinctive constructs; (b) the target outcomes are of low
-
OpenMx: A Modular Research Environment for Item Response Theory Method Development Applied Psychological Measurement (IF 1.326) Pub Date : 2020-06-13 Joshua N. Pritikin, Carl F. Falk
There are many item response theory software packages designed for users. Here, the authors introduce an environment tailored to method development and simulation. Implementations of a selection of classic algorithms are available as well as some recently developed methods. Source code is developed in public repositories on GitHub; your collaboration is welcome.
-
A Multivariate Probit Model for Learning Trajectories: A Fine-Grained Evaluation of an Educational Intervention Applied Psychological Measurement (IF 1.326) Pub Date : 2020-06-06 Yinghan Chen, Steven Andrew Culpepper
Advances in educational technology provide teachers and schools with a wealth of information about student performance. A critical direction for educational research is to harvest the available longitudinal data to provide teachers with real-time diagnoses about students’ skill mastery. Cognitive diagnosis models (CDMs) offer educational researchers, policy makers, and practitioners a psychometric
-
On the Practical Consequences of Misfit in Mokken Scaling. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-06-01 Daniela Ramona Crişan, Jorge N Tendeiro, Rob R Meijer
Mokken scale analysis is a popular method to evaluate the psychometric quality of clinical and personality questionnaires and their individual items. Although many empirical papers report on the extent to which sets of items form Mokken scales, less attention has been paid to the effects of violating commonly used rules of thumb. In this study, the authors investigated the practical consequences of
-
Uncertainty in Latent Trait Models. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-05-26 Gerhard Tutz, Gunther Schauberger
A model that extends the Rasch model and the Partial Credit Model to account for subject-specific uncertainty when responding to items is proposed. It is demonstrated that ignoring the subject-specific uncertainty may yield biased estimates of model parameters. In the extended version of the model, uncertainty and the underlying trait are linked to explanatory variables. The parameterization allows
-
irtplay: An R Package for Online Item Calibration, Scoring, Evaluation of Model Fit, and Useful Functions for Unidimensional IRT Applied Psychological Measurement (IF 1.326) Pub Date : 2020-05-21 Hwanggyu Lim, Craig S. Wells
The R package irtplay provides practical tools for unidimensional item response theory (IRT) models that conveniently enable users to conduct many analyses related to IRT. For example, irtplay includes functions for calibrating online items, scoring test-takers’ proficiencies, evaluating IRT model-data fit, and importing item and/or proficiency parameter estimates from the output of popular IRT
-
A Seed Usage Issue on Using catR for Simulation and the Solution. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-05-08 Zhongmin Cui
The R package catR is a useful development and testing platform for adaptive tests. Developers and researchers have been using catR for their assessments or research projects. However, there is a flaw in catR that can potentially cause misleading results. This article shows the flaw and provides a solution. Suggestions on using seeds for random numbers are also provided.
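The specific flaw is internal to catR and is detailed in the article; the general hazard it illustrates, however, is language-independent. A Python sketch of the seed-reuse pitfall in simulation loops, and one common fix:

```python
import random

# Pitfall: resetting the same seed inside every replication makes all
# "replications" identical, silently wiping out Monte Carlo variation.
def replicate_bad(n_rep):
    results = []
    for _ in range(n_rep):
        random.seed(42)                 # same seed every iteration: bug
        results.append(random.random())
    return results

# Fix: derive one distinct seed per replication (or seed once, outside
# the loop), so replications form independent random streams.
def replicate_good(n_rep, base_seed=42):
    results = []
    for r in range(n_rep):
        random.seed(base_seed + r)      # distinct stream per replication
        results.append(random.random())
    return results

assert len(set(replicate_bad(5))) == 1   # all draws identical
assert len(set(replicate_good(5))) == 5  # all draws distinct
```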
-
Partially and Fully Noncompensatory Response Models for Dichotomous and Polytomous Items. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-05-08 R Philip Chalmers
This article extends Sympson’s partially noncompensatory dichotomous response model to ordered response data, and introduces a set of fully noncompensatory models for dichotomous and polytomous response data. The theoretical properties of the partially and fully noncompensatory response models are contrasted, and a small set of Monte Carlo simulations is presented to evaluate their parameter recovery
-
The Monotonic Polynomial Graded Response Model: Implementation and a Comparative Study. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-04-15 Carl F Falk
We present a monotonic polynomial graded response (GRMP) model that subsumes the unidimensional graded response model for ordered categorical responses and results in flexible category response functions. We suggest improvements in the parameterization of the polynomial underlying similar models, expand upon an underlying response variable derivation of the model, and in lieu of an overall discrimination
-
Detection of Item Preknowledge Using Response Times. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-04-13 Sandip Sinharay
Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. This article suggests a new statistic that can be used for detecting the examinees who may have benefited from item preknowledge using their response times. The statistic quantifies the difference in speed between the compromised items and the non-compromised items of the examinees. The distribution
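The article's statistic and its null distribution are derived formally there. As a rough, hypothetical stand-in for "the difference in speed between the compromised items and the non-compromised items," a standardized difference in mean log response time could look like this (function name and data are invented):

```python
import math
import statistics

def speed_shift(log_rt_compromised, log_rt_clean):
    """Standardized difference in mean log response time between an
    examinee's (possibly) compromised items and clean items. Large
    negative values = suspiciously fast on the compromised items."""
    m1 = statistics.mean(log_rt_compromised)
    m0 = statistics.mean(log_rt_clean)
    v1 = statistics.variance(log_rt_compromised)
    v0 = statistics.variance(log_rt_clean)
    n1, n0 = len(log_rt_compromised), len(log_rt_clean)
    se = math.sqrt(v1 / n1 + v0 / n0)
    return (m1 - m0) / se

# An examinee who is much faster on the suspected items than elsewhere:
z = speed_shift([2.0, 2.1, 1.9, 2.0], [3.0, 3.1, 2.9, 3.0])
```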
-
Improving Robustness in Q-Matrix Validation Using an Iterative and Dynamic Procedure. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-03-19 Pablo Nájera, Miguel A Sorrel, Jimmy de la Torre, Francisco José Abad
In the context of cognitive diagnosis models (CDMs), a Q-matrix reflects the correspondence between attributes and items. The Q-matrix construction process is typically subjective in nature, which may lead to misspecifications. All this can negatively affect the attribute classification accuracy. In response, several methods of empirical Q-matrix validation have been developed. The general discrimination
-
The Power of Crossing SIBTEST. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-03-14 Zhushan Li
Crossing SIBTEST or CSIB is designed to detect crossing differential item functioning (DIF) as well as unidirectional DIF. A theoretical formula for the power of CSIB is derived based on the asymptotic distribution of the test statistic under the null and alternative hypotheses. The derived power formula provides insights on the factors that influence the CSIB power, including DIF effect size, standard
-
InDisc: An R Package for Assessing Person and Item Discrimination in Typical-Response Measures. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-03-14 Pere J Ferrando, David Navarro-González
InDisc is an R Package that implements procedures for estimating and fitting unidimensional Item Response Theory (IRT) Dual Models (DMs). DMs are intended for personality and attitude measures and are, essentially, extended standard IRT models with an extra person parameter that models the discriminating power of the individual. The package consists of a main function, which calls subfunctions for
-
Robustness of Projective IRT to Misspecification of the Underlying Multidimensional Model. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-03-10 Tyler Strachan, Edward Ip, Yanyan Fu, Terry Ackerman, Shyh-Huei Chen, John Willse
As a method to derive a “purified” measure along a dimension of interest from response data that are potentially multidimensional in nature, the projective item response theory (PIRT) approach requires first fitting a multidimensional item response theory (MIRT) model to the data before projecting onto a dimension of interest. This study aims to explore how accurate the PIRT results are when the estimated
-
Using Bayesian Nonparametric Item Response Function Estimation to Check Parametric Model Fit. Applied Psychological Measurement (IF 1.326) Pub Date : 2020-03-10 Wenhao Wang, Neal Kingston
Previous studies indicated that the assumption of a logistic form for parametric item response functions (IRFs) is violated often enough to be worth checking. Using nonparametric item response theory (IRT) estimation methods with the posterior predictive model checking method makes it possible to obtain significance probabilities of fit statistics in a Bayesian framework by accounting for the uncertainty of the parameter
-
Fit Indices for Measurement Invariance Tests in the Thurstonian IRT Model. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-12-26 HyeSun Lee, Weldon Z Smith
This study examined whether cutoffs in fit indices suggested for traditional formats with maximum likelihood estimators can be utilized to assess model fit and to test measurement invariance when a multiple group confirmatory factor analysis was employed for the Thurstonian item response theory (IRT) model. Regarding the performance of the evaluation criteria, detection of measurement non-invariance
-
Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-12-21 Jing Yang, Hua-Hua Chang, Jian Tao, Ningzhong Shi
Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so as to enable more efficient remediation, whereas CAT tailors optimal items to the examinee’s mastery profile. The item
-
An Optimized Bayesian Hierarchical Two-Parameter Logistic Model for Small-Sample Item Calibration. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-12-21 Christoph König, Christian Spoden, Andreas Frey
Accurate item calibration in models of item response theory (IRT) requires rather large samples. For instance, N > 500 respondents are typically recommended for the two-parameter logistic (2PL) model. Hence, this model is considered a large-scale application, and its use in small-sample contexts is limited. Hierarchical Bayesian approaches are frequently proposed to reduce the sample size requirements
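For reference, the 2PL model discussed above gives the probability of a correct response as P(θ) = 1 / (1 + exp(−a(θ − b))), with discrimination a and difficulty b. A minimal sketch:

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic item response function: probability of a
    correct response given ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly 0.5, whatever a is; the
# discrimination a controls how steeply the curve rises around b.
p_mid = p_2pl(0.0, a=1.2, b=0.0)
```

The sample-size problem the abstract refers to arises because a and b must be estimated jointly for every item, and the slope a in particular is poorly determined from few respondents.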
-
Evaluating Robust Scale Transformation Methods With Multiple Outlying Common Items Under IRT True Score Equating. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-11-15 Yong He, Zhongmin Cui
Item parameter estimates of a common item on a new test form may change abnormally due to reasons such as item overexposure or a change of curriculum. A common item whose change does not fit the pattern implied by the normally behaved common items is defined as an outlier. Although it improves equating accuracy, detecting and eliminating outliers may cause a content imbalance among common items. Robust
-
Approximating Bifactor IRT True-Score Equating With a Projective Item Response Model. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-11-13 Kyung Yong Kim, Uk Hyun Cho
Item response theory (IRT) true-score equating for the bifactor model is often conducted by first numerically integrating out specific factors from the item response function and then applying the unidimensional IRT true-score equating method to the marginalized bifactor model. However, an alternative procedure for obtaining the marginalized bifactor model is through projecting the nuisance dimensions
-
equate: Observed-Score Linking and Equating in R. Applied Psychological Measurement (IF 1.326) Pub Date : 2016-07-01 Anthony D Albano
-
lzstarMix: Assessment of Person Fit for Mixed-Format Tests. Applied Psychological Measurement (IF 1.326) Pub Date : 2016-01-01 Sandip Sinharay
-
Adaptive Learning Recommendation Strategy Based on Deep Q-learning. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-07-25 Chunxi Tan, Ruijian Han, Rougang Ye, Kani Chen
Personalized recommendation systems have been widely adopted in the e-learning field and are adaptive to each learner’s own learning pace. With full utilization of learning behavior data, psychometric assessment models keep track of the learner’s proficiency on knowledge points, and then a well-designed recommendation strategy selects a sequence of actions to meet the objective of maximizing the learner’s
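Deep Q-learning replaces the lookup table below with a neural network, but the underlying update is the same Bellman-style rule. A toy tabular sketch (the states, actions, and reward here are placeholders, not the authors' design):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9        # learning rate and discount factor
Q = defaultdict(float)         # Q-values keyed by (state, action)

def update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward the observed reward
    plus the discounted value of the best next action."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# One learning event: recommending "advance" to a novice paid off.
actions = ["review", "advance"]
update("novice", "advance", 1.0, "intermediate", actions)
```

In a recommendation setting, the policy then picks, for the learner's current state, the action with the highest Q-value, which is what "maximizing the learner's" long-run objective cashes out to.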
-
Examining the Impact of Differential Item Functioning on Classification Accuracy in Cognitive Diagnostic Models. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-07-04 Justin Paulsen, Dubravka Svetina, Yanan Feng, Montserrat Valdivia
Cognitive diagnostic models (CDMs) are of growing interest in educational research because of the models’ ability to provide diagnostic information regarding examinees’ strengths and weaknesses suited to a variety of content areas. An important step to ensure appropriate uses and interpretations from CDMs is to understand the impact of differential item functioning (DIF). While methods of detecting
-
Bias of Two-Level Scalability Coefficients and Their Standard Errors. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-05-14 Letty Koopman, Bonne J H Zijlstra, Mark de Rooij, L Andries van der Ark
Two-level Mokken scale analysis is a generalization of Mokken scale analysis for multi-rater data. The bias of estimated scalability coefficients for two-level Mokken scale analysis, the bias of their estimated standard errors, and the coverage of the confidence intervals were investigated under various testing conditions. It was found that the estimated scalability coefficients were unbiased
-
Evaluating the Fit of Sequential G-DINA Model Using Limited-Information Measures. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-05-14 Wenchao Ma
Limited-information fit measures appear to be promising in assessing the goodness-of-fit of dichotomous response cognitive diagnosis models (CDMs), but their performance has not been examined for polytomous response CDMs. This study investigates the performance of the Mord statistic and standardized root mean square residual (SRMSR) for an ordinal response CDM—the sequential generalized deterministic
-
Automated Test Assembly Using SAS Operations Research Software in a Medical Licensing Examination Applied Psychological Measurement (IF 1.326) Pub Date : 2019-05-07 Can Shao, Silu Liu, Hongwei Yang, Tsung-Hsun Tsai
Mathematical programming has been widely used by professionals in testing agencies as a tool to automatically construct equivalent test forms. This study introduces the linear programming capabilities (modeling language plus solvers) of SAS Operations Research as a platform to rigorously engineer tests to specifications in an automated manner. To that end, real items from a medical licensing test are
-
A Comparison of Software Packages Available for DINA Model Estimation Applied Psychological Measurement (IF 1.326) Pub Date : 2019-04-23 Sedat Sen, Ragip Terzi
This article provides a review of software packages for fitting maximum likelihood estimation of the deterministic input, noisy “and” gate (DINA) model. Six software packages—flexMIRT, Latent GOLD, mdltm, Mplus, OxEdit, and R—are considered. Each package is reviewed regarding data manipulation, statistical capabilities, output, and documentation. The results of these packages are compared using a sample
-
An Item Response Model for True-False Exams Based on Signal Detection Theory. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-04-23 Lawrence T DeCarlo
A true–false exam can be viewed as a signal detection task: the task is to detect whether an item is true (signal) or false (noise). In terms of signal detection theory (SDT), examinees can be viewed as performing the task by comparing the perceived plausibility of an item (a perceptual component) to a threshold that delineates true from false (a decision component). The resulting model
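The article builds a full item response model on this idea; for orientation, the classic equal-variance SDT indices behind it are sensitivity d′ and criterion c, computed from hit and false-alarm rates (the rates below are invented):

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Classic equal-variance SDT indices: sensitivity d' and
    criterion (threshold) c, from hit and false-alarm rates."""
    z = NormalDist().inv_cdf
    d = z(hit_rate) - z(false_alarm_rate)
    c = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d, c

# Symmetric rates (.84 hits, .16 false alarms) give d' near 2 and an
# unbiased criterion near 0.
d, c = dprime_and_criterion(0.84, 0.16)
```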
-
A Dynamic Stratification Method for Improving Trait Estimation in Computerized Adaptive Testing Under Item Exposure Control. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-04-23 Jyun-Hong Chen, Hsiu-Yi Chao, Shu-Ying Chen
When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation will substantially decrease. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), which is aimed at improving trait estimation, is proposed to mitigate this problem. The objective function of the SDC in item selection is to maximize the sum
-
MIMIC Models for Uniform and Nonuniform DIF as Moderated Mediation Models. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-04-12 Amanda K Montoya, Minjeong Jeon
In this article, the authors describe how multiple indicators multiple cause (MIMIC) models for studying uniform and nonuniform differential item functioning (DIF) can be conceptualized as mediation and moderated mediation models. Conceptualizing DIF within the context of a moderated mediation model helps to understand DIF as the effect of some variable on measurements that is not accounted for by
-
Testing the Local Independence Assumption of the Rasch Model With Q3-Based Nonparametric Model Tests. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-03-31 Rudolf Debelak, Ingrid Koller
Local independence is a central assumption of commonly used item response theory models. Violations of this assumption are usually tested using test statistics based on item pairs. This study presents two quasi-exact tests based on the Q3 statistic for testing the hypothesis of local independence in the Rasch model. The proposed tests do not require the estimation of item parameters and can also be
-
Reliability for Tests With Items Having Different Numbers of Ordered Categories. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-03-20 Seohyun Kim, Zhenqiu Lu, Allan S Cohen
This study describes a structural equation modeling (SEM) approach to reliability for tests with items having different numbers of ordered categories. A simulation study is provided to compare the performance of this reliability coefficient, coefficient alpha, and the population reliability for tests having items with different numbers of ordered categories, one-factor and bifactor structures, and different
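Coefficient alpha, one of the benchmarks in the comparison, can be computed directly from a persons × items score matrix with the classical formula α = k/(k−1) · (1 − Σσᵢ²/σₓ²). A sketch with invented data:

```python
def cronbach_alpha(item_scores):
    """Coefficient alpha from a persons x items score matrix
    (list of per-person lists of item scores)."""
    n_items = len(item_scores[0])

    def pvar(xs):
        # Population (n-denominator) variance, as in the classical formula.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [pvar([p[i] for p in item_scores]) for i in range(n_items)]
    total_var = pvar([sum(p) for p in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Four persons, three items with different score ranges.
data = [[1, 2, 2], [2, 3, 3], [3, 4, 5], [4, 5, 5]]
alpha = cronbach_alpha(data)
```

The SEM-based coefficient the article proposes addresses exactly the situation this toy matrix mimics: items whose numbers of ordered categories differ, where alpha's assumptions become questionable.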
-
Framework for Developing Multistage Testing With Intersectional Routing for Short-Length Tests. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-03-20 Kyung Chris T Han
Multistage testing (MST) has many practical advantages over typical item-level computerized adaptive testing (CAT), but there is a substantial tradeoff when using MST because of its reduced level of adaptability. In typical MST, the first stage almost always performs as a routing stage in which all test takers see a linear test form. If multiple test sections measure different but moderately or highly
-
A Psychometric Model for Discrete-Option Multiple-Choice Items Applied Psychological Measurement (IF 1.326) Pub Date : 2019-03-19 Daniel M. Bolt, Nana Kim, James Wollack, Yiqin Pan, Carol Eckerly, John Sowles
Discrete-option multiple-choice (DOMC) items differ from traditional multiple-choice (MC) items in the sequential administration of response options (up to display of the correct option). DOMC can be appealing in computer-based test administrations due to its protection of item security and its potential to reduce testwiseness effects. A psychometric model for DOMC items that attends to the random
-
A Blocked-CAT Procedure for CD-CAT Applied Psychological Measurement (IF 1.326) Pub Date : 2019-03-19 Mehmet Kaplan, Jimmy de la Torre
This article introduces a blocked-design procedure for cognitive diagnosis computerized adaptive testing (CD-CAT), which allows examinees to review items and change their answers during test administration. Four blocking versions of the new procedure were proposed. In addition, the impact of several factors, namely, item quality, generating model, block size, and test length, on the classification
-
A Sequential Higher Order Latent Structural Model for Hierarchical Attributes in Cognitive Diagnostic Assessments Applied Psychological Measurement (IF 1.326) Pub Date : 2019-03-04 Peida Zhan, Wenchao Ma, Hong Jiao, Shuliang Ding
The higher-order structure and attribute hierarchical structure are two popular approaches to defining the latent attribute space in cognitive diagnosis models. However, to our knowledge, it is still impossible to integrate them to accommodate the higher-order latent trait and hierarchical attributes simultaneously. To address this issue, this article proposed a sequential higher-order latent structural
-
Multidimensional Test Assembly Using Mixed-Integer Linear Programming: An Application of Kullback–Leibler Information Applied Psychological Measurement (IF 1.326) Pub Date : 2019-02-25 Dries Debeer, Peter W. van Rijn, Usama S. Ali
Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method to assemble parallel test forms is to apply combinatorial optimization using mixed-integer linear programming (MILP). Using this approach, in the unidimensional case
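In practice the article's formulations are handed to a MILP solver. To make the underlying objective concrete, here is a deliberately tiny brute-force stand-in that picks the fixed-length form whose summed information at a reference θ is closest to a target; the pool, its values, and the single-point criterion are all invented simplifications:

```python
from itertools import combinations

# Toy pool: each item has an information value at one reference theta.
pool = {"i1": 0.40, "i2": 0.35, "i3": 0.55, "i4": 0.20, "i5": 0.50}

def assemble(pool, form_length, target_info):
    """Exhaustively pick the form of the given length whose total
    information is closest to the target (a stand-in for the MILP)."""
    best = min(
        combinations(pool, form_length),
        key=lambda items: abs(sum(pool[i] for i in items) - target_info),
    )
    return sorted(best)

form = assemble(pool, form_length=3, target_info=1.30)
```

Real assembly problems add overlap, content, and multidimensional information constraints, which is precisely why MILP (rather than enumeration) is needed at scale.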
-
Joint Modeling of Compensatory Multidimensional Item Responses and Response Times. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-02-22 Kaiwen Man, Jeffrey R Harring, Hong Jiao, Peida Zhan
Computer-based testing (CBT) is becoming increasingly popular in assessing test-takers’ latent abilities and making inferences regarding their cognitive processes. In addition to collecting item responses, an important benefit of using CBT is that response times (RTs) can also be recorded and used in subsequent analyses. To better understand the structural relations between multidimensional cognitive
-
Linking With External Covariates: Examining Accuracy by Anchor Type, Test Length, Ability Difference, and Sample Size. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-02-14 Anthony D Albano, Marie Wiberg
Research has recently demonstrated the use of multiple anchor tests and external covariates to supplement or substitute for common anchor items when linking and equating with nonequivalent groups. This study examines the conditions under which external covariates improve linking and equating accuracy, with internal and external anchor tests of varying lengths and groups of differing abilities. Pseudo
-
New Efficient and Practicable Adaptive Designs for Calibrating Items Online Applied Psychological Measurement (IF 1.326) Pub Date : 2019-01-30 Yinhong He, Ping Chen, Yong Li
When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters
-
An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model. Applied Psychological Measurement (IF 1.326) Pub Date : 2019-01-23 Audrey J Leroux, J Kay Waid-Ebbs, Pey-Shan Wen, Drew A Helmer, David P Graham, Maureen K O'Connor, Kathleen Ray
The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests. The four exposure control techniques examined were
Contents have been reproduced by permission of the publishers.