Studying Score Stability with a Harmonic Regression Family: A Comparison of Three Approaches to Adjustment of Examinee‐Specific Demographic Data
Journal of Educational Measurement (IF 1.188), Pub Date: 2020-02-18, DOI: 10.1111/jedm.12266
Yi‐Hsuan Lee, Shelby J. Haberman

For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some expected and others unforeseen. The situation is more challenging for assessments that assemble many different forms and deliver frequent administrations each year. Harmonic regression, a seasonal‐adjustment method, has been found useful for separating known sources of variability from unknown ones in order to study score stability for such assessments. As an extension, this paper presents a family of three approaches that incorporate examinees' demographic data into harmonic regression in different ways. A generic evaluation method based on jackknifing is developed to compare the approaches within the family. The three approaches are compared using real data from an international language assessment. Results suggest that all three approaches perform similarly and are effective in meeting this goal. The paper also discusses the properties and limitations of the three approaches, along with inferences about score (in)stability based on the harmonic regression results.
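
As a rough illustration of the general idea only, the sketch below fits a harmonic regression to administration-level mean scores and computes leave-one-out (jackknife-style) prediction residuals. It is not the authors' method: the abstract does not specify their three approaches or their evaluation statistic. The function names (`harmonic_design`, `fit`, `jackknife_residuals`), the one-year period, the single harmonic, the single demographic covariate `pct_group`, and all data are assumptions made for this example.

```python
import numpy as np

def harmonic_design(t, n_harmonics=1, covariates=None):
    """Design matrix: intercept, cos/sin pairs with a one-year period,
    and optional administration-level covariates (hypothetical).

    t holds administration dates in fractional years (e.g. 2019.25).
    """
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t))
        cols.append(np.sin(2 * np.pi * k * t))
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(len(t), -1)
        cols.extend(cov.T)  # one column per covariate
    return np.column_stack(cols)

def fit(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def jackknife_residuals(X, y):
    """Refit without administration i, then predict administration i."""
    n = len(y)
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        res[i] = y[i] - X[i] @ fit(X[keep], y[keep])
    return res

# Synthetic data (assumed): 30 administrations per year over two years.
rng = np.random.default_rng(0)
t = 2018 + np.arange(60) / 30.0
pct_group = rng.uniform(0.3, 0.5, t.size)  # hypothetical demographic share
y = (80 + 1.5 * np.cos(2 * np.pi * t)      # seasonal pattern
     + 5 * (pct_group - 0.4)               # demographic effect
     + rng.normal(0, 0.3, t.size))         # unexplained variation

X = harmonic_design(t, n_harmonics=1, covariates=pct_group)
beta = fit(X, y)
in_sample = y - X @ beta
loo = jackknife_residuals(X, y)
print("coefficients:", np.round(beta, 3))
print("in-sample RMSE:", np.sqrt(np.mean(in_sample ** 2)))
print("leave-one-out RMSE:", np.sqrt(np.mean(loo ** 2)))
```

In a sketch like this, administrations with unusually large leave-one-out residuals are the ones the seasonal and demographic terms fail to explain, which is where a closer look at score stability would start.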

Updated: 2020-02-18