Ice Is Hot and Water Is Dry
European Journal of Psychological Assessment (IF 2.892), Pub Date: 2021-12-22, DOI: 10.1027/1015-5759/a000691
Natalie Förster, Jörg-Tobias Kuhn

Abstract. To monitor students' progress and adapt instruction to their needs, teachers increasingly administer equivalent tests repeatedly. The present study investigates whether equivalent reading tests can be developed successfully via rule-based item design. Based on theoretical considerations, we identified three item features at each of the word, sentence, and text levels of reading comprehension that should influence the difficulty and time intensity of reading processes. Using optimal design algorithms, a design matrix was calculated, and four equivalent test forms of the German reading test series for second graders (quop-L2) were developed. A total of N = 7,751 students completed the tests. We estimated item difficulty and time intensity parameters as well as person ability and speed parameters using bivariate item response theory (IRT) models, and we investigated the influence of item features on item parameters. Results indicate that all item features significantly affected either item difficulty or response time. Moreover, as indicated by the IRT-based test information functions and analyses of variance, the four test forms showed similar levels of difficulty and time intensity at the word, sentence, and text levels (all η² < .002). Results were successfully cross-validated using a sample of N = 5,654 students.
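As a rough illustration of the effect size reported above, the following sketch computes η² as the ratio of the between-group sum of squares to the total sum of squares in a one-way layout. The abstract does not state which values the analyses of variance were run on, so the function name `eta_squared` and the four lists of numbers are purely hypothetical and are not data from the study.

```python
from typing import Sequence


def eta_squared(groups: Sequence[Sequence[float]]) -> float:
    """Return eta squared (SS_between / SS_total) for a one-way layout.

    Assumes at least one value differs from the grand mean; otherwise
    SS_total is zero and the ratio is undefined.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation of all observations around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical difficulty values for four test forms (made-up numbers,
# chosen so that the form means are nearly identical).
forms = [
    [-1.0, 0.0, 1.0],
    [-1.1, 0.1, 1.0],
    [-0.9, 0.0, 0.9],
    [-1.0, -0.1, 1.1],
]
print(eta_squared(forms))
```

A value close to zero means that almost none of the variation is attributable to which test form was used, which is the sense in which the forms above would count as equivalent.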

Updated: 2021-12-22