Abstract
In contrast with the assumptions of standard measurement models used in large-scale assessments, students’ performance may change during test administration. With a test booklet design that manipulates item order, this change can be modeled as a function of item position. The present study used an explanatory item response theory (IRT) framework to analyze item position effects in the 2012 European Survey on Language Competences. Consistent item position effects were found for listening but not for reading. More specifically, for a large subset of items, item difficulty decreased as item position increased, which is known as a practice effect. The effect was found across all tested languages, although the effect sizes varied across items, test levels, and countries.
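To make the modeling idea concrete, the sketch below simulates responses under a Rasch model in which the logit of a correct response is ability minus item difficulty plus a linear position term (logit P = theta − beta_i + delta · position), so a positive delta corresponds to a practice effect. This is an illustrative simulation, not the authors’ analysis: the parameter values, the random item orders, and the simple two-step recovery of delta (regressing position-specific logits on position, rather than jointly fitting a GLMM as in the paper’s explanatory IRT approach) are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 20

theta = rng.normal(0.0, 1.0, n_persons)   # person abilities
beta = rng.normal(0.0, 1.0, n_items)      # baseline item difficulties
delta = 0.03                              # assumed practice effect per position step

# Each person sees the items in a random order (booklet-like manipulation)
positions = np.array([rng.permutation(n_items) for _ in range(n_persons)])

# Responses under the position-effect Rasch model
logits = theta[:, None] - beta[None, :] + delta * positions
responses = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Rough check: per item and position, take the logit of the proportion
# correct, then regress it on position. The fitted slope should be positive
# (somewhat attenuated relative to delta because abilities are averaged out).
xs, ys = [], []
for i in range(n_items):
    for pos in range(n_items):
        p = responses[positions[:, i] == pos, i].mean()
        p = min(max(p, 1e-3), 1 - 1e-3)   # guard against logit of 0 or 1
        xs.append(pos)
        ys.append(np.log(p / (1 - p)))
slope = np.polyfit(xs, ys, 1)[0]
print(f"estimated position effect: {slope:.3f} (true delta = {delta})")
```

A full analysis along the lines of the study would instead estimate all parameters jointly, e.g. with a generalized linear mixed model with crossed person and item random effects and position as a fixed covariate.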
Cite this article
Christiansen, A., Janssen, R. Item position effects in listening but not in reading in the European Survey of Language Competences. Educ Asse Eval Acc 33, 49–69 (2021). https://doi.org/10.1007/s11092-020-09335-7