Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective

Educational Assessment, Evaluation and Accountability

Abstract

From a sociocognitive perspective, item parameters in a test represent regularities in examinees’ item responses. These regularities originate from experiences that individuals share in interacting with their environment. Theories of the relationship between culture and cognition likewise identify such shared experiences as the source of human cognition. In this context, this study argues that if human cognition is a cultural phenomenon that is not the same everywhere, then item parameters in cross-cultural surveys can be expected to fluctuate across culturally different populations. An investigation of item parameter equivalence in TIMSS 2015 supports this argument. A multidimensional scaling (MDS) representation of the similarity of item parameters across countries in TIMSS 2015 shows that the item parameters are more similar within Arab, Western, East Asian and post-Soviet country clusters and markedly less similar between these clusters. The similar structure of fluctuation across countries in the discrimination and difficulty parameters, in both mathematics and science, points to the vital role of cultural differences in item parameter nonequivalence in cross-cultural surveys. The study concludes that it is very difficult for cross-cultural surveys to achieve the highest level of measurement invariance, the level that would guarantee meaningful comparisons of scale scores across countries.
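The pipeline summarised in the abstract (correlate the countries' item parameters, turn the correlations into dissimilarities, and map the countries with MDS) can be sketched in base R as follows. This is an illustrative sketch only: the 6 × 4 difficulty matrix and the country labels A to D are hypothetical, the conversion d = sqrt(1 - r) is assumed to match smacof's sim2diss(method = "corr"), and classical scaling with cmdscale() is swapped in for the interval SMACOF MDS used in the study.

```r
## Hypothetical item difficulty estimates: 6 items (rows) x 4 countries (columns).
## The values are invented for illustration and are not TIMSS 2015 estimates.
b <- matrix(c(-1.2, -0.5, 0.1,  0.4, 0.9, 1.5,   # country A
              -1.0, -0.6, 0.2,  0.5, 0.8, 1.4,   # country B
               0.3, -1.1, 0.9, -0.2, 1.6, 0.1,   # country C
               0.2, -1.0, 1.0, -0.1, 1.4, 0.2),  # country D
            nrow = 6,
            dimnames = list(paste0("item", 1:6), c("A", "B", "C", "D")))

## Pearson correlations between the countries' item parameter profiles
r <- cor(b)

## Similarities to dissimilarities: d = sqrt(1 - r), so r = 1 gives d = 0
d <- sqrt(1 - r)

## Two-dimensional classical (Torgerson) MDS, standing in for interval SMACOF MDS
conf <- cmdscale(as.dist(d), k = 2)

print(round(r, 2))
print(round(conf, 3))
```

Because the hypothetical profiles of A and B (and of C and D) are nearly proportional, their correlations are close to 1, their dissimilarities close to 0, and they land next to each other in the two-dimensional map; this is the kind of clustering of countries that the study reads off its MDS configurations.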

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Data availability

The data sets analysed in the present study are available in the TIMSS 2015 international database, https://timssandpirls.bc.edu/timss2015/international-database/

References

  • Bandura, A. (1977). Social learning theory. Prentice-Hall.

  • Bleakley, A., Bligh, J., & Browne, J. (2011). Medical education for the future: identity, power and location. Springer.

  • Borg, I., Groenen, P. J. F., & Mair, P. (2018). Applied multidimensional scaling and unfolding. Springer.

  • Byrne, B. M., & van de Vijver, F. J. R. (2010). Testing for measurement and structural equivalence in large-scale cross-cultural studies: addressing the issue of nonequivalence. International Journal of Testing, 10(2), 107–132. https://doi.org/10.1080/15305051003637306.

  • Cantlon, J. F., & Brannon, E. M. (2007). Adding up the effects of cultural experience on the brain. Trends in Cognitive Sciences, 11(1), 1–4. https://doi.org/10.1016/j.tics.2006.10.008.

  • Cole, M. (2005). Putting culture in the middle. In H. Daniels (Ed.), An Introduction to Vygotsky (pp. 195–222). Routledge.

  • Cole, M., John-Steiner, V., Scribner, S., & Souberman, E. (Eds.). (1978). L.S. Vygotsky: mind in society: the development of higher psychological processes. Harvard University Press.

  • Davidov, E., Dülmer, H., Schlüter, E., Schmidt, P., & Meuleman, B. (2012). Using a multilevel structural equation modeling approach to explain cross-cultural measurement noninvariance. Journal of Cross-Cultural Psychology, 43(4), 558–575. https://doi.org/10.1177/0022022112438397.

  • Davidov, E., Meuleman, B., Cieciuch, J., Schmidt, P., & Billiet, J. (2014). Measurement equivalence in cross-national research. Annual Review of Sociology, 40(1), 55–75. https://doi.org/10.1146/annurev-soc-071913-043137.

  • de Leeuw, J., & Mair, P. (2009). Multidimensional scaling using majorization: SMACOF in R. Journal of Statistical Software, 31(3), 1–30. https://doi.org/10.18637/jss.v031.i03.

  • de Leeuw, J., & Meulman, J. (1986). A special jackknife for multidimensional scaling. Journal of Classification, 3(1), 97–112. https://doi.org/10.1007/BF01896814.

  • Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3–4), 199–215. https://doi.org/10.1080/15305058.2002.9669493.

  • Fernández, I., Carrera, P., Sánchez, F., Paez, D., & Candia, L. (2000). Differences between cultures in emotional verbal and non-verbal reactions. Psicothema, 12(Suppl. 1), 83–92.

  • Foy, P., Martin, M. O., Mullis, I. V. S., Yin, L., Centurino, V. A. S., & Reynolds, K. A. (2016). Reviewing the TIMSS 2015 achievement item statistics. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and procedures in TIMSS 2015. IEA.

  • Glöckner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10(4), 544–565. https://doi.org/10.1207/S15328007SEM1004_4.

  • Graffelman, J. (2013). Calibrate: calibration of scatterplot and biplot axes. R package version 1.7.2. http://CRAN.R-project.org/package=calibrate.

  • Greenfield, P. M., Maynard, A. E., & Childs, C. P. (2003). Historical change, cultural learning, and cognitive representation in Zinacantec Maya children. Cognitive Development, 18, 455–487. https://doi.org/10.1016/j.cogdev.2003.09.004.

  • Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69–86. https://doi.org/10.1016/j.stueduc.2007.01.006.

  • He, J., Barrera-Pedemonte, F., & Buchholz, J. (2018). Cross-cultural comparability of noncognitive constructs in TIMSS and PISA. Assessment in Education: Principles, Policy & Practice, 1–17. https://doi.org/10.1080/0969594X.2018.1469467.

  • Heine, S. J., Lehman, D. R., Ide, E., Leung, C., Kitayama, S., Takata, T., & Matsumoto, H. (2001). Divergent consequences of success and failure in Japan and North America: an investigation of self-improving motivations and malleable selves. Journal of Personality and Social Psychology, 81(4), 599–615. https://doi.org/10.1037/0022-3514.81.4.599.

  • Kreiner, S., & Christensen, K. B. (2013). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210–231. https://doi.org/10.1007/s11336-013-9347-z.

  • Lorenzo-Seva, U., & ten Berge, J. M. F. (2006). Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology, 2(2), 57–64. https://doi.org/10.1027/1614-2241.2.2.57.

  • Lubienski, S. T. (2000). Problem solving as a means toward mathematics for all: an exploratory look through a class lens. Journal for Research in Mathematics Education, 31(4), 454–482. https://doi.org/10.2307/749653.

  • Lubienski, S. T. (2007). What we can do about achievement disparities. Educational Leadership, 65(3), 54–59.

  • Mair, P., Borg, I., & Rusch, T. (2016). Goodness-of-fit assessment in multidimensional scaling and unfolding. Multivariate Behavioral Research, 51(6), 772–789. https://doi.org/10.1080/00273171.2016.1235966.

  • Mair, P., Groenen, P. J. F., & de Leeuw, J. (2019). More on multidimensional scaling and unfolding in R: smacof version 2. Retrieved from https://cran.pau.edu.tr/web/packages/smacof/vignettes/smacof.pdf

  • Martin, M. O., Mullis, I. V. S., & Foy, P. (2013). TIMSS 2015 assessment design. In I. V. S. Mullis & M. O. Martin (Eds.), TIMSS 2015 assessment frameworks (pp. 85–98). Chestnut Hill: TIMSS & PIRLS International Study Center, Boston College.

  • Martin, M. O., Mullis, I. V. S., & Hooper, M. (Eds.). (2016). Methods and procedures in TIMSS 2015. Chestnut Hill: TIMSS & PIRLS International Study Center, Boston College.

  • Mislevy, R. J. (2018). Sociocognitive foundations of educational measurement. New York: Routledge.

  • Mullis, I. V. S., & Martin, M. O. (2013). TIMSS 2015 assessment frameworks. Chestnut Hill: TIMSS & PIRLS International Study Center, Boston College.

  • Nisbett, R. E. (2005). The geography of thought. Nicholas Brealey Publishing.

  • Nisbett, R. E., Choi, I., Peng, K., & Norenzayan, A. (2001). Culture and systems of thought: holistic versus analytic cognition. Psychological Review, 108(2), 291–310. https://doi.org/10.1037/0033-295X.108.2.291.

  • Oliveri, M. E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53(3), 315–333. Retrieved from http://www.psychologie-aktuell.com/fileadmin/download/ptam/3-2011_20110927/04_Oliveri.pdf.

  • Oliveri, M. E., & von Davier, M. (2014). Toward increasing fairness in score scale calibrations employed in international large-scale assessments. International Journal of Testing, 14(1), 1–21. https://doi.org/10.1080/15305058.2013.825265.

  • Punter, R. A. (2018). Improving the modelling of response variation in international large-scale assessments. [unpublished doctoral dissertation]. University of Twente.

  • Rutkowski, L., & Rutkowski, D. (2018). Improving the comparability and local usefulness of international assessments: a look back and a way forward. Scandinavian Journal of Educational Research, 62(3), 354–367. https://doi.org/10.1080/00313831.2016.1261044.

  • Sandilands, D., Oliveri, M. E., Zumbo, B. D., & Ercikan, K. (2013). Investigating sources of differential item functioning in international large-scale assessments using a confirmatory approach. International Journal of Testing, 13(2), 152–174. https://doi.org/10.1080/15305058.2012.690140.

  • Schimmack, U., Oishi, S., & Diener, E. (2005). Individualism: a valid and important dimension of cultural differences between nations. Personality and Social Psychology Review, 9(1), 17–31. https://doi.org/10.1207/s15327957pspr0901_2.

  • Sookias, R. B., Passmore, S., & Atkinson, Q. D. (2018). Deep cultural ancestry and human development indicators across nation states. Royal Society Open Science, 5(4), 171411. https://doi.org/10.1098/rsos.171411.

  • Tang, Y., Zhang, W., Chen, K., Feng, S., Ji, Y., Shen, J., Reiman, E. M., & Liu, Y. (2006). Arithmetic processing in the brain shaped by cultures. Proceedings of the National Academy of Sciences of the United States of America, 103(28), 10775–10780. https://doi.org/10.1073/pnas.0604416103.

  • Tomasello, M. (1999a). The cultural origins of human cognition. Harvard University Press.

  • Tomasello, M. (1999b). The human adaptation for culture. Annual Review of Anthropology, 28, 509–529.

  • Tomasello, M. (2000). Culture and cognitive development. Current Directions in Psychological Science, 9(2), 37–40. https://doi.org/10.1111/1467-8721.00056.

  • Tomasello, M., Kruger, A. C., & Ratner, H. H. (1993). Cultural learning. Behavioral and Brain Sciences, 16, 495–552. https://doi.org/10.1017/s0140525x00031496.

  • Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Personnel Research Section Report No. 984). Department of the Army.

  • van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis of comparative research. Thousand Oaks: Sage.

  • van de Vijver, F. J. R., & Leung, K. (2011). Equivalence and bias: a review of concepts, models, and data analytic procedures. In Cross-cultural research methods in psychology. Cambridge University Press.

  • Varnum, M. E. W., Grossmann, I., Kitayama, S., & Nisbett, R. E. (2010). The origin of cultural differences in cognition: the social orientation hypothesis. Current Directions in Psychological Science, 19(1), 9–13. https://doi.org/10.1177/0963721409359301.

  • Verhelst, N. D. (2012). Profile analysis: a closer look at the PISA 2000 reading data. Scandinavian Journal of Educational Research, 56(3), 315–332. https://doi.org/10.1080/00313831.2011.583937.

  • Verhelst, N. D. (2017). Balance: a neglected aspect of reporting exam results. In M. Rosén, K. Y. Hansen, & U. Wolff (Eds.), Cognitive abilities and educational outcomes (pp. 273–293). Springer International Publishing AG. https://doi.org/10.1007/978-3-319-43473-5.

  • Verhelst, N. D., & Glas, C. A. W. (1995). The one parameter logistic model. In Rasch models: Foundations, recent developments, and applications. Springer Verlag.

  • Verhelst, N. D., Verstralen, H. H. F. M., & Eggen, T. J. H. M. (1991). Finding starting values for the item parameters and suitable discrimination indices in the one parameter logistic model (Measurement and Research Department Reports 91-10). Cito.

  • Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1994). OPLM: computer program and manual. Cito.

  • Wu, M., Tam, H. P., & Jen, T. (2016). Educational measurement for applied researchers. Springer.

  • Yıldırım, H. H., & Berberoğlu, G. (2009). Judgmental and statistical DIF analyses of the PISA-2003 mathematics literacy items. International Journal of Testing, 9(2), 108–121. https://doi.org/10.1080/15305050902880736.

Acknowledgements

I am grateful to Dr. Norman Verhelst for his thoughtful comments on the analyses. I would also like to thank Dr. Selda Yıldırım for her help with the TIMSS 2015 dataset and Fazıl Yıldırım for his help in creating the graphics. They are not, however, responsible for the ways in which I have analysed or represented the data.

Author information

Contributions

The author carried out the study, drafted the manuscript, and read and approved the final version.

Corresponding author

Correspondence to Hüseyin H. Yıldırım.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Summary of the R commands used in the study to run the SMACOF package

library(smacof) ## load the smacof package, which provides sim2diss(), mds(), Procrustes(), biplotmds(), permtest() and jackmds()

MATa <- read.csv("Math_a.csv", row.names=1, stringsAsFactors=FALSE)
MATb <- read.csv("Math_b.csv", row.names=1, stringsAsFactors=FALSE)
CULTURE <- read.csv("HDI.csv", row.names=1, stringsAsFactors=FALSE) ## read the matrices of pairwise correlations in item parameters between countries, and the countries' HDI index

disMATa <- sim2diss(as.matrix(MATa), method="corr")
disMATb <- sim2diss(as.matrix(MATb), method="corr") ## convert correlations into dissimilarities

mdsMATa <- mds(disMATa, ndim=2, type="interval") ## run interval MDS and store the result in object mdsMATa

mdsMATb <- mds(disMATb, ndim=2, type="interval", init=mdsMATa$conf) ## for the Procrustean transformation, run MDS starting from the target configuration
fit <- Procrustes(mdsMATa$conf, mdsMATb$conf) ## fit mdsMATb$conf to the target configuration mdsMATa$conf
mdsMATb$conf <- fit$Yhat ## eliminate meaningless differences between configurations and keep the Procrustes-fitted MDS solution

HDIbiplot <- biplotmds(mdsMATa, extvar=CULTURE) ## map the variable HDI onto the MDS configuration

plot(mdsMATa); plot(mdsMATb); plot(HDIbiplot) ## plot the MDS configurations and the HDI biplot

permtest(mdsMATa, nrep=500); jackmds(mdsMATa) ## test goodness-of-fit (permutation test) and stability (jackknife) of the MDS configuration; jackmds() is the jackknife function in smacof version 2
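The Procrustean step in the commands above removes differences between two MDS configurations that carry no meaning (translation, rotation or reflection, and overall scale) before the configurations are compared. One standard way to compute such a fit is the similarity Procrustes solution obtained from a singular value decomposition, sketched below in base R. The function name procrustes_fit() and the toy configurations are hypothetical, and the sketch is not presented as smacof's own implementation of Procrustes(); it only illustrates the kind of transformation that produces a fitted configuration such as fit$Yhat.

```r
## Hypothetical helper: similarity Procrustes fit of configuration Y to target X.
## Finds dilation s, orthogonal matrix Q and a translation so that
## s * Y %*% Q (plus translation) is as close as possible to X in least squares.
procrustes_fit <- function(X, Y) {
  n <- nrow(X)
  J <- diag(n) - matrix(1 / n, n, n)      # centring matrix
  Xc <- J %*% X                           # centred target configuration
  Yc <- J %*% Y                           # centred configuration to be fitted
  sv <- svd(t(Xc) %*% Yc)                 # SVD of the cross-product matrix
  Q <- sv$v %*% t(sv$u)                   # optimal rotation (possibly with reflection)
  s <- sum(sv$d) / sum(Yc^2)              # optimal dilation
  Yhat <- s * Yc %*% Q +                  # rotate and rescale, then move to X's centroid
    matrix(colMeans(X), n, ncol(X), byrow = TRUE)
  list(Yhat = Yhat, rotation = Q, dilation = s)
}

## Toy check: Y is X rotated by 30 degrees, shrunk by half and shifted
X <- matrix(c(1, 0, -1, 0.5,
              0, 1, 0.2, -1), ncol = 2)
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
Y <- 0.5 * X %*% rot + matrix(c(3, -2), nrow(X), 2, byrow = TRUE)

toy_fit <- procrustes_fit(X, Y)
print(max(abs(toy_fit$Yhat - X)))   # close to 0: X is recovered
print(toy_fit$dilation)             # close to 2: undoes the factor 0.5
```

Because Y is an exact similarity transform of X in this toy case, the fitted configuration reproduces X and the dilation undoes the factor 0.5. With real MDS solutions the fit is only approximate, and the discrepancies that remain after fitting are the differences between configurations that are worth interpreting.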

About this article

Cite this article

Yıldırım, H.H. Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective. Educ Asse Eval Acc 33, 27–48 (2021). https://doi.org/10.1007/s11092-020-09350-8

  • DOI: https://doi.org/10.1007/s11092-020-09350-8
