Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective

Yıldırım, Hüseyin H.

doi:10.1007/s11092-020-09350-8

Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective

Published: 06 January 2021

Volume 33, pages 27–48, (2021)
Cite this article

Educational Assessment, Evaluation and Accountability Aims and scope Submit manuscript

Hüseyin H. Yıldırım ORCID: orcid.org/0000-0001-9924-8281¹

363 Accesses
2 Altmetric
Explore all metrics

Abstract

From a sociocognitive perspective, item parameters in a test represent regularities in examinees’ item responses. These regularities are originated from shared experiences among individuals in interacting with their environment. Theories explaining the relationship between culture and cognition also acknowledge these shared experiences as the source of human cognition. In this context, this study argues that if human cognition is a cultural phenomenon and not everywhere the same, then item parameters in cross-cultural surveys may inevitably fluctuate across culturally different populations. The investigation of item parameter equivalence in TIMSS 2015 supports this argument. The multidimensional scaling representation of similarity in the item parameters across countries in TIMSS 2015 shows that the item parameters are more similar within Arab, Western, East Asian and post-Soviet country clusters and are remarkably less similar between these clusters. Similar fluctuation structure across countries in discrimination and difficulty parameters in mathematics and science subjects point to the vital role of cultural differences in item parameter nonequivalence in cross-cultural surveys. The study concludes that it is very difficult for cross-cultural surveys to achieve the highest level of measurement invariance that guarantees meaningful scale score comparisons across countries.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

Article Open access 07 June 2017

The Relation Between Family Socioeconomic Status and Academic Achievement in China: A Meta-analysis

Article 01 August 2019

Students’ Well-Being and Academic Achievement: A Meta-Analysis Study

Article 06 April 2021

Data availability

The data sets analysed in this present study are available in the TIMSS 2015 international database, https://timssandpirls.bc.edu/timss2015/international-database/

References

Bandura, A. (1977). Social learning theory. Prentice-Hall.
Bleakley, A., Bligh, J., & Browne, J. (2011). Medical education for the future: identity, power and location. Springer.
Borg, I., Groenen, P. J. F., & Mair, P. (2018). Applied multidimensional scaling and unfolding. Springer.
Byrne, B. M., & van de Vijver, F. J. R. (2010). Testing for measurement and structural equivalence in large-scale cross-cultural studies: addressing the issue of nonequivalence. International Journal of Testing, 10(2), 107–132. https://doi.org/10.1080/15305051003637306.
Article Google Scholar
Cantlon, J. F., & Brannon, E. M. (2007). Adding up the effects of cultural experience on the brain. Trends in Cognitive Sciences, 11(1), 1–4. https://doi.org/10.1016/j.tics.2006.10.008.
Article Google Scholar
Cole, M. (2005). Putting culture in the middle. In H. Daniels (Ed.), An Introduction to Vygotsky (pp. 195–222). Routledge.
Cole, M., John-Steiner, V., Scribner, S., & Souberman, E. (Eds.). (1978). L.S. Vygotsky: mind in society: the development of higher psychological processes. Harvard University Press.
Davidov, E., Dülmer, H., Schlüter, E., Schmidt, P., & Meuleman, B. (2012). Using a multilevel structural equation modeling approach to explain cross-cultural measurement noninvariance. Journal of Cross-Cultural Psychology, 43(4), 558–575. https://doi.org/10.1177/0022022112438397.
Article Google Scholar
Davidov, E., Meuleman, B., Cieciuch, J., Schmidt, P., & Billiet, J. (2014). Measurement equivalence in cross-national research. Annual Review of Sociology, 40(1), 55–75. https://doi.org/10.1146/annurev-soc-071913-043137.
Article Google Scholar
de Leeuw, J., & Mair, P. (2009). Multidimensional scaling using majorization: SMACOF in R. Journal of Statistical Software, 31(3), 1–30. https://doi.org/10.18637/jss.v031.i03.
Article Google Scholar
de Leeuw, J., & Meulman, J. (1986). A special jackknife for multidimensional scaling. Journal of Classification, 3(1), 97–112. https://doi.org/10.1007/BF01896814.
Article Google Scholar
Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3–4), 199–215. https://doi.org/10.1080/15305058.2002.9669493.
Article Google Scholar
Fernández, I., Carrera, P., Sánchez, F., Paez, D., & Candia, L. (2000). Differences between cultures in emotional verbal and non-verbal reactions. Psicothema, 12(SUPPL. 1), 83–92.
Google Scholar
Foy, P., Martin, M. O., Mullis, I. V. S., Yin, L., Centurino, V. A. S., & Reynolds, K. A. (2016). Reviewing the TIMSS 2015 achievement item statistics. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and procedures in TIMSS 2015 IEA.
Google Scholar
Glöckner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10(4), 544–565. https://doi.org/10.1207/S15328007SEM1004_4.
Article Google Scholar
Graffelman, J. (2013). Calibrate: calibration of scatterplot and biplot axes. R package version, 1(7), 2 http://CRAN.R-project.org/package=calibrate.
Google Scholar
Greenfield, P. M., Maynard, A. E., & Childs, C. P. (2003). Historical change, cultural learning, and cognitive representation in Zinacantec Maya children. In Cognitive Development, 18, 455–487. https://doi.org/10.1016/j.cogdev.2003.09.004.
Article Google Scholar
Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69–86. https://doi.org/10.1016/j.stueduc.2007.01.006.
Article Google Scholar
He, J., Barrera-Pedemonte, F., & Buchholz, J. (2018). Cross-cultural comparability of noncognitive constructs in TIMSS and PISA (pp. 1–17). Principles, Policy and Practice: Assessment in Education. https://doi.org/10.1080/0969594X.2018.1469467.
Book Google Scholar
Heine, S. J., Lehman, D. R., Ide, E., Leung, C., Kitayama, S., Takata, T., & Matsumoto, H. (2001). Divergent consequences of success and failure in Japan and North America: an investigation of self-improving motivations and malleable selves. Journal of Personality and Social Psychology, 81(4), 599–615. https://doi.org/10.1037/0022-3514.81.4.599.
Article Google Scholar
Kreiner, S., & Christensen, K. B. (2013). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210–231. https://doi.org/10.1007/s11336-013-9347-z.
Article Google Scholar
Lorenzo-Seva, U., & ten Berge, J. M. F. (2006). Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology, 2(2), 57–64. https://doi.org/10.1027/1614-2241.2.2.57.
Article Google Scholar
Lubienski, S. T. (2000). Problem solving as a means toward mathematics for all: an exploratory look through a class lens. Journal for Research in Mathematics Education, 31(4), 454–482. https://doi.org/10.2307/749653.
Article Google Scholar
Lubienski, S. T. (2007). What we can do about achievement disparities. Educational Leadership, 65(3), 54–59.
Google Scholar
Mair, P., Borg, I., & Rusch, T. (2016). Goodness-of-fit assessment in multidimensional scaling and unfolding. Multivariate Behavioral Research, 51(6), 772–789. https://doi.org/10.1080/00273171.2016.1235966.
Article Google Scholar
Mair, P., Groenen, P. J. F., & de Leeuw, J. (2019). More on multidimensional scaling and unfolding in R: Smacof version 2. Retrieved from https://cran.pau.edu.tr/web/packages/smacof/vignettes/smacof.pdf
Martin, M., Mullis, I. V. S., & Foy, P. (2013). TIMSS 2015 assessment design. In I. V. S. Mullis & M. O. Martin (Eds.), Timss 2015 assessment frameworks (pp. 85–98). Chestnut Hill: TIMSS & PIRLS International Study Center, Boston College.
Google Scholar
Martin, M., Mullis, I. V. S., & Hooper, M. (2016). Methods and procedures in Timss 2015. Chestnut Hill: TIMSS & PIRLS International Study Center, Boston College.
Google Scholar
Mislevy, R. J. (2018). Sociocognitive foundations of educational measurement. New York: Routledge.
Book Google Scholar
Mullis, I. V. S., & Martin, M. O. (2013). TIMSS 2015 assessment frameworks. Chestnut Hill: TIMSS & PIRLS International Study Center,Boston College.
Google Scholar
Nisbett, R. E. (2005). The geography of thought. Nicholas Brealey Publishing.
Nisbett, R. E., Choi, I., Peng, K., & Norenzayan, A. (2001). Culture and systems of thought: holistic versus analytic cognition. Psychological Review, 108(2), 291–310. https://doi.org/10.1037/0033-295X.108.2.291.
Article Google Scholar
Oliveri, M. E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53(3), 315–333 Retrieved from http://www.psychologie-aktuell.com/fileadmin/download/ptam/3-2011_20110927/04_Oliveri.pdf.
Google Scholar
Oliveri, M. E., & von Davier, M. (2014). Toward increasing fairness in score scale calibrations employed in international large-scale assessments. International Journal of Testing, 14(1), 1–21. https://doi.org/10.1080/15305058.2013.825265.
Article Google Scholar
Punter, R. A. (2018). Improving the modelling of response variation in international large-scale assessments. [unpublished doctoral dissertation]. University of Twente.
Rutkowski, L., & Rutkowski, D. (2018). Improving the comparability and local usefulness of international assessments: a look back and a way forward. Scandinavian Journal of Educational Research, 62(3), 354–367. https://doi.org/10.1080/00313831.2016.1261044.
Article Google Scholar
Sandilands, D., Oliveri, M. E., Zumbo, B. D., & Ercikan, K. (2013). Investigating sources of differential item functioning in international large-scale assessments using a confirmatory approach. International Journal of Testing, 13(2), 152–174. https://doi.org/10.1080/15305058.2012.690140.
Article Google Scholar
Schimmack, U., Oishi, S., & Diener, E. (2005). Individualism: a valid and important dimension of cultural differences between nations. Personality and Social Psychology Review, 9(1), 17–31. https://doi.org/10.1207/s15327957pspr0901_2.
Article Google Scholar
Sookias, R. B., Passmore, S., & Atkinson, Q. D. (2018). Deep cultural ancestry and human development indicators across nation states. Royal Society Open Science, 5(4), 171411. https://doi.org/10.1098/rsos.171411.
Article Google Scholar
Tang, Y., Zhang, W., Chen, K., Feng, S., Ji, Y., Shen, J., Reiman, E. M., & Liu, Y. (2006). Arithmetic processing in the brain shaped by cultures. Proceedings of the National Academy of Sciences of the United States of America, 103(28), 10775–10780. https://doi.org/10.1073/pnas.0604416103.
Article Google Scholar
Tomasello, M. (1999a). The cultural origins of human cognition. Harvard University Press.
Tomasello, M. (1999b). The human adaptation for culture. Annual Review of Anthropology, 28, 509–529.
Article Google Scholar
Tomasello, M. (2000). Culture and cognitive development. Current Directions in Psychological Science, 9(2), 37–40. https://doi.org/10.1111/1467-8721.00056.
Article Google Scholar
Tomasello, M., Kruger, A. C., & Ratner, H. H. (1993). Cultural learning. Behavioral and Brain Sciences, 16, 495–552. https://doi.org/10.1017/s0140525x00031496.
Article Google Scholar
Tucker, R. L. (1951). A method for synthesis of factor analysis studies. (personnel research section report no.984). Department of the army.
van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis of comparative research. Thousand Oaks: Sage.
Google Scholar
Van de Vijver, F. J. R., & Leung, K. (2011). Equivalence and bias: a review of concepts, models, and data analytic procedures. In Cross-cultural research methods in psychology. Cambridge University Press.
Varnum, M. E. W., Grossmann, I., Kitayama, S., & Nisbett, R. E. (2010). The origin of cultural differences in cognition: the social orientation hypothesis. Current Directions in Psychological Science, 19(1), 9–13. https://doi.org/10.1177/0963721409359301.
Article Google Scholar
Verhelst, N. D. (2012). Profile analysis: a closer look at the PISA 2000 reading data. Scandinavian Journal of Educational Research, 56(3), 315–332. https://doi.org/10.1080/00313831.2011.583937.
Article Google Scholar
Verhelst, N. D. (2017). Balance: A neglected aspect of reporting exam results. In M. Rosén, K.Y. Hansen, & U. Wolff (Eds.), Cognitive abilities and educational outcomes (pp. 273‐293). Springer International Publishing AG. https://doi.org/10.1007/978-3-319-43473-5
Verhelst, N. D., & Glas, C. A. W. (1995). The one parameter logistic model. In Rasch models: Foundations, recent developments, and applications. Springer Verlag.
Verhelst, N.D., Verstralen, H.H.F.M., & Eggen, T.J.H.M. (1991). Finding starting values for the item parameters and suitable discrimination indices in the one parameter logistic model. Measurement and research department reports, 91-10, CITO.
Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1994). OPLM: computer program and manual. Cito.
Wu, M., Tam, H. P., & Jen, T. (2016). Educational measurement for applied researchers. Springer.
Yıldırım, H. H., & Berberoĝlu, G. (2009). Judgmental and statistical DIF analyses of the PISA-2003 mathematics literacy items. International Journal of Testing, 9(2), 108–121. https://doi.org/10.1080/15305050902880736.
Article Google Scholar

Download references

Acknowledgements

I am grateful to Dr. Norman Verhelst for his thoughtful comments on the analyses. I would also like to thank Dr. Selda Yıldırım for her help in dealing with the TIMSS 2015 dataset and Fazıl Yıldırım for his help in creating the graphics. But they are not responsible for the ways in which I have analysed or represented the data.

Author information

Authors and Affiliations

Mathematics Education Department, Faculty of Education, Bolu Abant Izzet Baysal University, 14030, Bolu, Turkey
Hüseyin H. Yıldırım

Authors

Hüseyin H. Yıldırım
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

The author carried out the study, drafted, read and approved the final manuscript.

Corresponding author

Correspondence to Hüseyin H. Yıldırım.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Summary of the R commands used in the study to run the SMACOF package

MATa <- read.csv("Math_a.csv", row.names=1, stringsAsFactors=FALSE)

MATb <- read.csv("Math_b.csv", row.names=1, stringsAsFactors=FALSE)

CULTURE<-read.csv("HDI.csv", row.names=1, stringsAsFactors=FALSE) ## Read matrix of pairwise correlations in item parameters between countries and read countries’ HDI indexdisMATa <- sim2diss(MATa, method="corr")

disMATb <- sim2diss(MATb, method="corr") ## convert correlations into dissimilarities

mdsMATa<-mds(disMATa, ndim = 2,type = "interval") ## run interval MDS and store the result in object mdsMATa.

mdsMATb<-mds(disMATb, ndim = 2,type = "interval", init = mdsMATa$conf) ## For Procrustean transformation, run MDS starting from the target configuration. fit<-Procrustes(mdsMATa$conf,mdsMATb$conf)

mdsMATb$conf<-fit$Yhat ## eliminate meaningless differences between configurations and generate Procrustean-fitted MDS solution.

HDIbiplot<-biplotmds(mdsMATa, extvar = CULTURE) ## map the variable HDI on the MDS configuration

plot (mdsMATa) ; plot (mdsMATb) ; plot (HDIbiplot) ## plot MDS configurations

permtest(mdsMATa,nrep=500);jackknife(mdsMATa) ## test for goodness-of-fit and stability of MDS configuration

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yıldırım, H.H. Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective. Educ Asse Eval Acc 33, 27–48 (2021). https://doi.org/10.1007/s11092-020-09350-8

Download citation

Received: 17 March 2020
Accepted: 09 December 2020
Published: 06 January 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11092-020-09350-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective

Abstract

Access this article

Similar content being viewed by others

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

The Relation Between Family Socioeconomic Status and Academic Achievement in China: A Meta-analysis

Students’ Well-Being and Academic Achievement: A Meta-Analysis Study

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher’s note

Appendix. Summary of the R commands used in the study to run the SMACOF package

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Re-reviewing item parameter equivalence in TIMSS 2015 from a sociocognitive perspective

Abstract

Access this article

Similar content being viewed by others

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

The Relation Between Family Socioeconomic Status and Academic Achievement in China: A Meta-analysis

Students’ Well-Being and Academic Achievement: A Meta-Analysis Study

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher’s note

Appendix. Summary of the R commands used in the study to run the SMACOF package

Appendix. Summary of the R commands used in the study to run the SMACOF package

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation