Research Article

Developing an Item Bank for Progress Tests and Application of Computerized Adaptive Testing by Simulation in Medical Education

Year 2019, Volume: 6 Issue: 4, 656 - 669, 05.01.2020
https://doi.org/10.21449/ijate.635675

Abstract

The Progress Test (PT) is a form of assessment that simultaneously measures the ability levels of all students in a given educational program and their progress over time by giving them the same questions and repeating the process at regular intervals with parallel tests. Our objective was to generate an item bank for the PT and to examine the suitability of computerized adaptive testing (CAT) for PT administration. This descriptive study included 1206 medical students. The Rasch model for dichotomous items was used to analyze the psychometric properties of the PT item bank. Several CAT simulations were performed using stopping rules based on different standard error thresholds. CAT simulation estimates were compared with the estimates generated from the original calibration of the Rasch model, in which all items were included. After Rasch analysis, a unidimensional PT item bank consisting of 103 items was obtained. The item bank reliability was 0.77 according to both the Person Separation Index (PSI) and Kuder-Richardson Formula 20 (KR-20). A high correlation between the θ estimates obtained from the paper-and-pencil application (θRM) and the CAT application (θCAT) was found under both simulation conditions, N(0,1) and N(0,3). With CAT, ability could be estimated with an average of 14 questions (an 86.4% reduction) for N(0,1) and 17 questions (an 83.4% reduction) for N(0,3), at a reliability of 0.75. This study shows that an appropriate item bank can be developed for the PT and that the burden of administering a large number of items in the PT can be reduced by incorporating CAT.
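The CAT procedure summarized in the abstract (select an item, score it, re-estimate θ, stop once the standard error falls below a threshold) can be illustrated with a minimal sketch. This is not the authors' software or simulation design: the evenly spaced item difficulties, the MAP estimator with a standard normal prior, the nearest-difficulty selection rule, the threshold of 0.5 and all function names are assumptions made for illustration only.

```python
import math
import random

def rasch_prob(theta, b):
    """P(correct) for ability theta and item difficulty b (dichotomous Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, prior_sd=1.0, iters=30):
    """MAP ability estimate and its standard error.

    Assumption: a N(0, prior_sd^2) prior keeps the estimate finite when
    all responses so far are correct (or all incorrect).
    """
    theta = 0.0
    prior_info = 1.0 / prior_sd ** 2
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, probs)) - theta * prior_info
        info = sum(p * (1.0 - p) for p in probs) + prior_info
        theta += max(-1.0, min(1.0, grad / info))  # damped Newton-Raphson step
    probs = [rasch_prob(theta, b) for b in difficulties]
    info = sum(p * (1.0 - p) for p in probs) + prior_info
    return theta, 1.0 / math.sqrt(info)

def simulate_cat(true_theta, bank, se_stop, rng):
    """Administer items adaptively until SE <= se_stop or the bank is exhausted."""
    remaining = list(bank)
    used, responses = [], []
    theta, se = 0.0, float("inf")
    while remaining and se > se_stop:
        # Under the Rasch model the most informative item is the one whose
        # difficulty lies closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        used.append(b)
        # Simulate the examinee's answer from the (hypothetical) true ability.
        responses.append(1 if rng.random() < rasch_prob(true_theta, b) else 0)
        theta, se = estimate_theta(responses, used)
    return theta, se, len(used)

def percent_reduction(n_used, bank_size):
    """Percentage of the bank that did not need to be administered."""
    return 100.0 * (1.0 - n_used / bank_size)

# Hypothetical bank: 103 difficulties evenly spaced on [-3, 3] logits.
bank = [-3.0 + 6.0 * i / 102 for i in range(103)]
rng = random.Random(0)
theta_hat, se, n_used = simulate_cat(true_theta=0.5, bank=bank, se_stop=0.5, rng=rng)
print(n_used, round(theta_hat, 2), round(se, 2),
      round(percent_reduction(n_used, len(bank)), 1))
```

The reduction figures in the abstract follow the same arithmetic as `percent_reduction`: 14 items out of 103 is a reduction of (103 - 14) / 103, or about 86.4%. In this sketch a Rasch item contributes at most 0.25 units of information, so with a unit-variance prior at least 12 items are needed to reach a standard error of 0.5. For a unit-variance ability distribution, a standard error of 0.5 corresponds roughly to a reliability of 1 - 0.5² = 0.75, although the abstract does not state which thresholds the authors used.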



Details

Primary Language English
Subjects Studies on Education
Journal Section Articles
Authors

Ayşen Melek Aytuğ Koşan 0000-0001-5298-2032

Nizamettin Koç 0000-0002-3308-7849

Atilla Halil Elhan 0000-0003-3324-248X

Derya Öztuna 0000-0001-6266-3035

Publication Date January 5, 2020
Submission Date October 21, 2019
Published in Issue Year 2019 Volume: 6 Issue: 4

Cite

APA Aytuğ Koşan, A. M., Koç, N., Elhan, A. H., Öztuna, D. (2020). Developing an Item Bank for Progress Tests and Application of Computerized Adaptive Testing by Simulation in Medical Education. International Journal of Assessment Tools in Education, 6(4), 656-669. https://doi.org/10.21449/ijate.635675
