Abstract
Traditionally, task difficulty levels are determined by domain experts using hand-crafted rules. However, with the adoption of Massive Open Online Courses (MOOCs), it has become harder to personalize task difficulty manually, as system designers face a very large question bank and a user base of individuals with diverse backgrounds and ability levels. This research focuses on developing a data-driven method that adaptively adjusts difficulty levels in order to maintain a target user performance level over a series of tasks whose difficulty varies widely across individuals. Specifically, difficulty adaptation was formulated as a reinforcement learning problem. To ensure the responsiveness of interactive systems, a novel bootstrapped policy gradient (BPG) framework was developed, which incorporates prior knowledge of difficulty ranking into policy gradient to enhance sample efficiency. To obtain high-quality prior information on difficulty ranking, a clustering-based approach was proposed that learns a personalized difficulty ranking to capture users' individual differences. To evaluate the effectiveness of the difficulty adaptation method, we focused on a visual memory training problem with a large question bank and a diverse user base. The proposed algorithms were combined and applied to a real-world application consisting of an online visual-spatial memory recall game, where they outperformed the traditional rule-based adaptation approach in adapting to slow players while achieving comparable performance in adapting to fast players.
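To make the reinforcement learning formulation concrete, the following is a minimal sketch of difficulty adaptation with a vanilla policy gradient (REINFORCE) over a small set of difficulty levels. It is not the BPG framework described above and uses no difficulty-ranking prior. The simulated learner (`MEAN_PERFORMANCE`), the target level `TARGET`, the reward definition (negative absolute deviation from the target), and all hyperparameters are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical simulated learner: mean performance (0..1) at each difficulty level.
MEAN_PERFORMANCE = [0.95, 0.85, 0.70, 0.50, 0.30]
TARGET = 0.70  # assumed target performance level


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_performance(level, rng):
    """Noisy performance of the simulated learner, clipped to [0, 1]."""
    noisy = rng.gauss(MEAN_PERFORMANCE[level], 0.05)
    return min(1.0, max(0.0, noisy))


def train(steps=5000, lr=0.3, seed=0):
    """Vanilla policy gradient (REINFORCE) with a running-average baseline."""
    rng = random.Random(seed)
    prefs = [0.0] * len(MEAN_PERFORMANCE)  # policy parameters (one per level)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(prefs)
        level = rng.choices(range(len(prefs)), weights=probs)[0]
        # Assumed reward: the closer the performance is to the target, the better.
        reward = -abs(simulate_performance(level, rng) - TARGET)
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. preference a: indicator(a == level) - probs[a].
        for a in range(len(prefs)):
            indicator = 1.0 if a == level else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])
        # Exponential moving average of rewards reduces gradient variance.
        baseline += 0.05 * (reward - baseline)
    return softmax(prefs)


policy = train()
best_level = max(range(len(policy)), key=policy.__getitem__)
```

In this toy setting the learned policy should concentrate on the level whose mean performance matches the target. A plain policy gradient of this kind needs many interactions to do so, which is the sample-efficiency concern that motivates adding prior knowledge of difficulty ranking.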
Notes
In the case of \({{{\overset{\scriptscriptstyle \frown }{\pi }}}_{\theta }}({{a}_{i}})=0\), the gradient update is set to zero by setting \({{{\overset{\scriptscriptstyle \frown }{\pi }}}_{\theta }}({{a}_{i}})\) equal to a constant.
To avoid a negative score, the minimum of the game score is set to zero.
The task posted on the Mechanical Turk platform was open to participants from all countries who had acceptance rates over 95%.
The target level used in this experiment was chosen based on a preliminary study that employed a random selection method. The median and mean of the memorization time lie in the range of the 4th time bubble, i.e., 4200–5200 ms.
Appendix: Visual Memory Tasks
Cite this article
Zhang, Y., Goh, WB. Personalized task difficulty adaptation based on reinforcement learning. User Model User-Adap Inter 31, 753–784 (2021). https://doi.org/10.1007/s11257-021-09292-w
DOI: https://doi.org/10.1007/s11257-021-09292-w