Personalized task difficulty adaptation based on reinforcement learning

User Modeling and User-Adapted Interaction

Abstract

Traditionally, task difficulty levels are determined by domain experts using hand-crafted rules. However, with the adoption of Massive Open Online Courses (MOOCs), it has become harder to personalize task difficulty manually, as system designers face a very large question bank and a user base of individuals with diverse backgrounds and ability levels. This research focuses on developing a data-driven method that adaptively adjusts difficulty levels to maintain a target user performance level over a series of tasks whose difficulty varies widely among individuals. Specifically, difficulty adaptation was formulated as a reinforcement learning problem. To keep the interactive system responsive, a novel bootstrapped policy gradient (BPG) framework was developed, which incorporates prior knowledge of difficulty ranking into policy gradient to improve sample efficiency. To obtain high-quality prior information on difficulty ranking, a clustering-based approach was proposed that learns a personalized difficulty ranking to capture users’ individual differences. To evaluate the difficulty adaptation method, we focused on a visual memory training problem with a large question bank and a diverse user base. The proposed algorithms were combined and applied to a real-world application consisting of an online visual-spatial memory recall game. They outperformed the traditional rule-based adaptation approach in adapting to slow players while achieving comparable performance in adapting to fast players.
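To make the reinforcement learning formulation more concrete, the following is a minimal sketch of difficulty adaptation as a softmax policy-gradient bandit. It is not the BPG framework described above: it is a plain REINFORCE-style update with a running-average baseline and no difficulty-ranking prior. The simulated user model (`simulate_performance`), the reward `-|performance - target|`, the seven difficulty levels, and all parameter values are assumptions made purely for illustration.

```python
import math
import random


def softmax(prefs):
    """Convert preference scores into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_performance(difficulty, ability, rng):
    """Hypothetical user model (an assumption, not from the paper):
    noisy performance in [0, 1] that drops as difficulty exceeds ability."""
    mean = 1.0 / (1.0 + math.exp(difficulty - ability))
    return min(1.0, max(0.0, mean + rng.gauss(0.0, 0.05)))


def adapt_difficulty(ability, target=0.7, n_levels=7, steps=5000, lr=0.2, seed=0):
    """Learn a policy over difficulty levels that keeps performance near target."""
    rng = random.Random(seed)
    prefs = [0.0] * n_levels  # one softmax preference per difficulty level
    baseline = 0.0            # running average reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(n_levels), weights=probs)[0]
        perf = simulate_performance(action, ability, rng)
        reward = -abs(perf - target)  # closer to the target level is better
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        # Gradient of log softmax w.r.t. preference k: 1[k == action] - probs[k]
        for k in range(n_levels):
            grad = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * advantage * grad
    return softmax(prefs)


if __name__ == "__main__":
    policy = adapt_difficulty(ability=3.0)
    print([round(p, 3) for p in policy])
```

Under this assumed user model, a simulated user with `ability=3.0` has a mean performance closest to the 0.7 target at difficulty level 2, so the learned policy should tend to place most of its probability there. A plain update like this learns each level's value from scratch, which is the sample-efficiency problem that prior ranking knowledge is meant to ease.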

Notes

  1. In the case of \(\overset{\scriptscriptstyle \frown}{\pi}_{\theta}(a_{i})=0\), the gradient update is set to zero by setting \(\overset{\scriptscriptstyle \frown}{\pi}_{\theta}(a_{i})\) equal to a constant.

  2. To avoid a negative score, the minimum game score is set to zero.

  3. The task posted on the Mechanical Turk platform was open to participants from all countries with acceptance rates over 95%.

  4. The target level used in this experiment was chosen based on a preliminary study that employed a random selection method. The median and mean of the memorization time lie in the range of the 4th time bubble, i.e., 4200–5200 ms.

Author information

Corresponding author

Correspondence to Yaqian Zhang.

Appendix: Visual Memory Tasks

Fig. 15 Question bank for the visual memory game

About this article

Cite this article

Zhang, Y., Goh, WB. Personalized task difficulty adaptation based on reinforcement learning. User Model User-Adap Inter 31, 753–784 (2021). https://doi.org/10.1007/s11257-021-09292-w

  • DOI: https://doi.org/10.1007/s11257-021-09292-w
