
Evaluating skills in hierarchical reinforcement learning

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Previous work on automatically acquiring skills for hierarchical reinforcement learning has emphasized their benefits, such as mitigating the curse of dimensionality, improving exploration, and speeding up value propagation, but it has paid little attention to evaluating the effect of each individual skill on these factors. In this paper, we show that, depending on the given task, a skill may or may not be useful for learning it. Moreover, related work on automatic skill acquisition focuses on detecting subgoals, i.e., the skill termination condition, and lacks a precise method for extracting the initiation set of a skill. We therefore propose two methods for evaluating skills and two further methods for pruning their initiation sets. Experimental results show significant improvements in learning across different test domains after skills are evaluated and pruned.
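For context, a skill in this setting corresponds to an option in the framework of Sutton, Precup, and Singh (1999): an initiation set, an internal policy, and a termination condition. The sketch below is a minimal, hypothetical illustration of that structure and of where an initiation-set pruning step would apply; the class, function, and parameter names are ours for illustration, and the pruning criterion shown is a placeholder, not the evaluation method proposed in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable  # any hashable state representation


@dataclass
class Option:
    """A temporally extended action (skill) in the options framework.

    initiation_set : states in which the skill may be invoked
    policy         : maps a state to a primitive action while the skill runs
    termination    : probability of terminating the skill in a given state
    """
    initiation_set: Set[State]
    policy: Callable[[State], int]
    termination: Callable[[State], float]

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set


def prune_initiation_set(option: Option,
                         utility: Callable[[State], float],
                         threshold: float) -> Option:
    """Illustrative pruning step: keep only initiation states whose estimated
    utility of invoking the skill reaches a threshold. The actual pruning
    criteria in the paper differ; this only shows where such a filter sits."""
    kept = {s for s in option.initiation_set if utility(s) >= threshold}
    return Option(kept, option.policy, option.termination)
```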




Author information

Corresponding author

Correspondence to Nasser Mozayani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Davoodabadi Farahani, M., Mozayani, N. Evaluating skills in hierarchical reinforcement learning. Int. J. Mach. Learn. & Cyber. 11, 2407–2420 (2020). https://doi.org/10.1007/s13042-020-01141-3

