Abstract
Previous work on automatically acquiring skills for hierarchical reinforcement learning has highlighted benefits such as mitigating the curse of dimensionality, improving exploration, and speeding up value propagation, but it has paid little attention to evaluating the effect of each individual skill on these factors. In this paper, we show that, depending on the given task, a skill may or may not be useful for learning it. Moreover, related work on automatic skill acquisition focuses on detecting subgoals, i.e., the skill termination condition, and lacks a precise method for extracting the initiation set of a skill. We therefore propose two methods for evaluating skills, together with two further methods for pruning their initiation sets. Experimental results show significant learning improvements on several test domains after evaluating and pruning skills.
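To make the terminology concrete, the sketch below expresses a skill as an option in the options framework of Sutton, Precup, and Singh (1999): a triple of initiation set, internal policy, and termination condition. The `prune_initiation_set` function and the utility scores are purely hypothetical illustrations of what "pruning the initiation set" could look like; they are not the evaluation or pruning methods proposed in the paper.

```python
# Sketch of the options framework: an option o = (I, pi, beta) has an
# initiation set I, an internal policy pi, and a termination condition beta.
# The pruning rule below (drop initiation states whose estimated utility
# falls under a threshold) is a hypothetical illustration only.
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Option:
    init_set: Set[int]                    # states where the option may start
    policy: Callable[[int], int]          # pi(s): state -> primitive action
    termination: Callable[[int], float]   # beta(s): probability of stopping


def prune_initiation_set(option: Option,
                         utility: Dict[int, float],
                         threshold: float) -> Option:
    """Keep only initiation states whose estimated utility (e.g. how much
    starting the skill there helps the learner) meets the threshold."""
    kept = {s for s in option.init_set if utility.get(s, 0.0) >= threshold}
    return Option(kept, option.policy, option.termination)


# Toy example: a skill that may start in states {0, 1, 2, 3} and
# terminates deterministically in state 3.
skill = Option(init_set={0, 1, 2, 3},
               policy=lambda s: 0,
               termination=lambda s: 1.0 if s == 3 else 0.0)
scores = {0: 0.9, 1: 0.2, 2: 0.7, 3: 0.1}
pruned = prune_initiation_set(skill, scores, threshold=0.5)
print(sorted(pruned.init_set))  # -> [0, 2]
```

A pruned option behaves identically once started; restricting where it can be initiated only shrinks the agent's action set in low-utility states.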
Davoodabadi Farahani, M., Mozayani, N. Evaluating skills in hierarchical reinforcement learning. Int. J. Mach. Learn. & Cyber. 11, 2407–2420 (2020). https://doi.org/10.1007/s13042-020-01141-3