Acquiring reusable skills in intrinsically motivated reinforcement learning

Journal of Intelligent Manufacturing

Abstract

This paper proposes a novel incremental model for acquiring skills and using them in Intrinsically Motivated Reinforcement Learning (IMRL). In this model, the learning process is divided into two phases. In the first phase, the agent explores the environment and acquires task-independent skills using different intrinsic motivation mechanisms. We present two intrinsic motivation factors for acquiring skills: detecting states that can lead to other states (being a cause) and detecting states that help the agent move to a different region of the environment (discounted relative novelty). In the second phase, the agent evaluates the acquired skills to find suitable ones for accomplishing a specific task. Despite the importance of assessing task-independent skills for performing a task, the idea of evaluating skills and pruning them has not been considered in the IMRL literature. In this article, two methods are presented for evaluating previously learned skills based on the value function of the assigned task. Using such a two-phase learning model together with the skill evaluation capability helps the agent acquire task-independent skills that can be transferred to other similar tasks. Experimental results in four domains show that the proposed method significantly increases learning speed.
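To make the two-phase idea concrete, the following is a minimal Python sketch, not the authors' algorithm: phase 1 scores states with two simple intrinsic-motivation proxies (a "cause" score counting distinct successors, and a discounted relative-novelty score along a trajectory), and phase 2 prunes candidate skills by comparing the task value of each skill's subgoal against the value of the states the skill starts from. All function names and the specific pruning rule here are illustrative assumptions.

```python
# Minimal, illustrative sketch of the two-phase idea described above -- NOT the
# authors' implementation. The intrinsic-motivation proxies and the pruning
# rule are simplifying assumptions.
from collections import defaultdict


def causality_score(transitions):
    """Phase 1 proxy for 'being a cause': score each state by how many
    distinct successor states it has led to during exploration."""
    return {s: len(succ) for s, succ in transitions.items()}


def relative_novelty(visit_counts, trajectory, gamma=0.9):
    """Phase 1 proxy for discounted relative novelty: for each interior state
    of a trajectory, compare the discounted novelty of the states that follow
    it with the discounted novelty of the states that preceded it."""
    novelty = [1.0 / (1.0 + visit_counts[s]) for s in trajectory]
    scores = {}
    for i in range(1, len(trajectory) - 1):
        before = sum(gamma ** (i - j) * novelty[j] for j in range(i))
        after = sum(gamma ** (j - i) * novelty[j] for j in range(i + 1, len(trajectory)))
        scores[trajectory[i]] = after / max(before, 1e-6)
    return scores


def evaluate_skills(skills, V):
    """Phase 2: keep a candidate skill only if its subgoal has a higher task
    value than the average value of the states the skill can start from."""
    kept = []
    for subgoal, initiation_set in skills:
        avg_init = sum(V.get(s, 0.0) for s in initiation_set) / max(len(initiation_set), 1)
        if V.get(subgoal, 0.0) > avg_init:
            kept.append((subgoal, initiation_set))
    return kept


if __name__ == "__main__":
    # Toy exploration data: states are letters; 'D' is a doorway-like state.
    transitions = {"A": {"B"}, "B": {"A", "D"}, "D": {"B", "E", "F"}, "E": {"D"}, "F": {"D"}}
    print(causality_score(transitions))  # 'D' gets the highest cause score

    visits = defaultdict(int, {"A": 9, "B": 7, "D": 3, "E": 1, "F": 1})
    print(relative_novelty(visits, ["A", "B", "D", "E", "F"]))

    # A candidate skill = (subgoal, initiation states); V is the task value function.
    skills = [("D", ["A", "B"]), ("B", ["E", "F"])]
    V = {"A": 0.1, "B": 0.2, "D": 0.6, "E": 0.8, "F": 0.7}
    print(evaluate_skills(skills, V))  # only the skill that reaches 'D' survives
```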

Notes

  1. The results of the OGAHC algorithm are plotted for \( \rho = 1 \), which gave the best results among the \( \rho \) values tested in [55].

References

  • Aissani, N., Bekrar, A., Trentesaux, D., & Beldjilali, B. (2012). Dynamic scheduling for multi-site companies: A decisional approach based on reinforcement multi-agent learning. Journal of Intelligent Manufacturing, 23, 2513–2529.

  • Aubret, A., Matignon, L., & Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. Preprint arXiv:1908.06976.

  • Barto, A. G., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.

  • Barto, A. G., & Simsek, O. (2005). Intrinsic motivation for reinforcement learning systems. In Proceedings of the thirteenth Yale workshop on adaptive and learning systems.

  • Barto, A. G., Singh, S., & Chentanez, N. (2004). Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of the 3rd international conference on development and learning (ICDL 2004), Salk Institute, San Diego.

  • Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. Advances in Neural Information Processing Systems (pp. 1471–1479).

  • Berlyne, D. E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.

  • Bonarini, A., Lazaric, A., Restelli, M., & Vitali, P. (2006). Self-development framework for reinforcement learning agents. In Proceedings of the 5th international conference on development and learning ICDL (Vol. 178, pp. 355–362).

  • Brandes, U. (2001). A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25(2), 163–177.

  • Chen, C., Xia, B., Zhou, B., & Lifeng, X. (2015). A reinforcement learning based approach for a multiple-load carrier scheduling problem. Journal of Intelligent Manufacturing, 26, 1233–1245.

  • Davoodabadi, M., & Beigy, H. (2011). A new method for discovering subgoals and constructing options in reinforcement learning. In Proceedings of the 5th Indian international conference on artificial intelligence (IICAI-11) (pp. 441–450).

  • Davoodabadi Farahani, M., & Mozayani, N. (2019). Automatic construction and evaluation of macro-actions in reinforcement learning. Applied Soft Computing, 82, 105574.

  • Davoodabadi Farahani, M., & Mozayani, N. (2020). Evaluating skills in hierarchical reinforcement learning. International Journal of Machine Learning and Cybernetics. https://doi.org/10.1007/s13042-020-01141-3.

  • Dhakan, P., Merrick, K., Rañó, I., & Siddique, N. (2018). Intrinsic rewards for maintenance, approach, avoidance, and achievement goal types. Frontiers in Neurorobotics, 12(October), 1–16.

  • Florensa, C., Held, D., Geng, X., & Abbeel, P. (2018). Automatic goal generation for reinforcement learning agents. In International conference on machine learning (pp. 1514–1523).

  • Forestier, S., & Oudeyer, P. Y. (2016). Overlapping waves in tool use development: A curiosity-driven computational model. In The sixth joint IEEE international conference on developmental learning and epigenetic robotics (pp. 238–245).

  • Groos, K. (1901). The play of man: Chapter 8: The theory of play. D. Appleton.

  • Haber, N., Mrowca, D., Fei-Fei, L., & Yamins, D. (2018). Emergence of structured behaviors from curiosity-based intrinsic motivation. Preprint arXiv:1802.07461.

  • Hester, T., & Stone, P. (2012). Intrinsically motivated model learning for a developing curious agent. In AAMAS adaptive learning agents (ALA) workshop.

  • Hester, T., & Stone, P. (2017). Intrinsically motivated model learning for developing curious robots. Artificial Intelligence, 247, 170–186.

  • Houthooft, R., Chen, X., Duan, Y., Schulman, J., De Turck, F., & Abbeel, P. (2016). Vime: Variational information maximizing exploration. Advances in Neural Information Processing Systems (pp. 1109–1117).

  • Jensen, P., Morini, M., Karsai, M., Venturini, T., Vespignani, A., Jacomy, M., et al. (2015). Detecting global bridges in networks. Journal of Complex Networks, 4(3), 319–329.

  • Jong, N. K., Hester, T., & Stone, P. (2008). The utility of temporal abstraction in reinforcement learning. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 1 (pp. 299–306).

  • Konidaris, G., Kuindersma, S., Barto, A., & Grupen, R. (2010). Constructing skill trees for reinforcement learning agents from demonstration trajectories. Advances in Neural Information Processing Systems (NIPS).

  • Lee, M.-J., Choi, S., & Chung, C.-W. (2016). Efficient algorithms for updating betweenness centrality in fully dynamic graphs. Information Sciences, 326, 278–296.

  • Li, R. (2019). Reinforcement learning applications. Preprint arXiv:1908.06973.

  • Lin, L. J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3), 293–321.

  • Mann, T., & Mannor, S. (2014). Scaling up approximate value iteration with options: Better policies with fewer iterations. In Proceedings of the 31st international conference on machine learning.

  • Mannor, S., Menache, I., Hoze, A., & Klein, U. (2004). Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the twenty-first international conference on Machine learning (p. 71).

  • McGovern, A., & Sutton, R. S. (1998). Macro-actions in reinforcement learning: An empirical analysis. Technical Report 98-70, University of Massachusetts, Department of Computer Science.

  • Merrick, K. E. (2012). Intrinsic motivation and introspection in reinforcement learning. IEEE Transactions on Autonomous Mental Development, 4(4), 315–329.

  • Metzen, J. H. (2013). Learning graph-based representations for continuous reinforcement learning domains. In Lecture Notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (Vol. 8188 LNAI, No. PART 1, pp. 81–96).

  • Metzen, J. H. (2014). Learning the structure of continuous Markov decision processes. PhD thesis, Universität Bremen.

  • Metzen, J. H., & Kirchner, F. (2013). Incremental learning of skill collections based on intrinsic motivation. Frontiers in Neurorobotics, 7(July), 1–12.

  • Mirolli, M., & Baldassarre, G. (Eds.). (2013a). Intrinsically motivated learning in natural and artificial systems. Heidelberg: Springer.

  • Mirolli, M., & Baldassarre, G. (2013b). Functions and mechanisms of intrinsic motivations. In G. Baldassarre & M. Mirolli (Eds.), Intrinsically motivated learning in natural and artificial systems (pp. 49–72). Berlin: Springer.

  • Moerman, W. (2009). Hierarchical reinforcement learning: Assignment of behaviours to subpolicies by self-organization. PhD thesis, Utrecht University.

  • Mohamed, S., & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. Advances in Neural Information Processing Systems (pp. 2125–2133).

  • Murata, J. (2008). Controlled use of subgoals in reinforcement learning. In Robotics, automation and control (pp. 167–182).

  • Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.

  • Oudeyer, P.-Y., & Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1, 6.

  • Oudeyer, P. Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.

  • Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In IEEE computer society conference on computer vision and pattern recognition workshops (pp. 16–17).

  • Piaget, J. (1962). Play, dreams and imitation (Vol. 24). New York: Norton.

  • Santucci, V., Baldassarre, G., & Mirolli, M. (2016). GRAIL: A goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Transactions on Cognitive and Developmental Systems, 8(3), 214–231.

  • Schembri, M., Mirolli, M., & Baldassarre, G. (2007). Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot. In 2007 IEEE 6th International conference on development and learning, ICDL (pp. 282–287).

  • Siddique, N., Dhakan, P., Rano, I., & Merrick, K. (2017). A review of the relationship between novelty, intrinsic motivation and reinforcement learning. Journal of Behavioral Robotics, 8(1), 58–69.

  • Simşek, O. (2008). Behavioral building blocks for autonomous agents: Description, identification, and learning. PhD Thesis, University of Massachusetts Amherst.

  • Stout, A., & Barto, A. G. (2010). Competence progress intrinsic motivation. In Proceedings of the ninth IEEE international conference on development and learning.

  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. IEEE Transactions on Neural Networks, 9(5), 1054.

  • Sutton, R. S., Precup, D., & Singh, S. (1998). Intra-option learning about temporally abstract actions. In Proceedings of the fifteenth international conference on machine learning (pp. 556–564).

  • Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1), 181–211.

  • Thrun, S. (1995). Exploration in active learning. In Handbook of brain science and neural networks.

Author information

Corresponding author

Correspondence to Nasser Mozayani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Davoodabadi Farahani, M., Mozayani, N. Acquiring reusable skills in intrinsically motivated reinforcement learning. J Intell Manuf 32, 2147–2168 (2021). https://doi.org/10.1007/s10845-020-01629-3
