Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments

Published: 25 May 2017

Abstract

Multi-agent reinforcement learning (MARL) is a widely researched technique for decentralised control in complex large-scale autonomous systems. Such systems often operate in environments that are continuously evolving and where agents’ actions are non-deterministic, so-called inherently non-stationary environments. When agents acting in such an environment obtain inconsistent results, learning and adapting are challenging. In this article, we propose P-MARL, an approach that integrates prediction and pattern change detection abilities into MARL and thus minimises the effect of non-stationarity in the environment. The environment is modelled as a time series, with future estimates provided by prediction techniques. Learning is based on the predicted environment behaviour, and agents employ this knowledge to improve their performance in real time. We illustrate P-MARL’s performance in a real-world smart grid scenario, where the environment is heavily influenced by non-stationary power demand patterns from residential consumers. We evaluate P-MARL in three different situations, where agents’ action decisions are independent, simultaneous, and sequential. Results show that all three variants outperform traditional MARL, with sequential P-MARL achieving the best results.
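
To make the workflow described above concrete, the sketch below illustrates the two-phase idea in a heavily simplified form: forecast the environment's demand time series, let tabular Q-learning agents train offline against that forecast, and then deploy the learned policies in real time. This is a minimal, hypothetical sketch, not the authors' implementation; the naive forecaster, the charging reward, and all names and parameters are assumptions made for illustration only.

```python
# Illustrative sketch (not the authors' code): a P-MARL-style loop in which
# agents learn offline against a forecast of the environment and then act on
# the real environment. Forecaster, reward, and parameters are hypothetical.
import random


def forecast_demand(history, horizon=24):
    """Naive seasonal forecast: tomorrow's demand is assumed to resemble the
    last `horizon` observations (a stand-in for the prediction module)."""
    return list(history[-horizon:])


class ChargingAgent:
    """Tabular Q-learner that decides, per time slot, whether to charge its EV."""

    def __init__(self, n_slots, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [[0.0, 0.0] for _ in range(n_slots)]  # actions: 0 = idle, 1 = charge
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, slot):
        if random.random() < self.epsilon:            # epsilon-greedy exploration
            return random.randint(0, 1)
        return 0 if self.q[slot][0] >= self.q[slot][1] else 1

    def update(self, slot, action, reward, next_slot):
        best_next = max(self.q[next_slot]) if next_slot is not None else 0.0
        td_target = reward + self.gamma * best_next
        self.q[slot][action] += self.alpha * (td_target - self.q[slot][action])


def train_on_forecast(agents, predicted_demand, episodes=200, peak_limit=5.0):
    """Offline phase: agents learn against the predicted demand curve and are
    penalised whenever their combined charging pushes load above peak_limit."""
    n_slots = len(predicted_demand)
    for _ in range(episodes):
        for slot in range(n_slots):
            actions = [agent.act(slot) for agent in agents]
            load = predicted_demand[slot] + sum(actions)  # each charging EV adds 1 unit
            for agent, action in zip(agents, actions):
                reward = float(action) if load <= peak_limit else -load
                next_slot = slot + 1 if slot + 1 < n_slots else None
                agent.update(slot, action, reward, next_slot)


if __name__ == "__main__":
    history = [2.0 + (h % 24) / 12.0 for h in range(48)]  # toy demand history
    forecast = forecast_demand(history)
    fleet = [ChargingAgent(len(forecast)) for _ in range(3)]
    train_on_forecast(fleet, forecast)
    # Online phase (not shown): follow the learned policies on the real demand,
    # re-triggering forecasting and learning if a pattern change is detected.
```

In the full approach, the naive forecaster above would be replaced by the time-series prediction and pattern change detection components described in the article, and the offline-trained policies would then be applied and refined against the real, non-stationary environment.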

References

  1. Cesare Alippi and Manuel Roveri. 2008. Just-in-time adaptive classifiers - Part II: Designing the classifier. IEEE Transactions on Neural Networks 19, 12 (2008), 2053--2064. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. George E. P. Box and Gwilym M. Jenkins. 1970. Time Series Analysis, Forecasting and Control. San Francisco, CA: Holden Day.Google ScholarGoogle Scholar
  3. Martin Brown and Christopher John Harris. 1994. Neurofuzzy Adaptive Modelling and Control. Prentice Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 38, 2 (2008), 156--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Samuel P. M. Choi, Dit-Yan Yeung, and Nevin L. Zhang. 2001. Hidden-mode Markov decision processes for nonstationary sequential decision making. In Sequence Learning. Springer, 264--287. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Caroline Claus and Craig Boutilier. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI/IAAI. 746--752. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Comission for Energy Regulation, Ireland. 2011. Smart Meter Trial Data. Retrieved from http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.Google ScholarGoogle Scholar
  8. Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273--297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Kenji Doya, Kazuyuki Samejima, Ken-ichi Katagiri, and Mitsuo Kawato. 2002. Multiple model-based reinforcement learning. Neural Computation 14, 6 (2002), 1347--1369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Ivana Dusparic and Vinny Cahill. 2010. Multi-policy optimization in self-organizing systems. In Self-Organizing Architectures. Springer, 101--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ivana Dusparic, Colin Harris, Andrei Marinescu, Vinny Cahill, and Siobhán Clarke. 2013. Multi-agent residential demand response based on load forecasting. In Proceedings of the 2013 1st IEEE Conference Technologies for Sustainability (SusTech). IEEE, 90--96.Google ScholarGoogle ScholarCross RefCross Ref
  12. Mohamed Elidrisi, Nicholas Johnson, Maria Gini, and Jacob Crandall. 2014. Fast adaptive learning in repeated stochastic games by game abstraction. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems (AAMAS’14). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 1141--1148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. EPA. 2008. Average Annual Emissions and Fuel Consumption for Gasoline-fueled Passenger Cars and Light Trucks. Technical Report. United States Environmental Protection Agency.Google ScholarGoogle Scholar
  14. Lingwen Gan, Ufuk Topcu, and Steven Low. 2011. Optimal decentralized protocol for electric vehicle charging. In Proceedings of the 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC). IEEE, 5798--5804.Google ScholarGoogle ScholarCross RefCross Ref
  15. George Gross and Francisco D. Galiana. 1987. Short-term load forecasting. Proc. IEEE 75, 12 (1987), 1558--1573.Google ScholarGoogle ScholarCross RefCross Ref
  16. John A. Hartigan and Manchek A. Wong. 1979. Algorithm AS 136: A k-means clustering algorithm. Applied Statistics (1979), 100--108.Google ScholarGoogle Scholar
  17. Pablo Hernandez, E. Munoz de Cote, and L. Enrique Sucar. 2013. Learning Against Non-stationary Opponents. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, Saint Paul, MN.Google ScholarGoogle Scholar
  18. Junling Hu and Michael P. Wellman. 2003. Nash Q-learning for general-sum Stochastic games. The Journal of Machine Learning Research 4 (2003), 1039--1069. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Markus C. Huebscher and Julie A. McCann. 2008. A survey of autonomic computing: degrees, models, and applications. ACM Comput. Surv. 40, 3, Article 7 (Aug 2008), 28 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Mark Humphrys. 1995. W-learning: Competition Among Selfish Q-learners. Departmental Technical Report. University of Cambridge.Google ScholarGoogle Scholar
  21. Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. 1998. Planning and acting in partially observable Stochastic domains. Artificial Intelligence 101, 1 (1998), 99--134. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Micha Kahlen, Wolfgang Ketter, and Jan van Dalen. 2014. Agent-coordinated virtual power plants of electric vehicles. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1547--1548. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Springer, 85--103.Google ScholarGoogle Scholar
  24. Franziska Klügl, Manuel Fehler, and Rainer Herrler. 2005. About the role of the environment in multi-agent simulations. In Environments for Multi-agent Systems. Springer, 127--149. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Teuvo Kohonen. 1990. The self-organizing map. Proc. IEEE 78, 9 (1990), 1464--1480.Google ScholarGoogle ScholarCross RefCross Ref
  26. Julien Laumônier and Brahim Chaib-draa. 2005. Multiagent Q-learning: Preliminary study on dominance between the Nash and Stackelberg equilibriums. In Proceedings of AAAI-2005 Workshop on Multiagent Learning.Google ScholarGoogle Scholar
  27. Andrei Marinescu, Ivana Dusparic, Colin Harris, Vinny Cahill, and Siobhán Clarke. 2014a. A dynamic forecasting method for small scale residential electrical demand. In IJCNN. 3767--3774.Google ScholarGoogle Scholar
  28. Andrei Marinescu, Collin Harris, Ivana Dusparic, Vinny Cahill, and Siobhán Clarke. 2014b. A hybrid approach to very small scale electrical demand forecasting. In Innovative Smart Grid Technologies (ISGT), 2014 IEEE PES. 1--5.Google ScholarGoogle Scholar
  29. Andrei Marinescu, Colin Harris, Ivana Dusparic, Siobhán Clarke, and Vinny Cahill. 2013. Residential electrical demand forecasting in very small scale: An evaluation of forecasting methods. In Proceedings of the 2013 2nd International Workshop on Software Engineering Challenges for the Smart Grid (SE4SG). IEEE, 25--32.Google ScholarGoogle ScholarCross RefCross Ref
  30. Francoise Nemry and Martijn Brons. 2010. Plug-in Hybrid and Battery Electric Vehicles. Market Penetration Scenarios of Electric Drive Vehicles. Technical Report. Institute for Prospective and Technological Studies, Joint Research Centre.Google ScholarGoogle Scholar
  31. Sarvapali D. Ramchurn, Perukrishnen Vytelingum, Alex Rogers, and Nick Jennings. 2011. Agent-based control for decentralised demand side management in the smart grid. In Proceedings of the he 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 5--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Stefan Rudolph, Sarah Edenhofer, Sven Tomforde, and Jörg Hähner. 2014. Reinforcement learning for coverage optimization through PTZ camera alignment in highly dynamic environments. In Proceedings of the International Conference on Distributed Smart Cameras (ICDSC’14). ACM, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. As’ad. Salkham and Vinny Cahill. 2010. Soilse: A decentralized approach to optimization of fluctuating urban traffic using reinforcement learning. In Proceedings of the 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC). 531--538.Google ScholarGoogle Scholar
  34. Yoav Shoham, Rob Powers, and Trond Grenager. 2003. Multi-agent Reinforcement Learning: A Critical Survey. Technical report, Stanford University.Google ScholarGoogle Scholar
  35. Bruno C. Silva, Eduardo W. Basso, Ana L. C. Bazzan, and Paulo M. Engel. 2006. Dealing with non-stationary environments using context detection. In ICML. ACM, 217--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Peter Stone and Manuela Veloso. 2000. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots 8, 3 (2000), 345--383. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Richard S. Sutton. 1990. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the 7th International Conference on Machine Learning. 216--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Richard S. Sutton and Andrew G. Barto. 1998. Introduction to Reinforcement Learning. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Gerald Tesauro, Nicholas K. Jong, Rajarshi Das, and Mohamed N. Bennani. 2006. A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation. In Proceedings of the IEEE International Conference on Autonomic Computing (ICAC’06). 65--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. U.S. Department of Energy at Pacific Northwest National Laboratory. 2014. GridLAB-D. Retrieved from http://www.gridlabd.org/.Google ScholarGoogle Scholar
  41. U.S. EPA Fuel Economy Information. 2014. Nissan Leaf. Retrieved from http://www.fueleconomy.gov/feg/Find.do?action=sbs8id=32154.Google ScholarGoogle Scholar
  42. Konstantina Valogianni, Wolfgang Ketter, and John Collins. 2014. Learning to schedule electric vehicle charging given individual customer preferences. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1591--1592. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Stijn Vandael, Nelis Boucké, Tom Holvoet, Klaas De Craemer, and Geert Deconinck. 2011. Decentralized coordination of plug-in hybrid vehicles for imbalance reduction in a smart grid. In Proceedings of he 10th International Conference on Autonomous Agents and Multi-agent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 803--810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3--4 (1992), 279--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Peter Whittle. 1951. Hypothesis Testing in Time Series Analysis. Vol. 4. Almqvist 8 Wiksells.Google ScholarGoogle Scholar
  46. Gerhard Widmer and Miroslav Kubat. 1996. Learning in the presence of concept drift and hidden contexts. Machine Learning 23, 1 (1996), 69--101. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Guoqiang Zhang, B. Eddy Patuwo, and Michael Y. Hu. 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 1 (1998), 35--62.Google ScholarGoogle ScholarCross RefCross Ref

            Published in

            ACM Transactions on Autonomous and Adaptive Systems, Volume 12, Issue 2
            June 2017
            162 pages
            ISSN: 1556-4665
            EISSN: 1556-4703
            DOI: 10.1145/3099619

            Copyright © 2017 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 25 May 2017
            • Revised: 1 March 2017
            • Accepted: 1 March 2017
            • Received: 1 July 2015


            Qualifiers

            • research-article
            • Research
            • Refereed
