Abstract
Multi-agent reinforcement learning (MARL) is a widely researched technique for decentralised control in complex large-scale autonomous systems. Such systems often operate in environments that evolve continuously and in which agents' actions are non-deterministic, so-called inherently non-stationary environments. When agents' actions produce inconsistent results in such an environment, learning and adapting is challenging. In this article, we propose P-MARL, an approach that integrates prediction and pattern change detection abilities into MARL and thus minimises the effect of non-stationarity in the environment. The environment is modelled as a time series, with future estimates provided using prediction techniques. Learning is based on the predicted environment behaviour, with agents employing this knowledge to improve their performance in real time. We illustrate P-MARL's performance in a real-world smart grid scenario, where the environment is heavily influenced by non-stationary power demand patterns from residential consumers. We evaluate P-MARL in three different situations, where agents' action decisions are independent, simultaneous, and sequential. Results show that all three P-MARL variants outperform traditional MARL, with sequential P-MARL achieving the best results.
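The predict-then-learn idea summarised above can be illustrated with a minimal single-agent sketch. It is not the authors' implementation: the seasonal-naive predictor, the load-placement task, the reward, and every name below (`seasonal_naive_forecast`, `learn_schedule`, `unmet_penalty`) are assumptions made for illustration, and pattern change detection and multi-agent coordination are left out. The agent runs tabular Q-learning against a forecast demand curve rather than against the live environment, then acts greedily on what it learned.

```python
import random


def seasonal_naive_forecast(history, period):
    """Forecast the next period by repeating the most recent one (assumed stand-in predictor)."""
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    return list(history[-period:])


def learn_schedule(forecast, units, episodes=5000, alpha=0.5, unmet_penalty=100.0, seed=0):
    """Tabular Q-learning on the *predicted* demand curve (hypothetical task).

    State: (slot, units still to place). Action 1 places one unit of load in
    the current slot at a cost equal to the predicted demand; action 0 waits.
    Units still unplaced after the last slot incur `unmet_penalty` each.
    """
    rng = random.Random(seed)
    horizon = len(forecast)
    q = {}  # (slot, remaining, action) -> estimated return

    def actions(remaining):
        return (0, 1) if remaining > 0 else (0,)

    def value(slot, remaining):
        if slot == horizon:  # terminal state
            return 0.0
        return max(q.get((slot, remaining, a), 0.0) for a in actions(remaining))

    for _ in range(episodes):
        remaining = units
        for slot in range(horizon):
            # Uniformly random behaviour policy; Q-learning is off-policy.
            action = rng.choice(actions(remaining))
            next_remaining = remaining - action
            reward = -forecast[slot] * action
            if slot == horizon - 1:
                reward -= unmet_penalty * next_remaining
            target = reward + value(slot + 1, next_remaining)  # undiscounted
            key = (slot, remaining, action)
            q[key] = q.get(key, 0.0) + alpha * (target - q.get(key, 0.0))
            remaining = next_remaining

    # Greedy rollout of the learned policy.
    schedule, remaining = [], units
    for slot in range(horizon):
        action = max(actions(remaining), key=lambda a: q.get((slot, remaining, a), 0.0))
        schedule.append(action)
        remaining -= action
    return schedule
```

Calling `learn_schedule(seasonal_naive_forecast(history, period), units=2)` returns a 0/1 list with one entry per slot. Because training uses only the forecast, the quality of the resulting schedule depends directly on forecast accuracy, which is the motivation for pairing prediction with change detection in the approach described above.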
Index Terms
- Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments