research-article

Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments

Authors:
Andrei Marinescu, Trinity College Dublin, Dublin, Ireland
Ivana Dusparic, Trinity College Dublin, Dublin, Ireland
Siobhán Clarke, Trinity College Dublin, Dublin, Ireland
ACM Transactions on Autonomous and Adaptive Systems, Volume 12, Issue 2, Article No. 9, pp. 1–23. https://doi.org/10.1145/3070861

Published: 25 May 2017


Abstract

Multi-agent reinforcement learning (MARL) is a widely researched technique for decentralised control in complex, large-scale autonomous systems. Such systems often operate in environments that evolve continuously and in which the outcomes of agents’ actions are non-deterministic. These are so-called inherently non-stationary environments. When the same action can produce inconsistent results, learning and adapting are challenging. In this article, we propose P-MARL, an approach that integrates prediction and pattern change detection into MARL and thereby minimises the effect of non-stationarity in the environment. The environment is modelled as a time series, and future estimates are provided by prediction techniques. Learning is based on the predicted environment behaviour, and agents use this knowledge to improve their performance in real time. We illustrate P-MARL’s performance in a real-world smart grid scenario, where the environment is heavily influenced by non-stationary power demand patterns from residential consumers. We evaluate P-MARL in three situations, in which agents’ action decisions are independent, simultaneous, and sequential. Results show that all three variants outperform traditional MARL, with sequential P-MARL achieving the best results.
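The abstract names three ingredients: forecasting the environment as a time series, detecting when its pattern changes, and letting agents learn against the forecast rather than the live environment. The Python sketch below shows that loop under stated assumptions and is not the authors' implementation. The seasonal-naive forecaster, the error-threshold change detector (`pattern_changed`), the stateless independent Q-learners (`learn_schedule`), the 24-hour horizon, and the 3 kW per-vehicle load (`EV_LOAD_KW`) are all hypothetical illustrative choices. The demand profile is invented and is not taken from the article's data.

```python
import random

HOURS = 24          # assumption: one decision slot per hour of the next day
EV_LOAD_KW = 3.0    # hypothetical charging load added by one EV in its chosen hour


def forecast_next_day(history):
    """Seasonal-naive forecast: mean of the same hour across past days.
    A simple stand-in for a real demand forecaster."""
    days = len(history)
    return [sum(day[h] for day in history) / days for h in range(HOURS)]


def pattern_changed(forecast, observed, threshold=0.25):
    """Hypothetical change detector: flag a new pattern when the mean
    absolute percentage error of the forecast exceeds `threshold`."""
    errors = [abs(o - f) / max(f, 1e-9) for f, o in zip(forecast, observed)]
    return sum(errors) / len(errors) > threshold


def learn_schedule(predicted_demand, n_agents, episodes=2000,
                   alpha=0.1, epsilon=0.1, seed=0):
    """Independent, stateless Q-learners (one per EV) trained against the
    *predicted* demand profile instead of the live environment.
    Each agent picks one charging hour. Its reward is the negative total
    load (predicted base demand plus all EVs) in that hour."""
    rng = random.Random(seed)
    q = [[0.0] * HOURS for _ in range(n_agents)]
    for _ in range(episodes):
        # Epsilon-greedy choice, with all agents acting simultaneously.
        actions = []
        for qa in q:
            if rng.random() < epsilon:
                actions.append(rng.randrange(HOURS))
            else:
                actions.append(max(range(HOURS), key=qa.__getitem__))
        load = list(predicted_demand)
        for h in actions:
            load[h] += EV_LOAD_KW
        # Move each agent's estimate for its chosen hour toward the reward.
        for qa, h in zip(q, actions):
            qa[h] += alpha * (-load[h] - qa[h])
    # Final greedy schedule: the best-valued hour for each agent.
    return [max(range(HOURS), key=qa.__getitem__) for qa in q]


# Usage with a hypothetical residential profile (kW): low overnight,
# moderate during the day, and an evening peak in hours 17 to 21.
base = [2.0] * 7 + [4.0] * 10 + [8.0] * 5 + [3.0] * 2
history = [[d + 0.1 * k for d in base] for k in range(3)]
predicted = forecast_next_day(history)
schedule = learn_schedule(predicted, n_agents=5)
print("charging hours:", schedule)
```

Because agents train on the forecast, they can explore freely without affecting the real grid. In a fuller setup, a `True` result from `pattern_changed` on newly observed demand would trigger re-forecasting and retraining. That step is omitted here to keep the sketch short.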



Published in
ACM Transactions on Autonomous and Adaptive Systems, Volume 12, Issue 2, June 2017, 162 pages
ISSN: 1556-4665
EISSN: 1556-4703
DOI: 10.1145/3099619
Editors: Manish Parashar (Rutgers University, USA); Franco Zambonelli (University of Modena e Reggio Emilia, Italy)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 25 May 2017
- Revised: 1 March 2017
- Accepted: 1 March 2017
- Received: 1 July 2015

Author Tags
Multi-agent systems
environment prediction
reinforcement learning
smart grids
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- 41 total citations
- 1,055 total downloads
- 112 downloads (last 12 months)
- 14 downloads (last 6 weeks)