Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments,ACM Transactions on Autonomous and Adaptive Systems



ACM Transactions on Autonomous and Adaptive Systems (IF 2.7), Pub Date: 2017-05-25, DOI: 10.1145/3070861
Andrei Marinescu, Ivana Dusparic, Siobhán Clarke


Multi-agent reinforcement learning (MARL) is a widely researched technique for decentralised control in complex large-scale autonomous systems. Such systems often operate in environments that are continuously evolving and where agents’ actions are non-deterministic, so-called inherently non-stationary environments. When agents acting in such an environment see inconsistent results, learning and adapting is challenging. In this article, we propose P-MARL, an approach that integrates prediction and pattern change detection abilities into MARL and thus minimises the effect of non-stationarity in the environment. The environment is modelled as a time series, with future estimates provided using prediction techniques. Learning is based on the predicted environment behaviour, with agents employing this knowledge to improve their performance in real time. We illustrate P-MARL’s performance in a real-world smart grid scenario, where the environment is heavily influenced by non-stationary power demand patterns from residential consumers. We evaluate P-MARL in three different situations, where agents’ action decisions are independent, simultaneous, and sequential. Results show that all methods outperform traditional MARL, with sequential P-MARL achieving the best results.
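
The loop the abstract describes (forecast the environment as a time series, check whether the pattern has changed, then learn against the forecast) can be illustrated with a toy sketch. This is not the authors' implementation: the averaging forecaster, the percentage-error change test, the single agent with a hypothetical flexible load that must run for a fixed number of hours, and every name and parameter below are assumptions made purely for illustration.

```python
import random

HOURS = 24

def forecast_next_day(history):
    """Assumed forecaster: average each hour of the day over the past days."""
    days = len(history)
    return [sum(day[h] for day in history) / days for h in range(HOURS)]

def pattern_changed(forecast, observed, threshold=0.25):
    """Assumed change test: mean absolute percentage error above a threshold."""
    errors = [abs(f - o) / o for f, o in zip(forecast, observed)]
    return sum(errors) / len(errors) > threshold

def allowed_actions(hour, remaining):
    """0 = idle, 1 = run the load; masking keeps every schedule feasible."""
    if remaining == 0:
        return [0]
    if remaining == HOURS - hour:
        return [1]
    return [0, 1]

def train_on_forecast(forecast, run_hours=4, episodes=5000,
                      alpha=0.2, epsilon=0.2, seed=0):
    """Tabular Q-learning on the predicted day; returns a 24-hour 0/1 schedule."""
    rng = random.Random(seed)
    q = {}  # (hour, remaining, action) -> estimated return

    def best(hour, remaining):
        acts = allowed_actions(hour, remaining)
        return max(acts, key=lambda a: q.get((hour, remaining, a), 0.0))

    for _ in range(episodes):
        remaining = run_hours
        for hour in range(HOURS):
            acts = allowed_actions(hour, remaining)
            if rng.random() < epsilon:
                action = rng.choice(acts)      # explore
            else:
                action = best(hour, remaining)  # exploit
            # Running the load costs the predicted demand of that hour.
            reward = -forecast[hour] if action == 1 else 0.0
            nxt = remaining - action
            future = 0.0
            if hour + 1 < HOURS:
                future = q.get((hour + 1, nxt, best(hour + 1, nxt)), 0.0)
            old = q.get((hour, remaining, action), 0.0)
            q[(hour, remaining, action)] = old + alpha * (reward + future - old)
            remaining = nxt

    # Read out the greedy schedule from the learned table.
    schedule, remaining = [], run_hours
    for hour in range(HOURS):
        action = best(hour, remaining)
        schedule.append(action)
        remaining -= action
    return schedule
```

In this sketch the agent would be retrained on a fresh forecast whenever pattern_changed reports that observed demand has drifted away from the prediction. Coordination between several agents (independent, simultaneous, or sequential decisions) is outside what this single-agent toy covers.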


Updated: 2017-05-25
