Online Multi-agent Reinforcement Learning for Decentralized Inverter-based Volt-VAR Control
arXiv - CS - Multiagent Systems. Pub Date: 2020-06-23, DOI: arxiv-2006.12841
Haotian Liu, Wenchuan Wu

Distributed Volt/Var control (VVC) methods have been widely studied for active distribution networks (ADNs), but they rely on a perfect network model and real-time peer-to-peer (P2P) communication. In practice, the model is often incomplete with significant parameter errors, and such P2P communication systems are hard to maintain. In this paper, we propose an online multi-agent reinforcement learning and decentralized control framework (OLDC) for VVC. In this framework, the VVC problem is formulated as a constrained Markov game, and we propose a novel multi-agent constrained soft actor-critic (MACSAC) reinforcement learning algorithm. MACSAC trains the control agents online, so an accurate ADN model is no longer needed. The trained agents can then realize decentralized optimal control using only local measurements, without real-time P2P communication. OLDC with MACSAC shows remarkable flexibility, efficiency, and robustness under various computing and communication conditions. Numerical simulations on IEEE test cases not only demonstrate that the proposed MACSAC outperforms state-of-the-art learning algorithms, but also confirm the superiority of the OLDC framework in online application.
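The abstract names the algorithm but gives no implementation details, so the following is a minimal, hypothetical PyTorch sketch of a MACSAC-style setup consistent with the description: decentralized tanh-Gaussian actors that act on local measurements only, centralized reward and cost critics used during online training, and a Lagrange multiplier for the voltage constraint of the constrained Markov game. All dimensions, network sizes, and names (OBS_DIM, LocalActor, q_cost, lam, ...) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): local observation, per-inverter action,
# number of agents, and global state dimension seen by the critics.
OBS_DIM, ACT_DIM, N_AGENTS, STATE_DIM = 6, 1, 3, 18

class LocalActor(nn.Module):
    """Per-agent Gaussian policy over its inverter's VAR setpoint (tanh-squashed)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * ACT_DIM))

    def forward(self, obs):
        mu, log_std = self.net(obs).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mu, log_std.clamp(-5, 2).exp())
        u = dist.rsample()                 # reparameterized sample
        a = torch.tanh(u)                  # bounded reactive-power command
        # Log-probability with the standard tanh change-of-variables correction.
        logp = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1)
        return a, logp

def critic(in_dim):
    # Centralized critics see the global state and joint action during online
    # training only; decentralized execution uses the local actors alone.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

actors = [LocalActor() for _ in range(N_AGENTS)]
q_reward = critic(STATE_DIM + N_AGENTS * ACT_DIM)  # Q for the VVC objective
q_cost = critic(STATE_DIM + N_AGENTS * ACT_DIM)    # Q_c for voltage-violation cost
alpha = 0.2                                        # entropy temperature
lam = torch.tensor(1.0, requires_grad=True)        # Lagrange multiplier

# One actor update on a dummy batch: maximize the entropy-regularized return
# while penalizing the constraint critic through the Lagrange multiplier.
state = torch.randn(32, STATE_DIM)
local_obs = [torch.randn(32, OBS_DIM) for _ in range(N_AGENTS)]
acts, logps = zip(*(pi(o) for pi, o in zip(actors, local_obs)))
sa = torch.cat([state, *acts], dim=-1)
actor_loss = (alpha * sum(logps)
              + (lam.detach() * q_cost(sa) - q_reward(sa)).squeeze(-1)).mean()
actor_loss.backward()
```

At execution time each agent would call only its own LocalActor on its local measurements, which is what removes the need for an accurate ADN model and real-time P2P communication in the OLDC framework.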

Updated: 2020-07-01