Approximating Nash Equilibrium in Day-Ahead Electricity Market Bidding with Multi-Agent Deep Reinforcement Learning
Journal of Modern Power Systems and Clean Energy (IF 5.7), Pub Date: 2021-04-19, DOI: 10.35833/mpce.2020.000502
Yan Du, Fangxing Li, Helia Zandi, Yaosuo Xue

In this paper, a day-ahead electricity market bidding problem with multiple strategic generation company (GENCO) bidders is studied. The problem is formulated as a Markov game model, in which the GENCO bidders interact with each other to develop their optimal day-ahead bidding strategies. Considering the unobservable information in the problem, a model-free, data-driven approach known as multi-agent deep deterministic policy gradient (MADDPG) is applied to approximate the Nash equilibrium (NE) of the above Markov game. The MADDPG algorithm has the advantage of generalization owing to the automatic feature extraction ability of deep neural networks. The algorithm is tested on an IEEE 30-bus system with three competitive GENCO bidders in both an uncongested case and a congested case. Comparisons with a truthful bidding strategy and state-of-the-art deep reinforcement learning methods, including the deep Q network (DQN) and deep deterministic policy gradient (DDPG), demonstrate that the applied MADDPG algorithm can find a superior bidding strategy for all market participants with increased profits. In addition, a comparison with a conventional model-based method shows that the MADDPG algorithm has higher computational efficiency, making it feasible for real-world applications.
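To make the abstract's description of the MADDPG approach more concrete, the sketch below illustrates the centralized-critic, decentralized-actor layout that MADDPG uses: each GENCO agent keeps a local actor that produces a bid from its own observation, while a per-agent critic is trained with access to all agents' observations and actions. This is a minimal illustration only, assuming PyTorch; the network sizes, the three-agent setup, the 5-dimensional local observation, and the single scalar bid action are hypothetical placeholders and do not reflect the paper's actual implementation or market model.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a GENCO's local observation (e.g., forecast demand and own cost data)
    to a bounded bidding action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Sigmoid(),  # bid action scaled to [0, 1]
        )
    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Q-network that, during training, sees every agent's observation and action,
    which is how MADDPG copes with the non-stationarity of multiple learning bidders."""
    def __init__(self, total_obs_dim, total_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# Hypothetical dimensions: 3 GENCO agents, each with a 5-dimensional local
# observation and a single scalar bid.
n_agents, obs_dim, act_dim = 3, 5, 1
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [CentralizedCritic(n_agents * obs_dim, n_agents * act_dim)
           for _ in range(n_agents)]

# One (untrained) forward pass: each actor bids from its own observation only;
# each critic evaluates the joint observation-action profile.
obs = torch.randn(n_agents, obs_dim)
acts = torch.cat([actors[i](obs[i].unsqueeze(0)) for i in range(n_agents)], dim=0)
q_values = [critic(obs.reshape(1, -1), acts.reshape(1, -1)) for critic in critics]

The centralized critic is only used during training; at execution time each bidder acts on its local actor alone, which matches the partially observable market setting the abstract describes.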

Updated: 2021-04-20