On the Feasibility Guarantees of Deep Reinforcement Learning Solutions for Distribution System Operation
IEEE Transactions on Smart Grid ( IF 9.6 ) Pub Date : 2023-01-02 , DOI: 10.1109/tsg.2022.3233709
Mohammad Mehdi Hosseini 1 , Masood Parvania 1

Deep reinforcement learning (DRL) has achieved unprecedented success in finding near-optimal solutions to high-dimensional stochastic problems, leading to its extensive use in operations research, including the operation of power systems. In practice, however, it has been adopted with extreme caution because standard DRL does not guarantee that operational constraints are satisfied. This paper guarantees the feasibility of solutions given by a DRL agent trained to operate a distribution system by modifying the exploration process and optimality criterion of standard DRL. To that end, a convex feasibility set in the form of a multi-dimensional polyhedron, called the feasibility diamond, is first formed inside the region defined by the power flow constraints and is used to check the feasibility of DRL solutions in real time. Solutions outside the feasibility diamond are projected onto the diamond's surface, and the modified action is used for DRL training. Further, the distance between an infeasible solution and its feasible projection is penalized in the DRL reward function. The impact of the proposed method on the feasibility and optimality of DRL solutions is tested on three test distribution systems; the results indicate that modifying the exploration process together with a soft penalization of infeasibilities works best in achieving near-optimal and reliable DRL-trained operators.
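
The check, projection, and penalty steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the polyhedron is assumed to be given as A x <= b, the projection is a simple radial pull toward an assumed strictly interior point (the paper may use a different projection), and the names `is_feasible`, `project_radially`, `shaped_reward`, and `penalty_weight` are hypothetical. The 2-D diamond |x| + |y| <= 1 is only a toy stand-in for the feasibility diamond.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def is_feasible(A, b, x, tol=1e-9):
    """Check whether x lies in the polyhedron {x : A x <= b}."""
    return all(dot(row, x) <= bi + tol for row, bi in zip(A, b))

def project_radially(A, b, center, action):
    """Pull an action toward an interior point until it reaches the polyhedron.

    Assumption: 'center' is strictly inside the polyhedron. This is a radial
    projection, chosen for simplicity; it is not a Euclidean projection.
    """
    direction = [a - c for a, c in zip(action, center)]
    step = 1.0  # step = 1 keeps the original action (already feasible)
    for row, bi in zip(A, b):
        slack = bi - dot(row, center)
        if slack <= 0:
            raise ValueError("center must be strictly inside the polyhedron")
        growth = dot(row, direction)
        if growth > 0:  # only constraints the ray moves toward can bind
            step = min(step, slack / growth)
    return [c + step * d for c, d in zip(center, direction)]

def shaped_reward(base_reward, action, projected, penalty_weight=1.0):
    """Subtract a soft penalty proportional to the distance to the projection."""
    distance = math.sqrt(sum((a - p) ** 2 for a, p in zip(action, projected)))
    return base_reward - penalty_weight * distance

# Toy example: the 2-D diamond |x| + |y| <= 1 written as four half-spaces.
A = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
b = [1, 1, 1, 1]
center = [0.0, 0.0]
raw_action = [2.0, 0.0]  # proposed by the agent, outside the diamond

projected = project_radially(A, b, center, raw_action)
reward = shaped_reward(5.0, raw_action, projected, penalty_weight=0.5)
```

In this toy case the infeasible action is moved onto the diamond's surface and the reward is reduced in proportion to how far it had to move, which mirrors the soft penalization the abstract describes.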

Updated: 2023-01-02