Constrained model-free reinforcement learning for process optimization
Computers & Chemical Engineering (IF 4.3), Pub Date: 2021-07-29, DOI: 10.1016/j.compchemeng.2021.107462
Elton Pan, Panagiotis Petsagkourakis, Max Mowbray, Dongda Zhang, Ehecatl Antonio del Rio-Chanona

Reinforcement learning (RL) is a control approach that can handle nonlinear stochastic optimal control problems. However, despite its promise, RL has yet to see marked translation to industrial practice, primarily due to its inability to satisfy state constraints. In this work we aim to address this challenge. We propose an “oracle”-assisted constrained Q-learning algorithm that guarantees satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. To achieve this, constraint tightenings (backoffs) are introduced and adjusted using Broyden’s method, making the backoffs self-tuning. The result is a methodology that can be embedded into RL algorithms to ensure constraint satisfaction. We analyze the performance of the proposed approach and compare it against nonlinear model predictive control (NMPC). The favorable performance of this algorithm signifies a step towards incorporating RL into the real-world optimization and control of engineering systems, where constraints are essential.
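To illustrate the backoff-tuning idea described in the abstract, the sketch below treats backoff adjustment as a root-finding problem solved with Broyden's method: the residual is the gap between the empirically estimated constraint-satisfaction probability (under the policy trained with tightened constraints) and the target probability. This is only a minimal illustration, not the authors' implementation; the function `evaluate_policy` and the names `b0`, `target` are hypothetical placeholders for the training/Monte Carlo evaluation loop.

```python
import numpy as np

def tune_backoffs(evaluate_policy, b0, target=0.99, max_iter=10, tol=1e-3):
    """Self-tune constraint backoffs via Broyden's method (illustrative sketch).

    evaluate_policy(b) is assumed to (re)train the constrained Q-learning agent
    with constraints tightened by the backoff vector b, run Monte Carlo rollouts,
    and return the empirical satisfaction probability of each chance constraint.
    The residual F(b) = evaluate_policy(b) - target is driven toward zero so the
    constraints hold with (at least) the desired probability.
    """
    b = np.asarray(b0, dtype=float)
    f = evaluate_policy(b) - target           # residual F(b)
    B = np.eye(b.size)                        # initial Jacobian approximation
    for _ in range(max_iter):
        if np.max(np.abs(f)) < tol:
            break
        s = np.linalg.solve(B, -f)            # Broyden step: solve B s = -F(b)
        b = b + s
        f_new = evaluate_policy(b) - target
        y = f_new - f
        B = B + np.outer(y - B @ s, s) / (s @ s)  # rank-one Jacobian update
        f = f_new
    return b
```

Because each residual evaluation requires retraining and re-evaluating the policy, a quasi-Newton scheme like Broyden's, which avoids explicit Jacobians, keeps the number of such expensive evaluations small.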




Updated: 2021-08-15