Reinforcement learning based optimal control of batch processes using Monte-Carlo deep deterministic policy gradient with phase segmentation
Computers & Chemical Engineering (IF 3.9), Pub Date: 2020-10-30, DOI: 10.1016/j.compchemeng.2020.107133
Haeun Yoo, Boeun Kim, Jong Woo Kim, Jay H. Lee

Batch process control represents a challenge given its dynamic operation over a large operating envelope. Nonlinear model predictive control (NMPC) is the current standard for optimal control of batch processes, but its performance can be unsatisfactory in the presence of uncertainties. Reinforcement learning (RL), which can utilize simulation or real operation data, is a viable alternative for such problems. To apply RL to batch process control effectively, however, design choices such as the reward function and the value update method must be made carefully. This study proposes a phase segmentation approach for the reward function design and the value/policy function representation. In addition, the deep deterministic policy gradient (DDPG) algorithm is modified with Monte-Carlo learning to ensure more stable and efficient learning behavior. A case study of a batch polymerization process producing polyols demonstrates the improvement brought by the proposed approach and highlights remaining issues.
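The abstract's core algorithmic idea is to replace DDPG's bootstrapped temporal-difference critic target with a Monte-Carlo target, which is natural for batch (finite-horizon, episodic) processes. A minimal sketch of the Monte-Carlo part follows; the function name and structure are illustrative assumptions, not taken from the paper's implementation.

```python
# Hypothetical sketch: Monte-Carlo return-to-go for one finished batch episode.
# In a Monte-Carlo variant of DDPG, each (state, action) pair is labeled with
# this full discounted return G_t instead of a bootstrapped target
# r_t + gamma * Q(s_{t+1}, mu(s_{t+1})), trading bias for variance.

def monte_carlo_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backward over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

Because a batch episode always terminates, these targets are exact samples of the return, which is one reason Monte-Carlo updates can stabilize critic learning in this setting.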




Updated: 2020-11-12