Twin actor twin delayed deep deterministic policy gradient (TATD3) learning for batch process control
Computers & Chemical Engineering (IF 3.9) · Pub Date: 2021-09-08 · DOI: 10.1016/j.compchemeng.2021.107527
Tanuja Joshi¹, Shikhar Makker¹, Hariprasad Kodamana¹,², Harikumar Kandath³

Control of batch processes is a difficult task due to their complex nonlinear dynamics and unsteady-state operating conditions, both within a batch and from batch to batch. Some of these challenges can be addressed by developing control strategies that interact directly with the process and learn from experience. Recent studies in the literature have indicated the advantage of having an ensemble of actors in actor-critic Reinforcement Learning (RL) frameworks for improving the policy. The present study proposes an actor-critic RL algorithm, namely twin actor twin delayed deep deterministic policy gradient (TATD3), which incorporates twin actor networks into the existing twin-delayed deep deterministic policy gradient (TD3) algorithm for continuous control. In addition, two types of novel reward functions are proposed for the TATD3 controller. We showcase the efficacy of the TATD3-based controller on various batch process examples by comparing it with existing RL algorithms from the literature.
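Since the abstract only outlines the algorithm, the following is a minimal, hypothetical PyTorch sketch of what a TATD3 update step could look like: TD3's twin critics, target policy smoothing, and delayed policy updates, extended with two actor networks. The network sizes, hyperparameters, and in particular the rule for combining the twin actors (here: execute whichever actor's proposal the first critic values higher, and take the minimum over both target actors when forming the critic target) are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical TATD3 sketch (PyTorch). Aggregation rule and
# hyperparameters are illustrative assumptions only.
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, out_act=nn.Tanh):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), out_act())

class TATD3:
    def __init__(self, obs_dim, act_dim, gamma=0.99, tau=0.005,
                 policy_noise=0.2, noise_clip=0.5, policy_delay=2):
        # Twin actors (the TATD3 extension) on top of TD3's twin critics.
        self.actors = [mlp(obs_dim, act_dim) for _ in range(2)]
        self.critics = [mlp(obs_dim + act_dim, 1, out_act=nn.Identity)
                        for _ in range(2)]
        self.targ_actors = [copy.deepcopy(a) for a in self.actors]
        self.targ_critics = [copy.deepcopy(c) for c in self.critics]
        self.actor_opts = [torch.optim.Adam(a.parameters(), lr=3e-4)
                           for a in self.actors]
        self.critic_opt = torch.optim.Adam(
            [p for c in self.critics for p in c.parameters()], lr=3e-4)
        self.gamma, self.tau = gamma, tau
        self.policy_noise, self.noise_clip = policy_noise, noise_clip
        self.policy_delay, self.step = policy_delay, 0

    def act(self, obs):
        # Assumed aggregation: evaluate both actors' proposals with
        # critic 1 and execute the higher-valued action (batch-safe).
        with torch.no_grad():
            cands = [a(obs) for a in self.actors]
            qs = [self.critics[0](torch.cat([obs, c], -1)) for c in cands]
            return torch.where(qs[0] >= qs[1], cands[0], cands[1])

    def update(self, obs, act, rew, next_obs, done):
        # Expects batched tensors; rew and done of shape (batch, 1).
        with torch.no_grad():
            # TD3 target policy smoothing, applied per target actor; the
            # minimum over both critics and both actors (an assumed rule)
            # guards against value overestimation.
            targets = []
            for ta in self.targ_actors:
                noise = (torch.randn_like(act) * self.policy_noise
                         ).clamp(-self.noise_clip, self.noise_clip)
                na = (ta(next_obs) + noise).clamp(-1.0, 1.0)
                targets.append(torch.min(
                    *[tc(torch.cat([next_obs, na], -1))
                      for tc in self.targ_critics]))
            target_q = rew + self.gamma * (1 - done) * torch.min(*targets)
        # Both critics regress toward the shared target.
        critic_loss = sum(((c(torch.cat([obs, act], -1)) - target_q) ** 2
                           ).mean() for c in self.critics)
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()
        self.step += 1
        if self.step % self.policy_delay == 0:
            # Delayed policy update: each actor ascends critic 1's Q-value.
            for actor, opt in zip(self.actors, self.actor_opts):
                loss = -self.critics[0](
                    torch.cat([obs, actor(obs)], -1)).mean()
                opt.zero_grad(); loss.backward(); opt.step()
            # Polyak averaging of all target networks.
            for net, targ in zip(self.actors + self.critics,
                                 self.targ_actors + self.targ_critics):
                for p, tp in zip(net.parameters(), targ.parameters()):
                    tp.data.mul_(1 - self.tau).add_(p.data, alpha=self.tau)
```

In a training loop, transitions sampled from a replay buffer would be passed to update() once per environment step; executing the better-scored of the two actors is one plausible reading of the ensemble-of-actors idea the abstract cites, and the paper's actual aggregation rule and reward functions should be taken from the article itself.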




Updated: 2021-10-02