A hierarchical constrained reinforcement learning for optimization of bitumen recovery rate in a primary separation vessel
Computers & Chemical Engineering (IF 4.3), Pub Date: 2020-05-29, DOI: 10.1016/j.compchemeng.2020.106939
Hareem Shafi, Kirubakaran Velswamy, Fadi Ibrahim, Biao Huang

This work proposes a two-level hierarchical constrained control structure for reinforcement learning (RL), applied to a Primary Separation Vessel (PSV). The lower level handles servo tracking and regulation of the interface level against variations in ore quality by manipulating the middlings flow rate. The higher level implements supervisory control of the interface level setpoint, with the objective of optimizing the bitumen recovery rate. To prevent sanding, regulation of the tailings density via the tailings withdrawal flow rate is proposed. In each case, an asynchronous advantage actor-critic (A3C) agent interacts with a high-fidelity PSV model to learn a near-optimal control strategy through episodic interactions. The three control loops are learnt sequentially. For the interface level control loop, a two-phase learning scheme based on behavioral cloning is proposed to promote stable state-space exploration. The proposed hierarchical structure demonstrates an improved bitumen recovery rate by manipulating the interface level while preventing sanding.
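
To make the loop structure concrete, the following is a minimal sketch of how the three loops might be wired together in a single control step. It is not the authors' implementation: the names `Limits`, `hierarchical_step`, the observation keys, and the clipping of the setpoint to fixed bounds are all assumptions made for illustration, and the three policies are plain callables standing in for trained A3C actors.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a trained actor: maps observations to one action.
Policy = Callable[[Dict[str, float]], float]


@dataclass(frozen=True)
class Limits:
    """Assumed box constraint on the supervisory setpoint."""
    low: float
    high: float

    def clip(self, value: float) -> float:
        return max(self.low, min(self.high, value))


def hierarchical_step(
    obs: Dict[str, float],
    supervisor: Policy,
    level_ctrl: Policy,
    density_ctrl: Policy,
    setpoint_limits: Limits,
    density_setpoint: float,
) -> Dict[str, float]:
    """One control step through the assumed hierarchy."""
    # Higher level: propose an interface level setpoint, kept within bounds.
    level_setpoint = setpoint_limits.clip(supervisor(obs))
    # Lower level: choose the middlings flow rate given that setpoint.
    middlings_flow = level_ctrl({**obs, "level_setpoint": level_setpoint})
    # Separate loop: choose the tailings withdrawal flow rate
    # given a tailings density target.
    tailings_flow = density_ctrl({**obs, "density_setpoint": density_setpoint})
    return {
        "level_setpoint": level_setpoint,
        "middlings_flow": middlings_flow,
        "tailings_flow": tailings_flow,
    }


# Usage with stub policies (arbitrary rules and numbers, not tuned controllers).
obs = {"interface_level": 2.0, "tailings_density": 1.6}
actions = hierarchical_step(
    obs,
    supervisor=lambda o: 5.0,  # deliberately outside the assumed bounds
    level_ctrl=lambda o: o["level_setpoint"] - o["interface_level"],
    density_ctrl=lambda o: o["tailings_density"] - o["density_setpoint"],
    setpoint_limits=Limits(low=1.0, high=3.0),
    density_setpoint=1.5,
)
```

Passing the policies in as callables keeps the wiring independent of how each loop was trained, which fits the abstract's description of the loops being learnt one after another.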




Updated: 2020-05-29