Alleviating parameter-tuning burden in reinforcement learning for large-scale process control
Computers & Chemical Engineering (IF 4.3), Pub Date: 2022-01-06, DOI: 10.1016/j.compchemeng.2022.107658
Lingwei Zhu¹, Go Takami², Mizuo Kawahara², Hiroaki Kanokogi², Takamitsu Matsubara¹

Modern process controllers require high-quality models and remedial system re-identification whenever performance degrades. Reinforcement Learning (RL) is a promising replacement for these laborious manual procedures. In realistic scenarios, however, time is limited, so algorithms are needed that learn robustly with less human-agent interaction or self-exploration, e.g. parameter tuning. In practice, much of the time spent getting an RL algorithm to work properly goes into such trial-and-error interactions. To reduce this interaction time, we propose a principled framework that ensures monotonic policy improvement even with underperforming parameters, making the RL process more robust to parameter settings. We incorporate key ingredients such as random features and a factorial policy into the monotonic improvement mechanism so that the agent learns cautiously in large-scale process control problems. On challenging control problems in a simulated vinyl acetate monomer process, we show that the proposed method robustly learns a meaningful policy within a short, fixed learning horizon under various parameter configurations that simulate these interactions, whereas the compared method performs well only within a narrow range of parameters.
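The abstract names three ingredients (random features, a factorial policy, and a cautious, monotonic improvement step) without spelling them out. What follows is one plausible reading, not the authors' implementation. It assumes random Fourier features approximating an RBF kernel (the `bandwidth` and feature count are hypothetical), discrete action levels for each manipulated variable, softmax factors that are linear in the features, and hypothetical per-dimension action values `q_values`. The cautious step interpolates each factor toward its greedy action with a small coefficient `zeta`, in the spirit of conservative policy iteration. Note that the monotonic-improvement guarantee of conservative policy iteration is stated for mixing the joint policy, so this per-dimension simplification does not by itself carry the paper's guarantee.

```python
import math
import random


def make_rff(state_dim, num_features, bandwidth=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel (bandwidth is a hypothetical choice)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(state_dim)]
         for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(state):
        # phi_i(s) = sqrt(2/D) * cos(w_i . s + b_i)
        return [scale * math.cos(sum(w_j * s_j for w_j, s_j in zip(w, state)) + b_i)
                for w, b_i in zip(W, b)]
    return phi


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def factor_probs(theta, features):
    """theta[d][k] is a (hypothetical) weight vector for level k of action dimension d."""
    return [softmax([sum(w * f for w, f in zip(weights, features)) for weights in dim])
            for dim in theta]


def joint_prob(probs, action):
    """Factorial policy: the joint probability is the product of the per-dimension factors."""
    p = 1.0
    for dim_probs, a_d in zip(probs, action):
        p *= dim_probs[a_d]
    return p


def cautious_update(old_probs, q_values, zeta):
    """Per dimension: pi_new = (1 - zeta) * pi_old + zeta * pi_greedy (a simplification)."""
    new_probs = []
    for dim_probs, q in zip(old_probs, q_values):
        greedy = max(range(len(q)), key=q.__getitem__)
        new_probs.append([(1.0 - zeta) * p + (zeta if k == greedy else 0.0)
                          for k, p in enumerate(dim_probs)])
    return new_probs


# Usage: two manipulated variables, three discrete levels each (hypothetical setup).
phi = make_rff(state_dim=3, num_features=16)
features = phi([0.2, -0.1, 0.5])
theta = [[[0.0] * 16 for _ in range(3)] for _ in range(2)]  # zero weights give a uniform policy
probs = factor_probs(theta, features)
q = [[0.0, 1.0, 0.5], [2.0, 0.0, 0.0]]  # hypothetical action-value estimates
new = cautious_update(probs, q, zeta=0.1)
print(joint_prob(new, (1, 0)))  # ≈ 0.16
```

A small `zeta` keeps the new policy close to the old one, which is the intuition behind learning "cautiously": a poorly chosen step size or poor value estimates can only move the policy a limited distance per update.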




Updated: 2022-01-11