Multi-step lookahead Bayesian optimization with active learning using reinforcement learning and its application to data-driven batch-to-batch optimization
Computers & Chemical Engineering (IF 4.3), Pub Date: 2022-09-07, DOI: 10.1016/j.compchemeng.2022.107987
Ha-Eun Byun, Boeun Kim, Jay H. Lee

This study presents a novel multi-step lookahead Bayesian optimization method that strives for optimal active learning by balancing exploration and exploitation over multiple future sampling-and-evaluation trials. The approach adopts a Gaussian process (GP) model to represent the underlying function, and the model is updated after each sampling and evaluation. Proximal Policy Optimization (PPO), a reinforcement learning method, is then used to locate the next point to sample while accounting for multiple such future trials, with the current GP model serving as a fictitious environment. The approach is applied to batch-to-batch (B2B) optimization, in which an optimal batch recipe is sought without any prior process knowledge. The B2B optimization is formulated as a partially observable Markov decision process (POMDP) problem, and GP model learning and policy learning through PPO are performed iteratively to suggest the next batch recipe. The effectiveness of the approach in the B2B optimization problem is demonstrated through two case studies.
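
The outer loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the recipe is one-dimensional, the plant function (true_batch_process) is a toy stand-in, and the paper's multi-step lookahead policy trained with PPO on the GP-based fictitious environment is replaced by a simple one-step posterior-sampling (Thompson-style) proposal so the sketch stays self-contained and runnable.

    # Hypothetical sketch of the B2B loop: refit a GP surrogate after every batch,
    # then propose the next recipe. The PPO policy-learning step from the paper is
    # replaced here by a myopic Thompson-sampling stand-in (see comments below).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(0)

    def true_batch_process(recipe):
        # Unknown plant: returns the batch outcome for a 1-D "recipe" (illustrative only).
        return float(-np.sin(3.0 * recipe) - recipe ** 2 + 0.7 * recipe)

    candidates = np.linspace(-1.0, 2.0, 200).reshape(-1, 1)   # discretized recipe space
    X = rng.uniform(-1.0, 2.0, size=(3, 1))                   # a few initial batches
    y = np.array([true_batch_process(x[0]) for x in X])

    for batch in range(10):
        # 1) Update the GP surrogate of the unknown recipe-to-outcome map.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)

        # 2) Proposal step. In the paper, a PPO agent is trained on rollouts simulated
        #    from the current GP posterior (the fictitious environment) so the proposal
        #    accounts for several future sample-and-evaluate trials. Here we only draw
        #    one posterior sample and maximize it: a myopic stand-in for that policy.
        sample = gp.sample_y(candidates, random_state=batch).ravel()
        next_recipe = candidates[int(np.argmax(sample))]

        # 3) Run the next batch with the proposed recipe and record the outcome.
        X = np.vstack([X, next_recipe])
        y = np.append(y, true_batch_process(next_recipe[0]))

    best = int(np.argmax(y))
    print(f"best recipe so far: {X[best, 0]:.3f}, outcome: {y[best]:.3f}")

In the actual method, step 2 would train a PPO policy whose episodes are simulated on the GP posterior, so the proposed recipe reflects the exploration-exploitation trade-off over multiple future batches rather than a single greedy step.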



Updated: 2022-09-07