Data-Efficient Reinforcement Learning for Malaria Control
arXiv - CS - Artificial Intelligence Pub Date : 2021-05-04 , DOI: arxiv-2105.01620
Lixin Zou, Long Xia, Linfang Hou, Xiangyu Zhao, Dawei Yin

Sequential decision-making in cost-sensitive tasks is prohibitively daunting, especially for problems that significantly affect people's daily lives, such as malaria control and treatment recommendation. The main challenge faced by policymakers is to learn a policy from scratch by interacting with a complex environment in only a few trials. This work introduces a practical, data-efficient policy learning method, named Variance-Bonus Monte Carlo Tree Search (VB-MCTS), which can cope with very little data and facilitates learning from scratch in only a few trials. Specifically, the solution is a model-based reinforcement learning method. To avoid model bias, we apply Gaussian Process (GP) regression to estimate the transitions explicitly. With the GP world model, we propose a variance-bonus reward to measure the uncertainty about the world. Adding this reward to planning with MCTS can result in more efficient and effective exploration. Furthermore, the derived polynomial sample complexity indicates that VB-MCTS is sample efficient. Finally, outstanding performance in a competitive world-level RL competition and extensive experimental results verify its advantage over the state of the art on the challenging malaria control task.
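
The idea of rewarding model uncertainty can be illustrated with a small sketch. The abstract does not give the exact form of the variance-bonus reward, so the code below is only an assumption-laden illustration, not the authors' implementation: it fits a GP with a squared-exponential kernel to a few observed (input, reward) pairs and scores a candidate input by its posterior mean plus `beta` times its posterior standard deviation. The names `rbf`, `solve`, `gp_posterior`, `bonus_reward`, and the parameters `length_scale`, `noise`, and `beta` are all hypothetical, and the MCTS planning step is omitted entirely.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(X, y, x_star, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_star) for a in X]
    alpha = solve(K, y)        # K^{-1} y
    v = solve(K, k_star)       # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)  # clip tiny negative values from round-off


def bonus_reward(X, y, x_star, beta=1.0):
    """Assumed bonus form: predicted reward plus beta * predictive std."""
    mean, var = gp_posterior(X, y, x_star)
    return mean + beta * math.sqrt(var)
```

A planner could call `bonus_reward(X, y, candidate)` for each candidate input and prefer the highest score; inputs far from the observed data have a larger predictive variance and so receive a larger bonus, which is what drives exploration in this sketch.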

Updated: 2021-05-05