Adversarial Environment Generation for Learning to Navigate the Web
arXiv - CS - Multiagent Systems Pub Date : 2021-03-02 , DOI: arxiv-2103.01991
Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust

Learning to autonomously navigate the web is a difficult sequential decision-making task. The state and action spaces are large and combinatorial in nature, and websites are dynamic environments consisting of several pages. One of the bottlenecks in training web navigation agents is providing a learnable curriculum of training environments that covers the large variety of real-world websites. Therefore, we propose using Adversarial Environment Generation (AEG) to generate challenging web environments in which to train reinforcement learning (RL) agents. We provide a new benchmarking environment, gMiniWoB, which enables an RL adversary to use compositional primitives to learn to generate arbitrarily complex websites. To train the adversary, we propose a new technique for maximizing regret using the difference in the scores obtained by a pair of navigator agents. Our results show that our approach significantly outperforms prior methods for minimax regret AEG. The regret objective trains the adversary to design a curriculum of environments that are "just-the-right-challenge" for the navigator agents; our results show that over time, the adversary learns to generate increasingly complex web navigation tasks. The navigator agents trained with our technique learn to complete challenging, high-dimensional web navigation tasks, such as form filling and flight booking. We show that the navigator agent trained with our proposed Flexible b-PAIRED technique significantly outperforms competitive automatic curriculum generation baselines, including a state-of-the-art RL web navigation approach, on a set of challenging unseen test environments, and achieves a success rate of more than 80% on some tasks.
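
As a rough illustration of the regret signal described in the abstract, the sketch below scores one generated website by the gap between the average returns of two navigator agents. This is a minimal sketch under stated assumptions, not the paper's Flexible b-PAIRED objective: the function name `regret_estimate`, the use of an absolute difference of mean returns, and all episode returns are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Sequence


def regret_estimate(returns_a: Sequence[float], returns_b: Sequence[float]) -> float:
    """Approximate regret on one generated environment.

    Assumption (not taken from the paper): regret is estimated as the
    absolute gap between the two navigators' mean episode returns.
    """
    if not returns_a or not returns_b:
        raise ValueError("each navigator needs at least one episode return")
    return abs(mean(returns_a) - mean(returns_b))


# Hypothetical episode returns of two navigators on the same generated website.
easy_site = regret_estimate([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])    # both always succeed
hard_site = regret_estimate([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])    # both always fail
useful_site = regret_estimate([1.0, 0.5, 1.0], [0.0, 0.5, 0.0])  # one succeeds, one struggles

print(easy_site, hard_site, useful_site)
```

Under this assumption, websites that are trivially easy or impossible for both navigators give the adversary no reward, while a website that one navigator solves and the other does not gives a positive signal. That matches the abstract's "just-the-right-challenge" intuition.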

Updated: 2021-03-04