Adversarial Environment Generation for Learning to Navigate the Web
arXiv - CS - Multiagent Systems. Pub Date: 2021-03-02. DOI: arXiv-2103.01991. Authors: Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust
Learning to autonomously navigate the web is a difficult sequential decision
making task. The state and action spaces are large and combinatorial in nature,
and websites are dynamic environments consisting of several pages. One of the
bottlenecks of training web navigation agents is providing a learnable
curriculum of training environments that can cover the large variety of
real-world websites. Therefore, we propose using Adversarial Environment
Generation (AEG) to generate challenging web environments in which to train
reinforcement learning (RL) agents. We provide a new benchmarking environment,
gMiniWoB, which enables an RL adversary to use compositional primitives to
learn to generate arbitrarily complex websites. To train the adversary, we
propose a new technique for maximizing regret using the difference in the
scores obtained by a pair of navigator agents. Our results show that our
approach significantly outperforms prior methods for minimax regret AEG. The
regret objective trains the adversary to design a curriculum of environments
that are "just-the-right-challenge" for the navigator agents; our results show
that over time, the adversary learns to generate increasingly complex web
navigation tasks. The navigator agents trained with our technique learn to
complete challenging, high-dimensional web navigation tasks, such as form
filling and flight booking. We show that the navigator agent trained with
our proposed Flexible b-PAIRED technique significantly outperforms competitive
automatic curriculum generation baselines -- including a state-of-the-art RL
web navigation approach -- on a set of challenging unseen test environments,
and achieves more than 80% success rate on some tasks.
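The abstract's core training signal, regret estimated from the score gap between a pair of navigator agents, can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the function names and the exact aggregation rule are assumptions.

```python
# Illustrative sketch of a PAIRED-style regret signal: the adversary that
# generates web environments is rewarded by the performance gap between two
# navigator agents evaluated on the same generated site. Names and the
# aggregation below are assumptions, not the paper's implementation.

def regret(antagonist_score: float, protagonist_score: float) -> float:
    """Regret: how much better one navigator did than the other."""
    return antagonist_score - protagonist_score

def adversary_reward(scores_a: list, scores_b: list) -> float:
    """Flexible pairing: either agent may play the antagonist role, so the
    adversary is rewarded for the absolute gap between the agents' best
    episode scores on the generated environment."""
    return abs(max(scores_a) - max(scores_b))
```

An environment on which both agents fail, or both succeed, yields a gap near zero and hence little adversary reward; the objective therefore steers the adversary toward "just-the-right-challenge" sites that are solvable but not yet solved by both navigators.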
Updated: 2021-03-04