Deliberative acting, planning and learning with hierarchical operational models
Artificial Intelligence (IF 14.4), Pub Date: 2021-05-05, DOI: 10.1016/j.artint.2021.103523
Sunandita Patra, James Mason, Malik Ghallab, Dana Nau, Paolo Traverso

In AI research, synthesizing a plan of action has typically used descriptive models of the actions that abstractly specify what might happen as a result of an action, and are tailored for efficiently computing state transitions. However, executing the planned actions has needed operational models, in which rich computational control structures and closed-loop online decision-making are used to specify how to perform an action in a nondeterministic execution context, react to events and adapt to an unfolding situation. Deliberative actors, which integrate acting and planning, have typically needed to use both of these models together—which causes problems when attempting to develop the different models, verify their consistency, and smoothly interleave acting and planning.
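To give a rough feel for the distinction, here is a toy Python sketch, not the paper's formalism and with made-up names (move_descriptive, move_operational): the descriptive model merely predicts a successor state for a "move to target" action, while the operational model performs the task step by step in a nondeterministic setting, reacting when individual step commands fail.

import random

def move_descriptive(state: dict, target: int) -> dict:
    """Abstract state-transition model: predict that the robot simply ends up at the target."""
    return {**state, "pos": target}

def move_operational(state: dict, target: int, max_steps: int = 50) -> bool:
    """Closed-loop operational model: step toward the target, reacting to failed commands."""
    for _ in range(max_steps):
        if state["pos"] == target:
            return True                       # task accomplished
        step = 1 if target > state["pos"] else -1
        if random.random() < 0.8:             # each step command may fail nondeterministically
            state["pos"] += step
        # on failure: the position is unchanged; sense it again and retry (adapt online)
    return False                              # give up; an actor could then try another method

if __name__ == "__main__":
    print(move_descriptive({"pos": 0}, 5))    # the planner's one-shot abstract prediction
    print(move_operational({"pos": 0}, 5))    # the actor's step-by-step closed-loop execution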

As an alternative, we define and implement an integrated acting and planning system in which both planning and acting use the same operational models. These rely on hierarchical task-oriented refinement methods offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system. At each decision step, RAE can get advice from a planner for a near-optimal choice with respect to a utility function. The anytime planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM, whose rollouts are simulations of the actor's operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. We demonstrate the asymptotic convergence of UPOM towards optimal methods in static domains, and show experimentally that UPOM and the learning strategies significantly improve the acting efficiency and robustness.
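For intuition only, the following minimal Python sketch shows the UCT-style bandit idea that UPOM builds on: candidate method instances are treated as arms, each rollout simulates a method's operational model and returns a utility, and UCB1 balances exploration against exploitation. The names uct_choose and simulate_method and the toy efficiency-like utility are illustrative assumptions, not the paper's actual UPOM procedure, which searches a full refinement tree rather than a single bandit level.

import math
import random

def uct_choose(methods, simulate_method, n_rollouts=200, c=math.sqrt(2)):
    """Pick the method instance with the best average simulated utility (UCB1 bandit)."""
    counts = {m: 0 for m in methods}
    totals = {m: 0.0 for m in methods}
    for i in range(1, n_rollouts + 1):
        def ucb(m):
            # untried methods first, then balance average utility against exploration
            if counts[m] == 0:
                return float("inf")
            return totals[m] / counts[m] + c * math.sqrt(math.log(i) / counts[m])
        m = max(methods, key=ucb)
        counts[m] += 1
        totals[m] += simulate_method(m)       # rollout: simulate the method's operational model
    return max(methods, key=lambda m: totals[m] / counts[m] if counts[m] else float("-inf"))

if __name__ == "__main__":
    # Toy domain: each "method" has a success probability and a cost; the utility is
    # efficiency-like (1/cost on success, 0 on failure), one plausible utility function.
    toy = {"m1": (0.9, 5.0), "m2": (0.6, 2.0), "m3": (0.3, 1.0)}
    def simulate(m):
        p, cost = toy[m]
        return 1.0 / cost if random.random() < p else 0.0
    print(uct_choose(list(toy), simulate))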




Updated: 2021-05-12