Active Inference: Demystified and Compared
Neural Computation (IF 2.9), Pub Date: 2021-01-05, DOI: 10.1162/neco_a_01357
Noor Sajid, Philip J. Ball, Thomas Parr, Karl J. Friston

Active inference is a first-principles account of how autonomous agents operate in dynamic, nonstationary environments. This problem is also considered in reinforcement learning, but limited work exists on comparing the two approaches on the same discrete-state environments. In this letter, we provide (1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in reinforcement learning, and (2) an explicit discrete-state comparison between active inference and reinforcement learning on an OpenAI gym baseline. We begin by providing a condensed overview of the active inference literature, in particular viewing the various natural behaviors of active inference agents through the lens of reinforcement learning. We show that by operating in a pure belief-based setting, active inference agents can carry out epistemic exploration, and account for uncertainty about their environment, in a Bayes-optimal fashion. Furthermore, we show that the reliance on an explicit reward signal in reinforcement learning is removed in active inference, where reward can simply be treated as another observation we have a preference over; even in the total absence of rewards, agent behaviors are learned through preference learning. We make these properties explicit in two scenarios: first, by showing that active inference agents can infer behaviors in reward-free environments, compared with both Q-learning and Bayesian model-based reinforcement learning agents; and second, by placing zero prior preferences over rewards and instead learning prior preferences over the observations corresponding to reward. We conclude by noting that this formalism can be applied to more complex settings (e.g., robotic arm movement, Atari games) if appropriate generative models can be formulated. In short, we aim to demystify the behavior of active inference agents by presenting an accessible discrete state-space and time formulation, and demonstrate these behaviors in an OpenAI gym environment alongside reinforcement learning agents.
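To make the "reward as a preferred observation" idea concrete, below is a minimal sketch (not the authors' implementation) of single-step action selection in discrete-state active inference, written in plain NumPy. The two-state, two-observation, two-action generative model (A, B), the log-preference vector log_C, and the belief qs are illustrative assumptions; actions are scored by their expected free energy (risk relative to preferred observations plus ambiguity of the likelihood), and no explicit reward signal appears anywhere.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def entropy_columns(A):
    """Entropy of each likelihood column p(o | s)."""
    return -(A * np.log(A + 1e-16)).sum(axis=0)

def expected_free_energy(qs, A, B, log_C):
    """Expected free energy G(a) = risk + ambiguity for each action a.

    qs    : current posterior belief over hidden states, shape (S,)
    A     : likelihood p(o | s), shape (O, S)
    B     : transitions p(s' | s, a), shape (num_actions, S, S)
    log_C : log prior preferences over observations, shape (O,)
    """
    G = np.zeros(B.shape[0])
    for a in range(B.shape[0]):
        qs_a = B[a] @ qs                  # predicted states under action a
        qo_a = A @ qs_a                   # predicted observations
        risk = qo_a @ (np.log(qo_a + 1e-16) - log_C)   # KL to preferences
        ambiguity = qs_a @ entropy_columns(A)          # expected observation entropy
        G[a] = risk + ambiguity
    return G

# Illustrative generative model: observation 1 plays the role of "reward",
# expressed purely as a preferred observation via log_C (no reward signal).
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])                              # p(o | s)
B = np.stack([np.eye(2),                                # action 0: stay
              np.array([[0.0, 1.0], [1.0, 0.0]])])      # action 1: switch state
log_C = np.log(softmax(np.array([0.0, 3.0])))           # prefer observation 1
qs = np.array([0.8, 0.2])                               # current belief over states

G = expected_free_energy(qs, A, B, log_C)
action_probs = softmax(-G)   # actions that minimise G are more probable
print(G, action_probs)

Because observation 1 is preferred via log_C, the action that makes it more likely receives a lower expected free energy and a higher selection probability; a reward channel would enter in exactly the same way, simply as another observation the agent prefers.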

Updated: 2021-01-05