Deep active inference as variational policy gradients
Journal of Mathematical Psychology (IF 2.2) | Pub Date: 2020-06-01 | DOI: 10.1016/j.jmp.2020.102348
Beren Millidge

Active Inference is a theory of action arising from neuroscience which casts action and planning as a Bayesian inference problem to be solved by minimizing a single quantity: the variational free energy. Active Inference promises a unifying account of action and perception coupled with a biologically plausible process theory. Despite these potential advantages, current implementations of Active Inference can only handle small, discrete policy and state spaces and typically require the environmental dynamics to be known. In this paper we propose a novel deep Active Inference algorithm which approximates key densities using deep neural networks as flexible function approximators, enabling Active Inference to scale to significantly larger and more complex tasks. We demonstrate our approach on a suite of OpenAI Gym benchmark tasks and obtain performance comparable with common reinforcement learning baselines. Moreover, our algorithm shows similarities with maximum entropy reinforcement learning and the policy gradients algorithm, revealing interesting connections between the Active Inference framework and reinforcement learning.
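For reference, the variational free energy mentioned in the abstract takes the following standard form in the variational inference literature (the notation here is generic and may differ from the paper's): for observations o, hidden states s, an approximate posterior q(s), and a generative model p(o, s),

F = E_{q(s)}[\ln q(s) - \ln p(o, s)] = D_{KL}(q(s) \,\|\, p(s \mid o)) - \ln p(o) \geq -\ln p(o),

so minimizing F both tightens the posterior approximation and implicitly maximizes the model evidence p(o); the deep neural networks described in the paper serve as parameterized approximations of these densities.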

Updated: 2020-06-01