Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network, Entropy

Entropy (IF 2.7), Pub Date: 2020-05-18, DOI: 10.3390/e22050564
Takazumi Matsumoto, Jun Tani

It is crucial to ask how agents can achieve goals by generating action plans using only partial models of the world acquired through habituated sensory-motor experiences. Although many existing robotics studies use a forward model framework, there are generalization issues with high degrees of freedom. The current study shows that the predictive coding (PC) and active inference (AIF) frameworks, which employ a generative model, can develop better generalization by learning a prior distribution in a low dimensional latent state space representing probabilistic structures extracted from well habituated sensory-motor trajectories. In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights for maximizing the evidence lower bound, while goal-directed planning is accomplished by inferring latent variables for maximizing the estimated lower bound. Our proposed model was evaluated with both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization in learning with limited training data by setting an intermediate value for a regularization coefficient. Furthermore, comparative simulation results show that the proposed model outperforms a conventional forward model in goal-directed planning, due to the learned prior confining the search of motor plans within the range of habituated trajectories.
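The planning step described in the abstract, inferring latent variables so that the predicted outcome reaches a goal while a learned prior keeps the search near habituated trajectories, can be illustrated with a toy sketch. This is not the authors' model: the linear decoder `decoder_w`, the fixed Gaussian prior (`prior_mu`, `prior_sigma`), the point estimate of the latent variable, and the quadratic penalty standing in for the KL term are all assumptions made for illustration, and `w_reg` is a hypothetical name for the regularization coefficient.

```python
import numpy as np

def plan_latent(decoder_w, goal, prior_mu, prior_sigma,
                w_reg=0.1, lr=0.05, steps=500):
    """Toy goal-directed planning by gradient descent on a latent vector z.

    Minimizes 0.5*||decoder_w @ z - goal||^2
            + w_reg * 0.5*||(z - prior_mu) / prior_sigma||^2.
    The second term is a quadratic stand-in (an assumption) for the KL
    divergence between the inferred latent state and the learned prior.
    """
    prior_mu = np.asarray(prior_mu, dtype=float)
    prior_sigma = np.asarray(prior_sigma, dtype=float)
    goal = np.asarray(goal, dtype=float)
    z = prior_mu.copy()  # start the search at the prior mean
    for _ in range(steps):
        pred = decoder_w @ z
        grad_err = decoder_w.T @ (pred - goal)              # goal-error gradient
        grad_reg = w_reg * (z - prior_mu) / prior_sigma**2  # prior-penalty gradient
        z -= lr * (grad_err + grad_reg)
    return z

# Hypothetical setup: identity decoder, zero-mean unit-variance prior.
decoder_w = np.eye(2)
goal = np.array([1.0, 0.0])
z_free = plan_latent(decoder_w, goal, np.zeros(2), np.ones(2), w_reg=0.0)
z_reg = plan_latent(decoder_w, goal, np.zeros(2), np.ones(2), w_reg=1.0)
```

With `w_reg=0.0` the latent vector is driven entirely by the goal error. A larger `w_reg` pulls the solution back toward the prior mean, which is the trade-off the abstract attributes to choosing an intermediate regularization coefficient.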

Updated: 2020-05-18
