Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales



Cognitive, Affective, & Behavioral Neuroscience (IF 2.5). Pub Date: 2020-12-28. DOI: 10.3758/s13415-020-00837-x
Dimitrije Marković¹, Thomas Goschke²˒³, Stefan J Kiebel¹˒³


Cognitive control is typically understood as a set of mechanisms that enable humans to reach goals that require integrating the consequences of actions over longer time scales. Importantly, using routine behaviour or making choices beneficial only at short time scales would prevent one from attaining these goals. During the past two decades, researchers have proposed various computational cognitive models that successfully account for behaviour related to cognitive control in a wide range of laboratory tasks. As humans operate in a dynamic and uncertain environment, making elaborate plans and integrating experience over multiple time scales is computationally expensive. However, it remains poorly understood how uncertain consequences at different time scales are integrated into adaptive decisions. Here, we pursue the idea that cognitive control can be cast as active inference over a hierarchy of time scales, where inference, i.e., planning, at higher levels of the hierarchy controls inference at lower levels. We introduce the novel concept of meta-control states, which link higher-level beliefs with lower-level policy inference. Specifically, we conceptualize cognitive control as inference over these meta-control states, where solutions to cognitive control dilemmas emerge through surprisal minimisation at different hierarchy levels. We illustrate this concept using the exploration-exploitation dilemma based on a variant of a restless multi-armed bandit task. We demonstrate that beliefs about contexts and meta-control states at a higher level dynamically modulate the balance of exploration and exploitation at the lower level of a single action. Finally, we discuss the generalisation of this meta-control concept to other control dilemmas.
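To make the idea of a higher-level belief modulating lower-level exploration more concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' active-inference model: it swaps in a plain Bayesian filter with a softmax choice rule. The two hypothetical contexts ("stable" vs "volatile" arm drift), the hazard rate, the two meta-control states ("explore" and "exploit", each mapped to a softmax precision), and the choice to equate P(explore) with P(volatile) are all illustrative inventions. The lower level tracks each arm of a restless Gaussian bandit with one Kalman filter per hypothesised context. The higher level infers the context from how surprising each reward is, and the resulting belief sets the mixture weight between a low-precision (exploratory) and a high-precision (exploitative) policy.

```python
import numpy as np

# All parameters below are hypothetical and chosen only for illustration.
N_ARMS = 4
OBS_SD = 1.0                                   # reward noise standard deviation
DRIFT_SD = {"stable": 0.05, "volatile": 0.8}   # per-trial random-walk SD of arm means
HAZARD = 0.02                                  # assumed per-trial probability of a context switch
BETA = {"explore": 0.5, "exploit": 5.0}        # softmax precision under each meta-control state


def softmax(x, beta):
    """Softmax with inverse temperature (precision) beta."""
    z = beta * (x - x.max())
    p = np.exp(z)
    return p / p.sum()


def gauss_pdf(x, mu, var):
    """Gaussian density, used as the predictive likelihood of a reward."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)


def simulate(n_trials=500, seed=0):
    rng = np.random.default_rng(seed)
    contexts = list(DRIFT_SD)                  # ["stable", "volatile"]
    true_ctx = 0
    true_means = rng.normal(0.0, 1.0, N_ARMS)
    mu = np.zeros((2, N_ARMS))                 # one Kalman filter per hypothesised context
    var = np.ones((2, N_ARMS))
    p_ctx = np.array([0.5, 0.5])               # posterior belief over [stable, volatile]
    drift_var = np.array([DRIFT_SD[c] ** 2 for c in contexts])[:, None]
    history = []
    for _ in range(n_trials):
        # Environment: occasionally switch context, then let all arm means drift.
        if rng.random() < HAZARD:
            true_ctx = 1 - true_ctx
        true_means += rng.normal(0.0, DRIFT_SD[contexts[true_ctx]], N_ARMS)

        # Prediction step: each context-specific filter inflates its arm variances.
        var = var + drift_var
        # Higher-level prior for this trial, allowing for a context switch.
        prior = (1 - HAZARD) * p_ctx + HAZARD * p_ctx[::-1]

        # Meta-control (assumption): P(explore) is identified with P(volatile).
        p_explore = prior[1]
        mean_est = prior @ mu                  # context-averaged expected reward per arm
        policy = (p_explore * softmax(mean_est, BETA["explore"])
                  + (1 - p_explore) * softmax(mean_est, BETA["exploit"]))
        arm = rng.choice(N_ARMS, p=policy)
        reward = rng.normal(true_means[arm], OBS_SD)

        # Higher level: update the context belief by the reward's predictive likelihood.
        lik = gauss_pdf(reward, mu[:, arm], var[:, arm] + OBS_SD ** 2)
        p_ctx = prior * lik
        p_ctx /= p_ctx.sum()

        # Lower level: Kalman update of the chosen arm under each context.
        gain = var[:, arm] / (var[:, arm] + OBS_SD ** 2)
        mu[:, arm] += gain * (reward - mu[:, arm])
        var[:, arm] *= 1 - gain
        history.append((true_ctx, p_ctx[1], p_explore, arm, reward))
    return history
```

Mixing the two policies, rather than switching between them outright, keeps the lower-level choice graded: as the belief in the volatile context rises, choices gradually become less greedy. Calling `simulate(n_trials=500, seed=0)` returns per-trial tuples of the true context, the posterior P(volatile), P(explore), the chosen arm, and the reward, which can be plotted to see how the belief tracks context switches.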


Updated: 2020-12-28
