Implicit counterfactual effect in partial feedback reinforcement learning: behavioral and modeling approach
bioRxiv - Animal Behavior and Cognition. Pub Date: 2020-09-30, DOI: 10.1101/2020.09.30.320135
Zahra Barakchian, Abdol-hossein Vahabie, Majid Nili Ahmadabadi

Context markedly affects learning behavior by distorting the values of options relative to the distribution of available alternatives. Providing an explicit counterfactual component, that is, the outcome of the unchosen option alongside that of the chosen one (Complete feedback), increases the contextual effect by inducing a comparison-based strategy during learning. However, it is not clear whether and how this relativity emerges when the context consists only of the juxtaposition of a series of options and there is no such explicit counterfactual component (Partial feedback). To investigate whether and how implicit and explicit counterfactual components affect reinforcement learning, we used two paradigms, Partial and Complete feedback, in which options were associated with reward distributions. Our modeling analysis shows that a model which uses the outcome of the chosen option to update the values of both the chosen and unchosen options, in line with the diffusive function of dopamine in the striatum, better accounts for the behavioral data. We also observed that the size of this bias depends on the brain systems involved: the effect is larger in the transfer phase, where subcortical systems are more involved, and smaller in the deliberative value estimation phase, where the cortical system is more needed. Furthermore, our data show that the contextual effect is not limited to probabilistic rewards but also extends to rewards that differ in amplitude. These results show that extending the concept of the counterfactual better accounts for why a contextual effect arises even when no additional information about the unchosen outcome is available.
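
The update scheme described in the abstract, in which the chosen option's outcome drives the values of both options under Partial feedback, might be sketched roughly as follows. This is a minimal illustration and not the authors' fitted model: the function name `update_values`, the two learning rates, their default values, and in particular the choice to move the unchosen value toward the negated outcome are all assumptions made here for illustration. The abstract states only that the chosen outcome updates both options.

```python
def update_values(q, chosen, reward, alpha_chosen=0.3, alpha_unchosen=0.1):
    """Apply one Partial-feedback trial to a two-option context.

    q        : list of two floats, current value estimates
    chosen   : index (0 or 1) of the option the agent picked
    reward   : observed outcome of the chosen option
    Returns a new list of updated value estimates.
    """
    unchosen = 1 - chosen
    new_q = list(q)

    # Standard delta rule for the chosen option.
    new_q[chosen] = q[chosen] + alpha_chosen * (reward - q[chosen])

    # ASSUMPTION (not stated in the abstract): the unchosen option is
    # pulled toward the negated outcome of the chosen option, acting as
    # an implicit counterfactual signal.
    new_q[unchosen] = q[unchosen] + alpha_unchosen * (-reward - q[unchosen])

    return new_q


if __name__ == "__main__":
    values = [0.0, 0.0]
    values = update_values(values, chosen=0, reward=1.0)
    print(values)
```

Under this assumed rule, a rewarded choice raises the chosen value and lowers the unchosen one, so the learned values become relative to each other even though the unchosen outcome is never shown.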

Updated: 2020-10-02