Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning.
Journal of Neuroscience Methods (IF 2.7), Pub Date: 2020-05-15, DOI: 10.1016/j.jneumeth.2020.108777
Eshaan S Iyer, Megan A Kairiss, Adrian Liu, A Ross Otto, Rosemary C Bagot

BACKGROUND: Reinforcement learning (RL) and win-stay/lose-shift models of decision making are both widely used to describe how individuals learn about and interact with rewarding environments. Though mutually informative, these accounts are often conceptualized as independent processes, and so the potential relationships between win-stay/lose-shift tendencies and RL parameters have not been explored.

NEW METHOD: We introduce a methodology that directly relates RL parameters to behavioral strategy. Specifically, by simulating win-stay/lose-shift tendencies across the RL parameter space, we compute a truncated multivariate normal distribution of RL parameters given those tendencies; maximizing this distribution for an observed set of win-stay/lose-shift tendencies yields approximate reinforcement learning parameters.

RESULTS: We demonstrate novel relationships between win-stay/lose-shift tendencies and RL parameters that challenge conventional interpretations of lose-shift as a metric of loss sensitivity. Further, we show in both simulated and empirical data that this method of parameter approximation yields reliable parameter recovery.

COMPARISON WITH EXISTING METHOD: We compare this method against the conventionally used maximum likelihood estimation approach for parameter approximation in simulated noisy and empirical data. For simulated noisy data, the method performs similarly to maximum likelihood estimation. For empirical data, however, it provides a more reliable approximation of reinforcement learning parameters than maximum likelihood estimation.

CONCLUSIONS: We demonstrate the existence of relationships between win-stay/lose-shift tendencies and RL parameters and introduce a method that leverages these relationships to enable recovery of RL parameters exclusively from win-stay/lose-shift tendencies.
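The core idea described in the abstract can be illustrated with a minimal sketch: simulate a softmax Q-learning agent across a grid of RL parameters (learning rate alpha, inverse temperature beta), record the resulting win-stay/lose-shift tendencies, and then approximate parameters for observed tendencies by maximizing a normal score over the simulated mapping. This is not the authors' implementation; the grid search and isotropic Gaussian score are simplifications of the paper's truncated multivariate normal approach, and the task settings, trial counts, and grid ranges below are illustrative assumptions.

```python
# Hedged sketch: map (alpha, beta) to win-stay/lose-shift tendencies by
# simulation, then recover parameters from observed tendencies by maximizing
# a Gaussian score over the simulated grid. All numeric settings are
# illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def simulate_agent(alpha, beta, n_trials=500, p_reward=(0.8, 0.2)):
    """Simulate a softmax Q-learning agent on a two-armed probabilistic bandit
    and return its win-stay and lose-shift probabilities."""
    q = np.zeros(2)
    choices = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # Softmax choice rule controlled by inverse temperature beta.
        p = np.exp(beta * q - np.max(beta * q))
        p /= p.sum()
        c = rng.choice(2, p=p)
        r = int(rng.random() < p_reward[c])
        # Delta-rule value update controlled by learning rate alpha.
        q[c] += alpha * (r - q[c])
        choices[t], rewards[t] = c, r
    stay = choices[1:] == choices[:-1]
    win = rewards[:-1] == 1
    win_stay = stay[win].mean() if win.any() else np.nan
    lose_shift = (~stay[~win]).mean() if (~win).any() else np.nan
    return win_stay, lose_shift

# Build the simulated mapping from RL parameters to WSLS tendencies.
alphas = np.linspace(0.05, 0.95, 19)
betas = np.linspace(0.5, 10.0, 20)
grid, wsls = [], []
for a in alphas:
    for b in betas:
        ws, ls = simulate_agent(a, b)
        grid.append((a, b))
        wsls.append((ws, ls))
grid, wsls = np.array(grid), np.array(wsls)

def approximate_parameters(obs_ws, obs_ls, sd=0.05):
    """Score each grid point with an isotropic Gaussian centered on the
    observed WSLS tendencies and return the best-scoring (alpha, beta)."""
    d2 = ((wsls - np.array([obs_ws, obs_ls])) ** 2).sum(axis=1)
    scores = np.exp(-d2 / (2 * sd ** 2))
    return grid[np.argmax(scores)]

# Example: approximate parameters for an agent simulated with alpha=0.4, beta=4.
obs_ws, obs_ls = simulate_agent(0.4, 4.0)
print("observed WSLS:", round(obs_ws, 2), round(obs_ls, 2))
print("approximated (alpha, beta):", approximate_parameters(obs_ws, obs_ls))
```

A denser grid, repeated simulations per grid point, and a full truncated multivariate normal over the parameter vector would bring this sketch closer to the method the abstract describes.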

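The comparison baseline named in the abstract, maximum likelihood estimation, fits RL parameters directly to the trial-by-trial choice and reward sequence rather than to summary tendencies. A minimal sketch of that conventional approach is below, assuming the same softmax Q-learning model as above; the multi-start scheme, parameter bounds, and synthetic data are illustrative choices, not the paper's settings.

```python
# Hedged sketch: conventional maximum likelihood estimation of (alpha, beta)
# for a softmax Q-learning model, fit to a synthetic choice/reward sequence.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def simulate_choices(alpha, beta, n_trials=500, p_reward=(0.8, 0.2)):
    """Generate a synthetic choice/reward sequence from a softmax Q-learner."""
    q = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        p = np.exp(beta * q - np.max(beta * q))
        p /= p.sum()
        c = rng.choice(2, p=p)
        r = int(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def negative_log_likelihood(params, choices, rewards):
    """Negative log-likelihood of the observed choices under softmax Q-learning."""
    alpha, beta = params
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        logits = beta * q
        logp = logits - np.logaddexp(logits[0], logits[1])  # log softmax
        nll -= logp[c]
        q[c] += alpha * (r - q[c])
    return nll

def fit_mle(choices, rewards):
    """Multi-start bounded optimization over (alpha, beta)."""
    best = None
    for a0 in (0.2, 0.5, 0.8):
        for b0 in (1.0, 5.0):
            res = minimize(negative_log_likelihood, x0=[a0, b0],
                           args=(choices, rewards), method="L-BFGS-B",
                           bounds=[(1e-3, 1.0), (1e-3, 20.0)])
            if best is None or res.fun < best.fun:
                best = res
    return best.x  # (alpha_hat, beta_hat)

choices, rewards = simulate_choices(0.4, 4.0)
print("MLE (alpha, beta):", np.round(fit_mle(choices, rewards), 2))
```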