Pitfalls in quantifying exploration in reward-based motor learning and how to avoid them
Biological Cybernetics (IF 1.7), Pub Date: 2021-08-02, DOI: 10.1007/s00422-021-00884-8
Nina M. van Mastrigt, Katinka van der Kooij, Jeroen B. J. Smeets

When learning a movement based on binary success information, one is more variable following failure than following success. Theoretically, the additional variability post-failure might reflect exploration of possibilities to obtain success. When average behavior is changing (as in learning), variability can be estimated from differences between subsequent movements. Can one estimate exploration reliably from such trial-to-trial changes when studying reward-based motor learning? To answer this question, we tried to reconstruct the exploration underlying learning as described by four existing reward-based motor learning models. We simulated learning for various learner and task characteristics. If we simply determined the additional change post-failure, estimates of exploration were sensitive to learner and task characteristics. We identified two pitfalls in quantifying exploration based on trial-to-trial changes. Firstly, performance-dependent feedback can cause correlated samples of motor noise and exploration on successful trials, which biases exploration estimates. Secondly, the trial relative to which trial-to-trial change is calculated may also contain exploration, which causes underestimation. As a solution, we developed the additional trial-to-trial change (ATTC) method. By moving the reference trial one trial back and subtracting trial-to-trial changes following specific sequences of trial outcomes, exploration can be estimated reliably for the three models that explore based on the outcome of only the previous trial. Since ATTC estimates are based on a selection of trial sequences, this method requires many trials. In conclusion, if exploration is a binary function of previous trial outcome, the ATTC method allows for a model-free quantification of exploration.
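The Python sketch below makes the two pitfalls and the ATTC idea concrete on simulated data. It is a minimal illustration assuming one particular learner: a single aim that retains a fraction beta of a successful deviation and adds transient Gaussian exploration only after a failure. This model, the parameter values, and the exact outcome-sequence selection in attc_estimate are assumptions based on this abstract, not the paper's four models or its exact estimator.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate(n_trials, sigma_m, sigma_e, tol, beta):
        """Toy reward-based learner aiming at a target at 0.

        A movement is plan + exploration + motor noise; exploration is drawn
        only after a failure. After a success, a fraction beta of the
        successful deviation is retained in the plan, so success feedback
        selects (truncates and correlates) the noise samples that shape
        behaviour -- the source of pitfall 1."""
        x = np.zeros(n_trials)        # executed movement endpoints
        r = np.zeros(n_trials, bool)  # binary success feedback
        p = 0.0                       # planned aim
        failed_last = True            # treat the first trial as post-failure
        for t in range(n_trials):
            e = rng.normal(0.0, sigma_e) if failed_last else 0.0  # exploration
            m = rng.normal(0.0, sigma_m)                          # motor noise
            x[t] = p + e + m
            r[t] = abs(x[t]) < tol
            if r[t]:
                p += beta * (x[t] - p)  # retain part of the successful deviation
            failed_last = not r[t]
        return x, r

    def naive_estimate(x, r):
        """Additional variance of the trial-to-trial change post-failure vs.
        post-success. Biased: success/failure truncate the noise samples on
        the outcome trial, and the reference trial x[t] may itself contain
        exploration (pitfall 2)."""
        d = np.diff(x)  # d[t] = x[t+1] - x[t]
        return np.var(d[~r[:-1]]) - np.var(d[r[:-1]])

    def attc_estimate(x, r):
        """ATTC-style contrast: move the reference one trial back and keep
        only sequences whose reference trial cannot contain exploration
        (trials t-2 and t-1 both successful), split on the outcome of t."""
        c = (x[2:] - x[:-2])[1:]                    # c[k] = x[k+3] - x[k+1]
        pre2, pre1, out = r[:-3], r[1:-2], r[2:-1]  # outcomes of trials k, k+1, k+2
        clean = pre2 & pre1                         # exploration-free reference
        return np.var(c[clean & ~out]) - np.var(c[clean & out])

    sigma_m, sigma_e = 1.0, 1.5
    x, r = simulate(200_000, sigma_m, sigma_e, tol=1.0, beta=0.1)
    print(f"true exploration variance  : {sigma_e**2:.2f}")
    print(f"naive post-failure contrast: {naive_estimate(x, r):.2f}")  # biased
    print(f"ATTC-style contrast        : {attc_estimate(x, r):.2f}")   # ~ sigma_e**2

In this toy setting the naive contrast is biased by the outcome-conditioned noise samples and by exploration on the reference trial, while the ATTC-style contrast, whose reference trials are guaranteed exploration-free, should land near the true exploration variance. The restriction to specific outcome sequences also shows why the method needs many trials: only a fraction of the data enters each contrast.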




Updated: 2021-08-02