Neural Networks With Motivation
Frontiers in Systems Neuroscience (IF 3) Pub Date: 2021-01-11, DOI: 10.3389/fnsys.2020.609316
Sergey A. Shuvaev, Ngoc B. Tran, Marcus Stephenson-Jones, Bo Li, Alexei A. Koulakov

Animals rely on internal motivational states to make decisions. The role of motivational salience in decision making is in the early stages of mathematical understanding. Here, we propose a reinforcement learning framework that relies on neural networks to learn optimal ongoing behavior for dynamically changing motivation values. First, we show that neural networks implementing Q-learning with motivational salience can navigate in an environment with dynamic rewards without adjusting synaptic strengths when the agent's needs shift. In this setting, our networks may display elements of addictive behaviors. Second, we use a similar framework in a hierarchical manager-agent system to implement a reinforcement learning algorithm with motivation that both infers motivational states and generates behavior. Finally, we show that, when trained in a Pavlovian conditioning setting, the responses of the neurons in our model resemble previously published neuronal recordings in the ventral pallidum, a basal ganglia structure involved in motivated behaviors. We conclude that motivation allows Q-learning networks to quickly adapt their behavior to conditions in which the expected reward is modulated by the agent's dynamic needs. Our approach addresses the algorithmic rationale of motivation and takes a step toward better interpretability of behavioral data via inference of motivational dynamics in the brain.
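The paper trains neural networks; as a rough illustration of the first result only, below is a minimal tabular Q-learning sketch (not the authors' implementation) in which motivation is an extra input to the value function, so a shift in needs redirects behavior with no further weight updates. The corridor environment, the two needs, and the satiation rule are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D corridor: water at cell 0, food at cell N_STATES-1.
# The motivation mu selects which outcome is currently rewarding,
# mirroring the idea that reward is scaled by the agent's current need.
N_STATES = 7
N_MOTIVS = 2            # 0: thirsty (seeks water), 1: hungry (seeks food)
N_ACTIONS = 2           # 0: move left, 1: move right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# Value function over the motivation-augmented state (state, mu, action).
Q = np.zeros((N_STATES, N_MOTIVS, N_ACTIONS))

def step(s, mu, a):
    """Move one cell; reward only the outcome matching the current need."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    got_water, got_food = s2 == 0, s2 == N_STATES - 1
    r = float((mu == 0 and got_water) or (mu == 1 and got_food))
    mu2 = 1 - mu if r > 0 else mu   # satiation: the other need takes over
    return s2, mu2, r

for episode in range(2000):
    s, mu = N_STATES // 2, rng.integers(N_MOTIVS)
    for t in range(50):
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(Q[s, mu]))
        s2, mu2, r = step(s, mu, a)
        # Standard Q-learning update on the motivation-augmented state.
        Q[s, mu, a] += ALPHA * (r + GAMMA * Q[s2, mu2].max() - Q[s, mu, a])
        s, mu = s2, mu2

# After training, flipping mu redirects behavior without any relearning:
print("thirsty policy:", np.argmax(Q[:, 0, :], axis=1))  # heads left, toward water
print("hungry  policy:", np.argmax(Q[:, 1, :], axis=1))  # heads right, toward food
```

Because the same table serves both needs, switching mu at test time changes the greedy policy immediately, which is the tabular analog of the paper's claim that no synaptic adjustments are required when needs shift.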

Updated: 2021-01-11