Experimental Study of Reinforcement Learning in Mobile Robots Through Spiking Architecture of Thalamo-Cortico-Thalamic Circuitry of Mammalian Brain
Robotica (IF 2.7), Pub Date: 2019-11-18, DOI: 10.1017/s0263574719001632
Vahid Azimirad, Mohammad Fattahi Sani

SUMMARY: This paper studies behavioral learning in robots through spiking neural networks whose architecture is based on the thalamo-cortico-thalamic circuitry of the mammalian brain. Because the circuit contains several types of neurons, the Izhikevich single-neuron model is used to represent their behaviors. The network comprises 1090 spiking neurons. The spiking model of the proposed architecture is derived and adapted to the robot learning problem. The reinforcement learning algorithm is based on spike-timing-dependent plasticity, with dopamine release acting as the reward. It strengthens the synaptic weights of the neurons involved in the robot's proper performance. Sensory neurons are placed in the thalamus and motor neurons in the cortical module. The inputs of the thalamo-cortico-thalamic circuitry are signals related to the distance of the target from the robot, and the outputs are the actuator velocities. A target attraction task, in which dopamine is released when the robot catches the target, is used to validate the proposed method. Simulation studies and an experimental implementation are carried out on a mobile robot named Tabrizbot. The experiments show that after successful learning, the mean time to catch the target decreases by about 36%. These results indicate that, with the proposed method, the thalamo-cortical structure can be trained successfully to perform various robotic tasks.
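The two building blocks named in the summary, the Izhikevich neuron and dopamine-gated spike-timing-dependent plasticity, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular-spiking parameters (a, b, c, d) are the standard values published for the Izhikevich model, while the function names, the STDP amplitudes and time constants, the eligibility-trace formulation, and the input current are assumptions chosen only for illustration.

```python
import math

def simulate_izhikevich(current, t_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Euler-integrate one Izhikevich neuron; return spike times in ms.

    Defaults are the standard regular-spiking parameters; `current` is a
    constant input (an assumption made for this illustration).
    """
    v = c        # membrane potential (mV)
    u = b * v    # recovery variable
    spikes = []
    for k in range(int(t_ms / dt)):
        if v >= 30.0:              # spike: record the time, then reset
            spikes.append(k * dt)
            v = c
            u += d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
    return spikes

def stdp(delta_t, a_plus=1.0, a_minus=1.5, tau=20.0):
    """STDP window for delta_t = t_post - t_pre in ms (illustrative constants)."""
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau)    # pre before post: potentiate
    return -a_minus * math.exp(delta_t / tau)       # post before pre: depress

def reward_modulated_update(weight, trace, dopamine,
                            dt=1.0, tau_c=1000.0, stdp_event=0.0):
    """Advance one step of a dopamine-gated STDP rule (hypothetical helper).

    The STDP contribution is stored in a decaying eligibility trace and only
    changes the weight while dopamine (the reward signal) is present.
    """
    trace += -trace / tau_c * dt + stdp_event
    weight += trace * dopamine * dt
    return weight, trace

if __name__ == "__main__":
    print("spikes with input:", len(simulate_izhikevich(10.0)))
    print("spikes without input:", len(simulate_izhikevich(0.0)))
    w, c_trace = reward_modulated_update(0.5, 0.0, dopamine=0.0,
                                         stdp_event=stdp(10.0))
    print("weight before reward:", w)
    w, c_trace = reward_modulated_update(w, c_trace, dopamine=0.1)
    print("weight after reward:", w)
```

In this sketch a pre-before-post spike pairing leaves the weight unchanged until dopamine arrives, which mirrors the idea in the summary that only synapses involved in a rewarded action (catching the target) are strengthened.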

Updated: 2019-11-18