Combining STDP and binary networks for reinforcement learning from images and sparse rewards
Neural Networks (IF 7.8), Pub Date: 2021-09-17, DOI: 10.1016/j.neunet.2021.09.010
Sérgio F. Chevtchenko, Teresa B. Ludermir

Spiking neural networks (SNNs) aim to replicate the energy efficiency, learning speed and temporal processing of biological brains. However, the accuracy and learning speed of such networks are still behind reinforcement learning (RL) models based on traditional neural models. This work combines a pre-trained binary convolutional neural network with an SNN trained online through reward-modulated STDP, in order to leverage the advantages of both models. The spiking network extends its previous version, with improvements in architecture and dynamics to address a more challenging task. We focus on an extensive experimental evaluation of the proposed model against optimized state-of-the-art baselines, namely proximal policy optimization (PPO) and deep Q-network (DQN). The models are compared on a grid-world environment with high-dimensional observations, consisting of RGB images with up to 256 × 256 pixels. The experimental results show that the proposed architecture can be a competitive alternative to deep reinforcement learning (DRL) in the evaluated environment and provides a foundation for more complex future applications of spiking networks.
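
For context, reward-modulated STDP (R-STDP), the learning rule at the core of the proposed model, is a three-factor rule: an eligibility trace accumulates STDP-style correlations between pre- and postsynaptic spikes, and a scalar reward signal gates when those correlations are committed to the weights. The sketch below is a minimal, illustrative version of this idea; all names and hyperparameters (tau_pre, tau_e, a_plus, lr, etc.) and the use of binary features as input are assumptions made for the example, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pre, n_post = 64, 4                      # e.g. binary features -> action neurons
w = rng.uniform(0.0, 0.1, (n_pre, n_post))  # plastic synaptic weights
elig = np.zeros_like(w)                     # eligibility trace e_ij
x_pre = np.zeros(n_pre)                     # presynaptic spike trace
x_post = np.zeros(n_post)                   # postsynaptic spike trace

tau_pre, tau_post, tau_e = 20.0, 20.0, 50.0  # trace time constants (ms)
a_plus, a_minus, lr = 1.0, 1.0, 0.01         # illustrative hyperparameters

def step(pre_spikes, post_spikes, reward, dt=1.0):
    """One time step of a three-factor (reward-modulated STDP) update."""
    global w, elig, x_pre, x_post
    # Exponentially decaying spike traces, incremented on each spike.
    x_pre += -dt * x_pre / tau_pre + pre_spikes
    x_post += -dt * x_post / tau_post + post_spikes
    # STDP term: potentiation when a post spike follows recent pre activity,
    # depression when a pre spike follows recent post activity.
    stdp = a_plus * np.outer(x_pre, post_spikes) \
         - a_minus * np.outer(pre_spikes, x_post)
    # The correlation is stored in a slowly decaying eligibility trace...
    elig += -dt * elig / tau_e + stdp
    # ...and only converted into a weight change when a reward arrives.
    w = np.clip(w + lr * reward * elig, 0.0, 1.0)

# Toy usage: binary features (a stand-in for a pre-trained binary CNN's
# output code) drive the input layer; a sparse scalar reward gates learning.
features = (rng.random(n_pre) > 0.8).astype(float)
post = (w.T @ features > 0.5).astype(float)
step(features, post, reward=1.0)
```

With sparse rewards, the eligibility trace is what bridges the gap between a spike pairing and a reward that arrives later; its decay constant tau_e controls how long a correlation remains eligible for credit.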



Updated: 2021-10-01