Meta-Learning through Hebbian Plasticity in Random Networks
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2020-07-06 , DOI: arxiv-2007.02686
Elias Najarro and Sebastian Risi

Lifelong learning and adaptability are two defining aspects of biological agents. Modern reinforcement learning (RL) approaches have shown significant progress in solving complex tasks; however, once training is concluded, the solutions found are typically static and incapable of adapting to new information or perturbations. While it is still not completely understood how biological brains learn and adapt so efficiently from experience, it is believed that synaptic plasticity plays a prominent role in this process. Inspired by this biological mechanism, we propose a search method that, instead of optimizing the weight parameters of neural networks directly, searches only for synapse-specific Hebbian learning rules that allow the network to continuously self-organize its weights during the lifetime of the agent. We demonstrate our approach on several reinforcement learning tasks with different sensory modalities and more than 450K trainable plasticity parameters. We find that, starting from completely random weights, the discovered Hebbian rules enable an agent to navigate a dynamical 2D-pixel environment; likewise, they allow a simulated 3D quadrupedal robot to learn how to walk while adapting, in fewer than 100 timesteps, to morphological damage not seen during training and in the absence of any explicit reward or error signal. Code is available at https://github.com/enajx/HebbianMetaLearning.
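
To make the idea of a synapse-specific Hebbian rule concrete, the sketch below shows one common parameterization, a generalized Hebbian update with per-synapse coefficients. The abstract does not spell out the rule's exact form, so the coefficient names (`eta`, `A`, `B`, `C`, `D`), the `tanh` activation, the initial weight range, and both function names are assumptions made for illustration. The outer search that would tune the coefficients is omitted.

```python
import numpy as np

def hebbian_update(w, pre, post, eta, A, B, C, D):
    """Apply one generalized Hebbian step to a weight matrix.

    Hypothetical helper. w, eta, A, B, C, D all have shape (n_pre, n_post),
    i.e. one coefficient set per synapse. pre has shape (n_pre,) and
    post has shape (n_post,).
    """
    corr = np.outer(pre, post)  # pre_i * post_j for every synapse
    dw = eta * (A * corr + B * pre[:, None] + C * post[None, :] + D)
    return w + dw

def run_lifetime(coeffs, inputs, rng):
    """Start from random weights and let the rule reshape them step by step.

    Assumed setup: a single tanh layer and uniform initial weights.
    Only the coefficients in `coeffs` would be searched over; the
    weights themselves are never optimized directly.
    """
    eta, A, B, C, D = coeffs
    w = rng.uniform(-0.1, 0.1, size=eta.shape)  # random initial weights
    for x in inputs:
        y = np.tanh(x @ w)                      # layer activation
        w = hebbian_update(w, x, y, eta, A, B, C, D)
    return w
```

In this sketch the trainable parameters are the five coefficient arrays, so a layer with `n_pre * n_post` synapses contributes `5 * n_pre * n_post` plasticity parameters.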

Updated: 2020-10-26