Memristor Hardware-Friendly Reinforcement Learning
arXiv - CS - Machine Learning. Pub Date: 2020-01-20, DOI: arxiv-2001.06930
Nan Wu, Adrien Vincent, Dmitri Strukov, Yuan Xie

Recently, significant progress has been made in solving sophisticated problems in various domains using reinforcement learning (RL), which allows machines or agents to learn from interactions with environments rather than from explicit supervision. As the end of Moore's law appears imminent, emerging technologies that enable high-performance neuromorphic hardware systems are attracting increasing attention. In particular, neuromorphic architectures that leverage memristors, programmable and nonvolatile two-terminal devices, as synaptic weights in hardware neural networks are prime candidates for realizing such highly energy-efficient and complex nervous systems. However, one of the challenges for memristive hardware with integrated learning capabilities is the prohibitively large number of write cycles that may be required during the learning process, a problem that is further exacerbated in RL settings. In this work we propose a memristive neuromorphic hardware implementation of the actor-critic algorithm in RL. By introducing a two-fold training procedure (i.e., ex-situ pre-training and in-situ re-training) and several training techniques, the number of weight updates can be significantly reduced, making the approach suitable for efficient in-situ learning implementations. As a case study, we consider the task of balancing an inverted pendulum, a classical problem in both RL and control theory. We believe this study demonstrates the promise of using memristor-based hardware neural networks to handle complex tasks through in-situ reinforcement learning.
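The abstract's case study maps onto the classic cart-pole benchmark. As a software reference point only (the paper itself targets analog memristive hardware), below is a minimal NumPy sketch of one-step actor-critic learning with a TD(0) critic on cart-pole dynamics following Barto, Sutton & Anderson (1983); the linear features, reward, and learning rates are illustrative assumptions, not the authors' network or training procedure.

import numpy as np

# Classic cart-pole dynamics (Barto, Sutton & Anderson, 1983).
G, M_CART, M_POLE, L, F_MAG, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
M_TOTAL, PML = M_CART + M_POLE, M_POLE * L

def step(state, action):
    # Advance one Euler step; action is 0 (push left) or 1 (push right).
    x, x_dot, th, th_dot = state
    force = F_MAG if action == 1 else -F_MAG
    cos_th, sin_th = np.cos(th), np.sin(th)
    temp = (force + PML * th_dot**2 * sin_th) / M_TOTAL
    th_acc = (G * sin_th - cos_th * temp) / (
        L * (4.0 / 3.0 - M_POLE * cos_th**2 / M_TOTAL))
    x_acc = temp - PML * th_acc * cos_th / M_TOTAL
    new = np.array([x + TAU * x_dot, x_dot + TAU * x_acc,
                    th + TAU * th_dot, th_dot + TAU * th_acc])
    # Failure: cart leaves the track or pole tilts past 12 degrees.
    failed = abs(new[0]) > 2.4 or abs(new[2]) > 12 * np.pi / 180
    return new, (-1.0 if failed else 0.0), failed

def features(state):
    # Raw state plus a bias term; an illustrative choice of encoding.
    return np.append(state, 1.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_feat, n_act = 5, 2
theta = np.zeros((n_act, n_feat))   # actor (policy) weights
w = np.zeros(n_feat)                # critic (value) weights
alpha_th, alpha_w, gamma = 0.01, 0.05, 0.99

for episode in range(500):
    s = rng.uniform(-0.05, 0.05, size=4)
    for t in range(1000):
        phi = features(s)
        probs = softmax(theta @ phi)
        a = rng.choice(n_act, p=probs)
        s2, r, done = step(s, a)
        # TD error drives both the critic and the actor update.
        v2 = 0.0 if done else w @ features(s2)
        delta = r + gamma * v2 - w @ phi
        w += alpha_w * delta * phi
        grad = -np.outer(probs, phi)   # d log pi / d theta for all actions...
        grad[a] += phi                 # ...plus the taken action's features
        theta += alpha_th * delta * grad
        s = s2
        if done:
            break
    if (episode + 1) % 100 == 0:
        print(f"episode {episode + 1}: lasted {t + 1} steps")

In a memristive implementation, each update to w or theta would correspond to device write pulses, which is why the paper's two-fold procedure, ex-situ pre-training followed by sparse in-situ re-training, aims to keep the number of such writes small.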

Updated: 2020-01-22