Distributed Deep Reinforcement Learning Method Using Profit Sharing for Learning Acceleration
IEEJ Transactions on Electrical and Electronic Engineering (IF 1), Pub Date: 2020-06-27, DOI: 10.1002/tee.23180
Naoki Kodama, Taku Harada, Kazuteru Miyazaki

Profit Sharing (PS), a reinforcement learning method that strongly reinforces successful experiences, has been shown to improve learning speed when combined with a deep Q-network (DQN). We expect a further improvement in learning speed by integrating PS-based learning with the Ape-X DQN, which offers state-of-the-art learning speed, instead of the DQN. However, PS-based learning does not use a replay memory, whereas the Ape-X DQN requires one because the exploration of the environment to collect experiences and the network training are performed asynchronously. In this study, we propose Learning-accelerated Ape-X, which integrates the Ape-X DQN and PS-based learning with several improvements, including the use of a replay memory. We show through numerical experiments that the proposed method improves scores on Atari 2600 video games in a shorter time than the Ape-X DQN. © 2020 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
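The abstract hinges on reconciling PS-style credit assignment, which reinforces whole successful trajectories, with the replay memory required by Ape-X's asynchronous actors. The sketch below is not the authors' implementation; it is a minimal, hypothetical illustration (class and field names such as PSReplayMemory and ps_credit are invented here) of how a geometrically decaying PS credit could be attached to each transition before it is written into a shared replay buffer.

```python
from collections import deque, namedtuple
import random

Transition = namedtuple("Transition", "state action reward next_state done ps_credit")

class PSReplayMemory:
    """Hypothetical replay memory storing a profit-sharing credit per transition.

    Sketch only: when an episode finishes, reward is propagated backwards along
    the trajectory with geometric decay, so transitions on successful episodes
    carry a larger credit that a DQN-style learner could use to reinforce them
    more strongly when they are sampled.
    """

    def __init__(self, capacity=100_000, decay=0.9):
        self.buffer = deque(maxlen=capacity)
        self.decay = decay  # geometric decay of the PS credit along the episode

    def add_episode(self, episode):
        """episode: list of (state, action, reward, next_state, done) tuples
        produced by one actor, appended to the shared buffer with PS credits."""
        credit = 0.0
        credited = []
        # Walk the episode backwards: rewards near the end seed the credit,
        # which decays as we move toward earlier transitions.
        for (s, a, r, s2, d) in reversed(episode):
            credit = r + self.decay * credit
            credited.append(Transition(s, a, r, s2, d, credit))
        self.buffer.extend(reversed(credited))

    def sample(self, batch_size):
        # Uniform sampling shown for simplicity; Ape-X itself uses prioritized replay.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

In a distributed setting of the kind the paper targets, each actor would call add_episode on a shared buffer while the learner calls sample asynchronously; the stored ps_credit could then weight the temporal-difference loss for transitions from successful trajectories.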
