Gradient boosting in crowd ensembles for Q-learning using weight sharing
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-03-23, DOI: 10.1007/s13042-020-01115-5
D. L. Elliott, K. C. Santosh, Charles Anderson

Reinforcement learning (RL) is a double-edged sword: it frees the human trainer from having to provide voluminous supervised training data, or even from knowing a solution. On the other hand, a common complaint about RL is that learning is slow. Deep Q-learning (DQN), a relatively recent development, has allowed practitioners and scientists to solve tasks previously thought unsolvable by a reinforcement learning approach. However, DQN has resulted in an explosion in the number of model parameters, which has further exacerbated the computational demands of Q-learning during training. In this work, an ensemble approach is proposed that improves training time, measured as the number of interactions with the training environment. The presented experiments show that the proposed approach improves stability during training, yields higher average performance, makes training more reliable, and speeds up the learning of features in the convolutional layers.
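
For background, the sketch below illustrates the standard deep Q-learning (DQN) update that the abstract builds on: an online network is regressed toward a bootstrapped target computed from a periodically copied target network. This is not the authors' ensemble, gradient-boosting, or weight-sharing method; the network architecture, hyperparameters, and the toy transition batch are illustrative assumptions only.

```python
# Minimal DQN-style update sketch (illustrative assumptions, not the paper's method).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value per action (assumed sizes)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

state_dim, n_actions, gamma = 4, 2, 0.99
online_net = QNetwork(state_dim, n_actions)
target_net = QNetwork(state_dim, n_actions)
target_net.load_state_dict(online_net.state_dict())  # periodic copy, as in DQN

optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-3)

# Toy batch of transitions (s, a, r, s', done); in practice these come from a
# replay buffer filled by interacting with the training environment.
batch = 32
states = torch.randn(batch, state_dim)
actions = torch.randint(0, n_actions, (batch,))
rewards = torch.randn(batch)
next_states = torch.randn(batch, state_dim)
dones = torch.zeros(batch)

# Q(s, a) for the actions actually taken.
q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

# Bootstrapped target r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
with torch.no_grad():
    max_next_q = target_net(next_states).max(dim=1).values
    targets = rewards + gamma * (1.0 - dones) * max_next_q

loss = nn.functional.mse_loss(q_sa, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The number of such environment interactions needed before the Q-network performs well is the training cost that the proposed ensemble approach aims to reduce.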

Updated: 2020-03-23