Reinforcement learning with convolutional reservoir computing
Applied Intelligence ( IF 5.3 ) Pub Date : 2020-03-07 , DOI: 10.1007/s10489-020-01679-3
Hanten Chang , Katsuya Futagami

Recently, reinforcement learning models have achieved great success, mastering complex tasks such as the game of Go and attaining higher scores than human players. Many of these models store considerable amounts of task data and achieve high performance by extracting visual and time-series features using convolutional neural networks (CNNs) and recurrent neural networks, respectively. However, these networks incur very high computational costs because they must be trained by repeatedly replaying the stored data. In this study, we propose a novel practical approach called reinforcement learning with a convolutional reservoir computing (RCRC) model. The RCRC model uses a fixed random-weight CNN and a reservoir computing model to extract visual and time-series features, respectively. Using these extracted features, it decides its actions with an evolution strategy method. The RCRC model therefore has several desirable properties: (1) the feature extractor requires no training, (2) no training data need be stored, (3) it can take a wide range of actions, and (4) only a single task-dependent weight matrix needs to be trained. Furthermore, we show that the RCRC model can solve multiple reinforcement learning tasks with a completely identical feature extractor.
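The pipeline described in the abstract can be sketched in a few dozen lines: an untrained random-weight convolutional layer for visual features, an echo state network (a standard reservoir computing model) for time-series features, and a simple evolution strategy that optimizes only the linear readout. This is a minimal illustrative sketch, not the authors' implementation; all hyperparameters (filter sizes, reservoir size, spectral radius, ES population) are placeholder assumptions, and the ES update here is the basic natural evolution strategy gradient estimator rather than the specific method used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_conv_features(img, filters):
    """Fixed random-weight 'CNN': one untrained conv layer with ReLU.

    img: (H, W) grayscale frame; filters: (n_f, k, k), never trained.
    Uses stride k (non-overlapping patches) to keep the feature vector small.
    """
    n_f, k, _ = filters.shape
    H, W = img.shape
    feats = []
    for f in filters:
        for i in range(0, H - k + 1, k):
            for j in range(0, W - k + 1, k):
                feats.append(max(0.0, float(np.sum(img[i:i + k, j:j + k] * f))))
    return np.array(feats)

class Reservoir:
    """Echo state network: fixed random recurrent weights, leaky tanh update.

    The recurrent matrix is rescaled to a spectral radius < 1 so the state
    contracts (echo state property); only the state evolves, no weights train.
    """
    def __init__(self, n_in, n_res=200, spectral_radius=0.9, leak=0.3):
        self.W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W, self.leak = W, leak
        self.state = np.zeros(n_res)

    def step(self, u):
        pre = self.W_in @ u + self.W @ self.state
        self.state = (1 - self.leak) * self.state + self.leak * np.tanh(pre)
        return self.state

def es_step(theta, fitness, pop=20, sigma=0.1, lr=0.05):
    """One evolution-strategy update of the single trainable weight vector.

    Perturbs theta with Gaussian noise, scores each perturbation, and moves
    theta along the fitness-weighted average of the noise directions.
    """
    eps = rng.standard_normal((pop, theta.size))
    scores = np.array([fitness(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return theta + lr / (pop * sigma) * (eps.T @ scores)
```

In an actual control loop, each frame would pass through `random_conv_features`, the result through `Reservoir.step`, and the concatenated visual + reservoir features through the linear readout `theta` (reshaped to a matrix) to produce an action; `es_step` is then driven by episode returns as the fitness signal. Because the CNN and reservoir are fixed, the same feature extractor can be reused across tasks, matching point (4) of the abstract: only the readout is task-dependent.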


