Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2). Pub Date: 2020-08-31. DOI: 10.1109/tnnls.2020.3008249
Linsen Dong , Yuanlong Li , Xin Zhou , Yonggang Wen , Kyle Guan

Model-based reinforcement learning (MBRL) has been proposed as a promising alternative to tackle the high sampling cost of canonical RL by leveraging a system dynamics model to generate synthetic data for policy training. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly optimizing the control policy, learning the system dynamics, and sampling data from two sources governed by complicated hyperparameters. As such, the training process requires overwhelming manual tuning and is prohibitively costly. In this research, we propose a ``reinforcement on reinforcement'' (RoR) architecture to decompose the convoluted tasks into two decoupled layers of RL. The inner layer is the canonical MBRL training process, formulated as a Markov decision process called the training process environment (TPE). The outer layer serves as an RL agent, called the intelligent trainer, which learns an optimal hyperparameter configuration for the inner TPE. This decomposition provides much-needed flexibility to implement different trainer designs, referred to as ``training the trainer.'' In our research, we propose and optimize two alternative trainer designs: 1) a unihead trainer and 2) a multihead trainer. Our proposed RoR framework is evaluated on five tasks in the OpenAI Gym. Compared with three baseline methods, our proposed intelligent trainer methods show competitive autotuning capability, saving up to 56% of the expected sampling cost without knowing the best parameter configurations in advance. The proposed trainer framework can be easily extended to other tasks that require costly hyperparameter tuning.
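The two-layer structure described above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the environment dynamics, the reward signal (policy improvement per real sample), the discrete action set of model-data ratios, and the epsilon-greedy bandit standing in for the outer trainer are all simplified assumptions made for illustration.

```python
import random


class TrainingProcessEnv:
    """Inner layer (TPE, sketched): one MBRL training cycle per MDP step.

    Each step takes a hyperparameter action -- here, the fraction of
    training data drawn from the learned dynamics model rather than the
    real environment -- and returns a reward reflecting policy
    improvement per real sample consumed. The dynamics are a toy
    stand-in, not the paper's benchmarks.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.real_samples_used = 0
        self.policy_score = 0.0

    def step(self, model_data_ratio):
        # Real-environment samples are the costly resource Dyna-style
        # training tries to save.
        real_batch = int(100 * (1.0 - model_data_ratio))
        self.real_samples_used += real_batch
        # Hypothetical improvement curve: synthetic data helps up to a
        # point, after which model error dominates (peak assumed at 0.7).
        gain = (1.0 - abs(model_data_ratio - 0.7)) + self.rng.uniform(-0.05, 0.05)
        self.policy_score += gain
        reward = gain / max(real_batch, 1)  # improvement per real sample
        return self.policy_score, reward


class IntelligentTrainer:
    """Outer layer (sketched): epsilon-greedy bandit over discrete ratios."""

    def __init__(self, actions=(0.3, 0.5, 0.7, 0.9)):
        self.actions = actions
        self.value = {a: 0.0 for a in actions}
        self.count = {a: 0 for a in actions}

    def select(self, epsilon, rng):
        if rng.random() < epsilon:
            return rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        # Incremental mean of observed rewards for this action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


def run(episodes=200, seed=0):
    """Run the outer trainer against the inner TPE for a fixed budget."""
    rng = random.Random(seed)
    tpe = TrainingProcessEnv(seed)
    trainer = IntelligentTrainer()
    for _ in range(episodes):
        action = trainer.select(epsilon=0.2, rng=rng)
        _, reward = tpe.step(action)
        trainer.update(action, reward)
    return trainer


if __name__ == "__main__":
    trainer = run()
    best = max(trainer.actions, key=lambda a: trainer.value[a])
    print("preferred model-data ratio:", best)
```

The design choice mirrored here is the decoupling in the abstract: the inner loop is an ordinary MBRL cycle wrapped as an MDP, while the outer agent only sees hyperparameter actions and a scalar reward, so different trainer designs (unihead, multihead) can be swapped in without touching the inner loop.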

Updated: 2020-08-31