State Representation Learning With Adjacent State Consistency Loss for Deep Reinforcement Learning
IEEE Multimedia (IF 3.2) · Pub Date: 2021-01-26 · DOI: 10.1109/mmul.2021.3053774
Tianyu Zhao, Jian Zhao, Wengang Zhou, Yun Zhou, Houqiang Li

Through well-designed optimization paradigms and deep neural networks as feature extractors, deep reinforcement learning (DRL) algorithms learn optimal policies on discrete and continuous action spaces. However, this capability is restricted by low sampling efficiency. By inspecting the importance of feature extraction in DRL, we find that state feature learning is one of the key obstacles to efficient sampling. To this end, we propose a new state representation learning scheme with an adjacent state consistency loss (ASC loss). The loss is based on the hypothesis that the distance between adjacent states is smaller than that between far-apart ones, since scenes in videos generally evolve smoothly. We exploit the ASC loss as an auxiliary to the RL loss in the training phase to boost state feature learning, and evaluate it on existing DRL algorithms as well as a behavioral cloning algorithm. Experiments on Atari games and MuJoCo continuous control tasks demonstrate the effectiveness of our scheme.
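The abstract only states the hypothesis behind the ASC loss (adjacent states should lie closer in feature space than far-apart ones) and that it is added to the RL loss as an auxiliary term; the exact formulation is not given here. The following is a minimal sketch, assuming a margin-based (triplet-style) form and hypothetical names such as asc_loss and asc_weight:

```python
import torch
import torch.nn.functional as F

def asc_loss(z_t, z_next, z_far, margin=1.0):
    """Hypothetical sketch of an adjacent state consistency (ASC) loss.

    z_t, z_next : embeddings of temporally adjacent states, shape (B, D)
    z_far       : embeddings of far-apart states (e.g. shuffled within
                  the batch), shape (B, D)

    Assumes a margin-based triplet form; this is an illustration of the
    stated hypothesis, not necessarily the paper's exact loss.
    """
    d_adj = torch.norm(z_t - z_next, dim=-1)  # distance to the adjacent state
    d_far = torch.norm(z_t - z_far, dim=-1)   # distance to a far-apart state
    # Penalize cases where the adjacent state is not closer by at least `margin`.
    return F.relu(d_adj - d_far + margin).mean()

# Used as an auxiliary term alongside the RL loss during training, e.g.:
# total_loss = rl_loss + asc_weight * asc_loss(z_t, z_next, z_far)
```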

Updated: 2021-01-26