GRAC: Self-Guided and Self-Regularized Actor-Critic



arXiv - CS - Systems and Control. Pub Date: 2020-09-18. DOI: arxiv-2009.08973
Lin Shao, Yifan You, Mengyuan Yan, Qingyun Sun, Jeannette Bohg

Deep reinforcement learning (DRL) algorithms have been demonstrated successfully on a range of challenging decision-making and control tasks. One dominant component of recent DRL algorithms is the target network, which mitigates divergence when learning the Q function. However, target networks can slow down learning because of delayed function updates. Our main contribution in this work is a self-regularized TD-learning method that addresses divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method that combines policy gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor-critic algorithm. We evaluate GRAC on the suite of OpenAI Gym tasks, achieving or outperforming the state of the art in every environment tested.
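To make the idea of a zero-order search for higher-Q actions concrete, here is a minimal sketch. The abstract does not specify which zero-order optimizer is used, so this assumes a generic cross-entropy-style search: sample actions around the policy's proposal, keep the highest-scoring ones, and refit the sampling distribution. The names `zero_order_search` and `toy_q`, and all hyperparameters, are hypothetical and chosen only for illustration; `toy_q` stands in for a learned critic.

```python
import random
import statistics


def zero_order_search(q_fn, state, init_action, sigma=0.5, population=64,
                      elite_frac=0.25, iterations=5, low=-1.0, high=1.0, seed=0):
    """Gradient-free search for an action with a higher Q-value (illustrative only)."""
    rng = random.Random(seed)
    dim = len(init_action)
    mean = list(init_action)      # start the search at the policy's action
    std = [sigma] * dim           # a wide initial spread gives a broad neighborhood
    best_action = list(init_action)
    best_q = q_fn(state, best_action)
    n_elite = max(2, int(population * elite_frac))

    for _ in range(iterations):
        # Sample candidate actions and clip them to the action bounds.
        candidates = []
        for _ in range(population):
            a = [min(high, max(low, rng.gauss(mean[i], std[i]))) for i in range(dim)]
            candidates.append((q_fn(state, a), a))

        # Rank by Q-value; no gradient of Q is needed.
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        if candidates[0][0] > best_q:
            best_q, best_action = candidates[0]

        # Refit the sampling distribution to the top-scoring candidates.
        elites = [a for _, a in candidates[:n_elite]]
        mean = [statistics.fmean(e[i] for e in elites) for i in range(dim)]
        std = [statistics.pstdev([e[i] for e in elites]) + 1e-6 for i in range(dim)]

    return best_action, best_q


def toy_q(state, action):
    # Hypothetical critic: Q peaks where the action equals the state vector.
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.3, -0.6]
policy_action = [0.0, 0.0]        # pretend this came from the actor network
improved_action, improved_q = zero_order_search(toy_q, state, policy_action)
print(improved_action, improved_q)
```

In an actor-critic loop, an action found this way could serve as a guidance signal for the actor whenever its Q-value exceeds that of the actor's own proposal. How GRAC actually combines this with the policy gradient is described in the paper, not in this sketch.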


Updated: 2020-11-12
