GRAC: Self-Guided and Self-Regularized Actor-Critic
arXiv - CS - Systems and Control. Pub Date: 2020-09-18, DOI: arxiv-2009.08973
Lin Shao, Yifan You, Mengyuan Yan, Qingyun Sun, Jeannette Bohg

Deep reinforcement learning (DRL) algorithms have been successfully demonstrated on a range of challenging decision-making and control tasks. One dominant component of recent DRL algorithms is the target network, which mitigates divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method that addresses divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method that combines policy gradients with zero-order optimization to search a broad neighborhood for actions associated with higher Q-values. This makes learning more robust to local noise in the Q-function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor-critic algorithm. We evaluate GRAC on the suite of OpenAI Gym tasks, matching or outperforming the state of the art in every environment tested.
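The abstract names two mechanisms: a self-regularized TD loss that replaces the target network, and a zero-order search around the actor's output for higher-Q actions. The sketch below (PyTorch) is a minimal illustration, not the authors' reference implementation: the network sizes, helper names (q, critic_loss, search_better_action), hyperparameters, and the specific form of the regularizer are assumptions based on one plausible reading of the abstract.

    import torch
    import torch.nn as nn

    state_dim, action_dim = 8, 2   # toy sizes, chosen for illustration

    critic = nn.Sequential(
        nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    actor = nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())

    def q(s, a):
        # Q(s, a) from the critic; s and a share a leading batch dimension.
        return critic(torch.cat([s, a], dim=-1)).squeeze(-1)

    def critic_loss(s, a, r, s_next, gamma=0.99, reg_weight=1.0):
        # Self-regularized TD loss (one reading of the abstract): instead of
        # freezing a target network, penalize how far Q(s', a') drifts from
        # its value snapshotted just before this update.
        with torch.no_grad():
            a_next = actor(s_next)
            q_next_old = q(s_next, a_next)        # snapshot, no gradient
            td_target = r + gamma * q_next_old
        td_loss = (q(s, a) - td_target).pow(2).mean()
        reg_loss = (q(s_next, a_next) - q_next_old).pow(2).mean()
        return td_loss + reg_weight * reg_loss

    def search_better_action(s, n_samples=64, n_iters=3, sigma=0.2, n_elite=8):
        # Zero-order (CEM-style) search in a neighborhood of the actor's
        # proposal for an action with a higher Q-value; the result can serve
        # as a regression target that guides the actor update.
        with torch.no_grad():
            mu = actor(s)                         # s has shape (state_dim,)
            sigma = torch.full_like(mu, sigma)
            s_batch = s.unsqueeze(0).expand(n_samples, -1)
            for _ in range(n_iters):
                cand = (mu + sigma * torch.randn(n_samples, action_dim)).clamp(-1, 1)
                elite = cand[q(s_batch, cand).topk(n_elite).indices]
                mu, sigma = elite.mean(0), elite.std(0) + 1e-6
        return mu

Under these assumptions, a training step would call critic_loss on a sampled batch and backpropagate through the critic only, then move the actor toward search_better_action(s) combined with the usual policy gradient; because the search scores many sampled actions, it is less sensitive to local noise in the Q-function approximation than following the gradient of Q alone.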

Updated: 2020-11-12