Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4) Pub Date: 2020-01-13, DOI: 10.1109/tnnls.2019.2959129
Dongming Wu, Xingping Dong, Jianbing Shen, Steven C. H. Hoi

The overestimation caused by function approximation is a well-known property of Q-learning algorithms, especially in single-critic models, and it leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics, has received far less attention. In this article, we investigate the underestimation phenomenon in the recent twin delayed deep deterministic (TD3) actor-critic algorithm and theoretically demonstrate its existence. We also observe that this underestimation bias does indeed hurt performance in various experiments. Considering the opposite bias properties of single-critic and double-critic methods, we propose a novel triplet-average deep deterministic policy gradient algorithm that takes a weighted combination of the action values of three target critics to reduce the estimation bias. Given the connection between estimation bias and approximation error, we further suggest averaging previous target values to reduce the per-update error and improve performance. Extensive empirical results on various continuous control tasks in OpenAI Gym show that our approach outperforms state-of-the-art methods.
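The abstract describes two mechanisms: blending the action values of three target critics so that opposing biases cancel, and averaging recent target values to damp per-update error. Below is a minimal PyTorch-style sketch of how such a triplet-blended TD target could be formed. The Critic architecture, the coefficient beta, and the specific blending rule (pairwise minimum of two critics mixed with a third) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Hypothetical Q-network; the architecture is an illustrative assumption."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def triplet_average_target(critics_target, next_state, next_action,
                           reward, not_done, gamma=0.99, beta=0.7):
    """Blend three target critics into one TD target.

    Assumed rule: weight the pairwise minimum of two critics (biased low,
    as in double-critic methods) against a third critic (biased high, as
    in single-critic models) so the two biases partially cancel.
    """
    q1, q2, q3 = (c(next_state, next_action) for c in critics_target)
    q_blend = beta * torch.min(q1, q2) + (1.0 - beta) * q3
    return reward + not_done * gamma * q_blend


# Usage sketch with random tensors standing in for a replay-buffer batch.
if __name__ == "__main__":
    state_dim, action_dim, batch = 3, 1, 4
    critics = [Critic(state_dim, action_dim) for _ in range(3)]
    ns, na = torch.randn(batch, state_dim), torch.randn(batch, action_dim)
    r, nd = torch.randn(batch, 1), torch.ones(batch, 1)
    y = triplet_average_target(critics, ns, na, r, nd)
    print(y.shape)  # torch.Size([4, 1])
```

The second mechanism, averaging previous target values, could be approximated in the same spirit by keeping snapshots of recent target critics and averaging their outputs before forming the blend, in the style of Averaged-DQN; the paper's exact averaging scheme may differ.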
