Greedy-Step Off-Policy Reinforcement Learning

Wang, Yuhui; Wu, Qingyuan; He, Pengcheng; Tan, Xiaoyang

arXiv:2102.11717 (cs)

[Submitted on 23 Feb 2021 (v1), last revised 15 Dec 2021 (this version, v4)]

Abstract: Most policy evaluation algorithms are based on the theories of the Bellman Expectation Equation and the Bellman Optimality Equation, from which two popular approaches are derived: Policy Iteration (PI) and Value Iteration (VI). However, in PI-based methods multi-step bootstrapping is often at cross-purposes with off-policy learning, because multi-step off-policy correction has large variance. In contrast, VI-based methods are naturally off-policy but are limited to one-step learning. In this paper, we deduce a novel multi-step Bellman Optimality Equation by exploiting a latent structure of multi-step bootstrapping with the optimal value function. From this new equation, we derive a new multi-step value iteration method that converges to the optimal value function with an exponential contraction rate of $\mathcal{O}(\gamma^n)$ but only linear computational complexity. Moreover, it naturally yields a suite of multi-step off-policy algorithms that can safely use data collected by arbitrary policies without correction. Experiments show that the proposed methods are reliable, easy to implement, and achieve state-of-the-art performance on a series of standard benchmark datasets.
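The abstract's starting point, that multi-step bootstrapping with the optimal value function has a structure usable without off-policy correction, can be checked numerically on a small tabular MDP. The sketch below is not the paper's algorithm. It assumes that the relevant structure is the following fact: an n-step return collected under an arbitrary behavior policy mu and bootstrapped with max Q* never exceeds Q*, and equality holds at n = 1. Every name here (`random_mdp`, `random_policy`, `lookahead`, `greedy_step_backup`, `solve`) is hypothetical. The elementwise max over lookahead depths 1..N is an illustrative operator, not a reproduction of the paper's equation. The code computes exact expectations, so no sampling is involved.

```python
import random

def random_mdp(n_states, n_actions, seed=0):
    """Random tabular MDP: P[s][a] = next-state distribution, R[s][a] = expected reward."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.uniform(-1.0, 1.0))
        P.append(P_s)
        R.append(R_s)
    return P, R

def random_policy(n_states, n_actions, seed=1):
    """Arbitrary stochastic behavior policy: mu[s][b] = probability of action b in state s."""
    rng = random.Random(seed)
    mu = []
    for _ in range(n_states):
        w = [rng.random() + 1e-3 for _ in range(n_actions)]
        total = sum(w)
        mu.append([x / total for x in w])
    return mu

def expect(p_row, values):
    """Expectation of `values` under the distribution `p_row`."""
    return sum(p * v for p, v in zip(p_row, values))

def lookahead(P, R, mu, Q, gamma, n):
    """G_n[Q](s, a): take a, follow mu for n - 1 more steps, then bootstrap with gamma^n * max_a' Q."""
    S, A = len(P), len(P[0])
    W = [max(row) for row in Q]  # value used at depth n: max_a' Q(s_n, a')
    for _ in range(n - 1):       # back up one behavior-policy step at a time
        W = [sum(mu[s][b] * (R[s][b] + gamma * expect(P[s][b], W)) for b in range(A))
             for s in range(S)]
    return [[R[s][a] + gamma * expect(P[s][a], W) for a in range(A)] for s in range(S)]

def greedy_step_backup(P, R, mu, Q, gamma, N):
    """Hypothetical multi-step backup: elementwise max over lookahead depths 1..N."""
    Gs = [lookahead(P, R, mu, Q, gamma, n) for n in range(1, N + 1)]
    return [[max(G[s][a] for G in Gs) for a in range(len(Q[0]))] for s in range(len(Q))]

def max_abs_diff(Q1, Q2):
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))

def solve(P, R, mu, gamma, N, iters):
    """Iterate the backup from Q = 0; N = 1 is ordinary one-step value iteration."""
    Q = [[0.0] * len(P[0]) for _ in P]
    for _ in range(iters):
        Q = greedy_step_backup(P, R, mu, Q, gamma, N)
    return Q

if __name__ == "__main__":
    gamma, S, A, N = 0.9, 5, 3, 4
    P, R = random_mdp(S, A)
    mu = random_policy(S, A)
    Q_star = solve(P, R, mu, gamma, N=1, iters=2000)  # reference Q* from one-step VI
    for n in range(1, N + 1):
        G = lookahead(P, R, mu, Q_star, gamma, n)
        # Largest amount by which an n-step return under mu exceeds Q* (should be <= ~0)
        print(n, max(G[s][a] - Q_star[s][a] for s in range(S) for a in range(A)))
    Q_gs = solve(P, R, mu, gamma, N, iters=300)
    print("max |Q_greedy_step - Q*| =", max_abs_diff(Q_gs, Q_star))
```

Each depth-n term depends on Q only through gamma^n times max Q at depth n. As a result, the max over depths is a gamma-contraction whose fixed point is Q*, and iterating it from Q = 0 reaches Q* without any importance-sampling correction for mu. This sketch does not demonstrate the $\mathcal{O}(\gamma^n)$ rate or the sample-based algorithms described in the abstract.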

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as:	arXiv:2102.11717 [cs.LG]
	(or arXiv:2102.11717v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.11717

Submission history

From: Yuhui Wang [view email]
[v1] Tue, 23 Feb 2021 14:32:20 UTC (12,920 KB)
[v2] Sun, 7 Mar 2021 15:06:27 UTC (12,919 KB)
[v3] Tue, 8 Jun 2021 09:45:00 UTC (2,182 KB)
[v4] Wed, 15 Dec 2021 16:17:52 UTC (11,604 KB)