GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
Guy Tennenholtz, Nir Baram, Shie Mannor
arXiv - CS - Artificial Intelligence, Pub Date: 2021-02-22, DOI: arxiv-2102.11327
Offline reinforcement learning approaches can generally be divided into
proximal and uncertainty-aware methods. In this work, we demonstrate the
benefit of combining the two in a latent variational model. We impose a latent
representation of states and actions and leverage its intrinsic Riemannian
geometry to measure the distance of latent samples to the data. Our proposed
metrics measure both the quality of out-of-distribution samples and the
discrepancy of examples in the data. We integrate our metrics into a model-based
offline optimization framework, in which proximity and uncertainty can be
carefully controlled. We illustrate the geodesics on a simple grid-like
environment, depicting its natural inherent topology. Finally, we analyze our
approach and improve upon contemporary offline RL benchmarks.
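The abstract does not spell out the metric. A common way to give a decoder's latent space a Riemannian geometry is the pullback metric G(z) = J(z)^T J(z), where J is the decoder Jacobian, so that a latent curve is as long as its image in observation space. The sketch below assumes that metric, uses a hypothetical random two-layer tanh decoder in place of a trained variational model, and approximates geodesic distances with Dijkstra's algorithm on a k-nearest-neighbour graph of latent points. GELATO's actual metric and geodesic solver may differ (stochastic decoders, for instance, are often given a metric that also accounts for decoder variance); every name here (`decode`, `metric`, `segment_length`, `distance_to_data`, `k`, `steps`) is illustrative.

```python
import heapq

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decoder f: R^2 -> R^5. Random weights stand in for a trained VAE decoder.
W1 = rng.normal(size=(16, 2))
W2 = rng.normal(size=(5, 16))


def decode(z):
    """Map a latent point z of shape (2,) to observation space, shape (5,)."""
    return W2 @ np.tanh(W1 @ z)


def jacobian(z, eps=1e-5):
    """Central finite-difference Jacobian of decode at z, shape (5, 2)."""
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((decode(z + dz) - decode(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)


def metric(z):
    """Pullback metric G(z) = J(z)^T J(z) induced on the latent space by the decoder."""
    J = jacobian(z)
    return J.T @ J


def segment_length(a, b, steps=10):
    """Riemannian length of the straight latent segment a -> b (midpoint rule)."""
    pts = [a + t * (b - a) for t in np.linspace(0.0, 1.0, steps + 1)]
    total = 0.0
    for p, q in zip(pts[:-1], pts[1:]):
        d = q - p
        # Clamp at zero to guard against tiny negative values from rounding.
        total += np.sqrt(max(d @ metric(0.5 * (p + q)) @ d, 0.0))
    return total


def geodesic_distances(points, source, k=5):
    """Approximate geodesic distances from points[source] via Dijkstra on a kNN graph."""
    n = len(points)
    # Neighbours are chosen by Euclidean latent distance; edges are weighted by G-length.
    eucl = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(eucl, np.inf)  # never pick a point as its own neighbour
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in np.argsort(eucl[i], kind="stable")[:k]:
            w = segment_length(points[i], points[j])
            nbrs[i].append((j, w))
            nbrs[j].append((i, w))  # keep the graph undirected
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j, w in nbrs[i]:
            if d + w < dist[j]:
                dist[j] = d + w
                heapq.heappush(heap, (d + w, j))
    return dist


def distance_to_data(query, data_latents, k=5):
    """Approximate geodesic distance from a query latent to its closest dataset latent."""
    points = np.vstack([query[None, :], data_latents])
    dist = geodesic_distances(points, source=0, k=k)
    return dist[1:].min()


data = rng.normal(size=(50, 2))  # hypothetical latent codes of the offline dataset
print(distance_to_data(data[0], data))  # 0.0: the query coincides with a dataset latent
print(distance_to_data(np.array([3.0, -3.0]), data))
```

Routing through a neighbourhood graph, rather than measuring a single straight latent segment, lets the distance follow the data manifold; the choice of `k` trades graph connectivity against shortcuts across regions the data does not cover.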
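The abstract states that proximity and uncertainty can be carefully controlled inside a model-based offline optimization framework, but it does not give the objective. A common pattern in model-based offline RL (MOPO-style reward penalties) is to subtract penalty terms from the model-predicted reward and to cut synthetic rollouts short once they drift too far from the data. The sketch below assumes exactly that pattern; `penalized_reward`, `truncate_rollout`, and the coefficients `lam_u`, `lam_p`, and `max_dist` are hypothetical and not taken from GELATO.

```python
import numpy as np


def penalized_reward(reward, uncertainty, proximity, lam_u=1.0, lam_p=1.0):
    """Model reward minus an uncertainty penalty and a distance-to-data penalty.

    reward:      model-predicted rewards, shape (N,)
    uncertainty: hypothetical per-sample uncertainty score (e.g. ensemble disagreement)
    proximity:   hypothetical per-sample latent distance to the offline dataset
    lam_u/lam_p: hypothetical weights trading off the two penalties
    """
    reward = np.asarray(reward, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    proximity = np.asarray(proximity, dtype=float)
    return reward - lam_u * uncertainty - lam_p * proximity


def truncate_rollout(distances, max_dist):
    """Number of model steps kept before the first sample farther than max_dist from the data."""
    over = np.nonzero(np.asarray(distances, dtype=float) > max_dist)[0]
    return int(over[0]) if over.size else len(distances)


# Two synthetic transitions: one slightly off-data, one uncertain.
print(penalized_reward([1.0, 1.0], [0.0, 0.5], [0.2, 0.0], lam_u=2.0, lam_p=1.0))  # -> [0.8, 0.0]
print(truncate_rollout([0.1, 0.3, 0.9, 0.2], max_dist=0.5))  # -> 2
```

Raising `lam_p` or lowering `max_dist` keeps the policy closer to the data (the proximal end), while `lam_u` controls how strongly model uncertainty is penalized; keeping them separate is what allows the two effects to be tuned independently.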
Updated: 2021-02-24