Memory-based Deep Reinforcement Learning for POMDP
arXiv - CS - Robotics Pub Date: 2021-02-24, DOI: arXiv:2102.12344
Lingheng Meng, Rob Gorbet, Dana Kulić

A promising characteristic of Deep Reinforcement Learning (DRL) is its capability to learn an optimal policy in an end-to-end manner without relying on feature engineering. However, most approaches assume a fully observable state space, i.e. a fully observable Markov Decision Process (MDP). In real-world robotics, this assumption is impractical because of sensor issues, such as limited sensor capabilities and sensor noise, and because it is often unknown whether the observation design is complete. These scenarios lead to Partially Observable MDPs (POMDPs) and require special treatment. In this paper, we propose Long Short-Term Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3) by introducing a memory component to TD3, and compare its performance with other DRL algorithms in both MDPs and POMDPs. Our results demonstrate the significant advantages of the memory component in addressing POMDPs, including the ability to handle missing and noisy observation data.
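The sketch below is not the authors' reference implementation; it only illustrates, under assumed network sizes and history format, how a memory component could be added to a TD3-style actor: the policy conditions on a history of past observation-action pairs through an LSTM before producing the current action.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Illustrative TD3-style actor with an LSTM memory branch (dimensions are assumptions)."""
    def __init__(self, obs_dim, act_dim, act_limit, hidden_size=128):
        super().__init__()
        # Memory branch: summarize the history of (observation, action) pairs.
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden_size, batch_first=True)
        # Current-observation branch plus a head that combines both.
        self.obs_fc = nn.Linear(obs_dim, hidden_size)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, act_dim), nn.Tanh(),
        )
        self.act_limit = act_limit

    def forward(self, obs, hist_obs, hist_act):
        # hist_obs: (batch, T, obs_dim); hist_act: (batch, T, act_dim)
        hist = torch.cat([hist_obs, hist_act], dim=-1)
        _, (h_n, _) = self.lstm(hist)          # encode the observation-action history
        mem = h_n[-1]                          # last-layer hidden state: (batch, hidden_size)
        cur = torch.relu(self.obs_fc(obs))     # encode the current observation
        return self.act_limit * self.head(torch.cat([mem, cur], dim=-1))

# Example usage with random data (dimensions are illustrative only):
actor = LSTMActor(obs_dim=8, act_dim=2, act_limit=1.0)
obs = torch.randn(4, 8)
hist_obs, hist_act = torch.randn(4, 5, 8), torch.randn(4, 5, 2)
print(actor(obs, hist_obs, hist_act).shape)  # torch.Size([4, 2])
```

The TD3 critics would be extended analogously so that both actor and critics can infer hidden state from the history, which is what allows the agent to cope with missing or noisy observations in a POMDP.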

Updated: 2021-02-25