MDPs with Unawareness in Robotics
arXiv - CS - Robotics Pub Date : 2020-05-20 , DOI: arxiv-2005.10381
Nan Rong, Joseph Y. Halpern, Ashutosh Saxena

We formalize decision-making problems in robotics and automated control using continuous MDPs and actions that take place over continuous time intervals. We then approximate the continuous MDP using finer and finer discretizations. Doing this results in a family of systems, each of which has an extremely large action space, although only a few actions are "interesting". We can view the decision maker as being unaware of which actions are "interesting". We can model this using MDPUs, MDPs with unawareness, where the action space is much smaller. As we show, MDPUs can be used as a general framework for learning tasks in robotic problems. We prove results on the difficulty of learning a near-optimal policy in an MDPU for a continuous task. We apply these ideas to the problem of having a humanoid robot learn on its own how to walk.
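
To make the point about action-space growth concrete, here is a minimal sketch. It is not taken from the paper: it assumes, purely for illustration, that a primitive action assigns one of a fixed number of discrete command levels to each joint, so refining the grid multiplies the number of primitive actions. The function name and the joint count of 10 are hypothetical.

```python
def num_primitive_actions(levels_per_joint: int, num_joints: int) -> int:
    """Count joint-command combinations on a uniform grid.

    Illustrative assumption (not from the paper): each joint independently
    takes one of `levels_per_joint` discrete command values.
    """
    return levels_per_joint ** num_joints


if __name__ == "__main__":
    # Hypothetical robot with 10 joints; refine the grid step by step.
    for levels in (2, 4, 8, 16):
        count = num_primitive_actions(levels, num_joints=10)
        print(f"{levels} levels per joint: {count} primitive actions")
```

Under these assumptions, doubling the resolution per joint multiplies the count by 2^10, which is the kind of blow-up that motivates working with a much smaller set of actions the decision maker is aware of.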

Updated: 2020-05-22