Self-Supervised Disentangled Representation Learning for Third-Person Imitation Learning
arXiv - CS - Robotics | Pub Date: 2021-08-02 | DOI: arxiv-2108.01069
Jinghuan Shang, Michael S. Ryoo

Humans learn to imitate by observing others. However, robot imitation learning generally requires expert demonstrations in the first-person view (FPV), and collecting such FPV videos for every robot can be very expensive. Third-person imitation learning (TPIL) is the concept of learning action policies by observing other agents from a third-person view (TPV), similar to what humans do. This ultimately allows human and robot demonstration videos in TPV from many different data sources to be used for policy learning. In this paper, we present a TPIL approach for robot tasks with egomotion. Although many tasks for robots with ground/aerial mobility involve actions with camera egomotion, studies of TPIL for such tasks have been limited. In this setting, FPV and TPV observations are visually very different: FPV exhibits camera egomotion, while the agent's appearance is observable only in TPV. To enable better state learning for TPIL, we propose a disentangled representation learning method: a dual auto-encoder structure, together with a representation permutation loss and a time-contrastive loss, ensures that the state and viewpoint representations are well disentangled. Our experiments demonstrate the effectiveness of the approach.
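Based only on the abstract's description, the following is a minimal PyTorch sketch of a dual auto-encoder with a permutation-style reconstruction loss and a time-contrastive (InfoNCE-style) loss. The architecture, the exact loss forms, all names (DualAutoEncoder, permutation_recon_loss, time_contrastive_loss), the 32x32 input size, and the reading of "permutation" as swapping state codes across paired same-time FPV/TPV frames are assumptions for illustration, not the paper's actual implementation.

```python
# A minimal sketch under stated assumptions; not the authors' code.
# Assumed setup: batches of paired FPV/TPV frames showing the same
# time steps, 32x32 RGB inputs, and hypothetical latent sizes
# state_dim / view_dim for the two disentangled factors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAutoEncoder(nn.Module):
    """Two encoders factor each frame into a state code and a viewpoint
    code; a shared decoder reconstructs a frame from the pair."""
    def __init__(self, state_dim=64, view_dim=16):
        super().__init__()
        def encoder(out_dim):
            return nn.Sequential(
                nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, out_dim),
            )
        self.state_enc = encoder(state_dim)  # task-relevant state factor
        self.view_enc = encoder(view_dim)    # viewpoint/appearance factor
        self.decoder = nn.Sequential(        # (state, view) -> 3x32x32 frame
            nn.Linear(state_dim + view_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1),
        )

    def forward(self, x):
        return self.state_enc(x), self.view_enc(x)

def permutation_recon_loss(model, fpv, tpv):
    """One plausible reading of the permutation loss: swap the state
    codes of FPV/TPV frames taken at the same time step. Both frames
    depict the same underlying state, so either state code combined
    with a frame's own viewpoint code should reconstruct that frame;
    this only succeeds if state and viewpoint are truly disentangled."""
    s_f, v_f = model(fpv)
    s_t, v_t = model(tpv)
    rec_f = model.decoder(torch.cat([s_t, v_f], dim=1))  # TPV state + FPV view
    rec_t = model.decoder(torch.cat([s_f, v_t], dim=1))  # FPV state + TPV view
    return F.mse_loss(rec_f, fpv) + F.mse_loss(rec_t, tpv)

def time_contrastive_loss(s_f, s_t, temperature=0.1):
    """InfoNCE over state codes: for FPV at step i, the positive is the
    TPV state code at the same step; other steps in the batch act as
    negatives, pulling same-time cross-view states together."""
    s_f = F.normalize(s_f, dim=1)
    s_t = F.normalize(s_t, dim=1)
    logits = s_f @ s_t.T / temperature              # (B, B) similarities
    targets = torch.arange(s_f.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

In this reading, the reconstruction loss keeps the viewpoint factor informative while the contrastive loss forces the state factor to be view-invariant and time-discriminative, which together push the two codes apart.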

Updated: 2021-08-03