Learning to Identify Physical Parameters from Video Using Differentiable Physics
arXiv - CS - Robotics. Pub Date: 2020-09-17, DOI: arxiv-2009.08292
Rama Krishna Kandukuri, Jan Achterhold, Michael Möller, Jörg Stückler

Video representation learning has recently attracted attention in computer vision due to its applications in activity and scene forecasting and in vision-based planning and control. Video prediction models often learn a latent representation of video that is encoded from input frames and decoded back into images. Even when conditioned on actions, purely deep learning based architectures typically lack a physically interpretable latent space. In this study, we use a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation. We propose supervised and self-supervised learning methods to train our network and identify physical properties; the self-supervised variant uses spatial transformers to decode physical states back into images. The simulation scenarios in our experiments comprise pushing, sliding, and colliding objects, for which we also analyze the observability of the physical properties. In experiments we demonstrate that our network can learn to encode images and identify physical properties such as mass and friction from videos and action sequences in the simulated scenarios. We evaluate the accuracy of our supervised and self-supervised methods and compare them with a system identification baseline that learns directly from state trajectories. We also demonstrate the ability of our method to predict future video frames from input images and actions.
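The core idea of identifying physical parameters through a differentiable simulator can be illustrated with a toy example. The sketch below is not the paper's implementation: it simulates a 1-D block pushed with constant force under Coulomb friction, then recovers the friction coefficient by gradient descent on the squared trajectory error, using a finite-difference gradient as a stand-in for the analytic gradients a differentiable physics engine would provide. All names, constants, and the simple Euler integrator are illustrative assumptions.

```python
G = 9.81  # gravitational acceleration (m/s^2)

def simulate(mu, mass=1.0, force=5.0, dt=0.05, steps=40):
    """Euler-integrate a 1-D block pushed with constant force under
    Coulomb friction; returns the list of positions over time."""
    x, v = 0.0, 0.0
    xs = []
    for _ in range(steps):
        a = force / mass - mu * G * (1.0 if v > 0 else 0.0)
        v += a * dt
        x += v * dt
        xs.append(x)
    return xs

def loss(mu, target):
    """Squared error between the simulated and observed trajectories."""
    return sum((p - t) ** 2 for p, t in zip(simulate(mu), target))

# "Observed" trajectory, generated with a friction coefficient that the
# optimizer does not know.
true_mu = 0.3
observed = simulate(true_mu)

# Gradient descent on mu; the finite-difference gradient stands in for
# backpropagation through a differentiable physics engine.
mu, lr, eps = 0.05, 1e-4, 1e-6
for _ in range(500):
    grad = (loss(mu + eps, observed) - loss(mu - eps, observed)) / (2 * eps)
    mu -= lr * grad

print(f"recovered mu = {mu:.3f}")  # → recovered mu = 0.300
```

Because the simulated positions depend smoothly on the friction coefficient in this regime, the loss is well-behaved and gradient descent recovers the true parameter; the paper's setting is analogous but operates on latent physical states encoded from images, with the physics engine providing exact gradients.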

Updated: 2020-09-18