Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-10-17, DOI: 10.1007/s11263-021-01493-5
Rama Krishna Kandukuri 1, Jan Achterhold 1, Joerg Stueckler 1, Michael Moeller 2

Representation learning for video is gaining increasing attention in computer vision. For instance, video prediction models enable activity and scene forecasting as well as vision-based planning and control. In this article, we investigate the combination of differentiable physics and spatial transformers in a deep action-conditional video representation network. With this combination, our model learns a physically interpretable latent representation and can identify physical parameters. We propose supervised and self-supervised learning methods for our architecture. In experiments, we consider simulated scenarios with pushing, sliding, and colliding objects, for which we also analyze the observability of the physical properties. We demonstrate that our network can learn to encode images and identify physical properties such as mass and friction from videos and action sequences. We evaluate the accuracy of our training methods and demonstrate the ability of our method to predict future video frames from input images and actions.
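To make the idea of parameter identification through differentiable physics more concrete, the following is a minimal toy sketch and not the architecture described in the abstract. It assumes a single block of known mass pushed along one axis by known forces, with observed positions standing in for what an image encoder would provide. The functions `rollout` and `identify_friction`, the time step, and the learning rate are all hypothetical choices for illustration. The simulator propagates the derivative of position with respect to the friction coefficient alongside the state, so gradient descent on the position error can recover the coefficient.

```python
G = 9.81  # gravitational acceleration in m/s^2


def rollout(mu, forces, mass=1.0, dt=0.01):
    """Simulate a block pushed along +x with Coulomb friction coefficient mu.

    Returns the positions and their derivatives with respect to mu.
    Assumption: the block keeps moving in +x, so friction always acts in -x.
    """
    x, v = 0.0, 0.0
    dx, dv = 0.0, 0.0  # sensitivities d(x)/d(mu) and d(v)/d(mu)
    xs, dxs = [], []
    for f in forces:
        a = f / mass - mu * G  # acceleration from push minus friction
        da = -G                # d(a)/d(mu)
        v, dv = v + dt * a, dv + dt * da
        x, dx = x + dt * v, dx + dt * dv
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def identify_friction(observed, forces, mu0=0.0, lr=0.05, steps=200):
    """Fit mu by gradient descent on the mean squared position error."""
    mu = mu0
    for _ in range(steps):
        xs, dxs = rollout(mu, forces)
        # Chain rule: d(loss)/d(mu) = mean(2 * (x - observed) * dx/dmu)
        grad = sum(2.0 * (x - o) * d for x, o, d in zip(xs, observed, dxs)) / len(xs)
        mu -= lr * grad
    return mu


# Synthetic "observations" generated with a hypothetical ground-truth value.
forces = [5.0] * 100
true_mu = 0.3
observed, _ = rollout(true_mu, forces)
estimated_mu = identify_friction(observed, forces)
print(estimated_mu)
```

In this toy setting the mass is fixed because a constant push only constrains the combination `f / mass - mu * G`, which is a small instance of the observability question the abstract mentions.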




Updated: 2021-10-18