Batteries, camera, action! Learning a semantic control space for expressive robot cinematography
arXiv - CS - Robotics. Pub Date: 2020-11-19, DOI: arxiv-2011.10118
Rogerio Bonatti, Arthur Bucker, Sebastian Scherer, Mustafa Mukadam, Jessica Hodgins

Aerial vehicles are revolutionizing the way film-makers can capture shots of actors by composing novel aerial and dynamic viewpoints. However, despite great advancements in autonomous flight technology, generating expressive camera behaviors is still a challenge and requires non-technical users to edit a large number of unintuitive control parameters. In this work we develop a data-driven framework that enables editing of these complex camera positioning parameters in a semantic space (e.g. calm, enjoyable, establishing). First, we generate a database of video clips with a diverse range of shots in a photo-realistic simulator, and use hundreds of participants in a crowd-sourcing framework to obtain scores for a set of semantic descriptors for each clip. Next, we analyze correlations between descriptors and build a semantic control space based on cinematography guidelines and human perception studies. Finally, we learn a generative model that can map a set of desired semantic video descriptors into low-level camera trajectory parameters. We evaluate our system by demonstrating that our model successfully generates shots that are rated by participants as having the expected degrees of expression for each descriptor. We also show that our models generalize to different scenes in both simulation and real-world experiments. Supplementary video: https://youtu.be/6WX2yEUE9_k
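To picture the final step of the pipeline, mapping a set of desired semantic descriptors to low-level camera trajectory parameters, here is a minimal toy sketch of a conditional generator. The descriptor names, parameter dimensions, and network shape are illustrative assumptions, not the paper's actual architecture or data.

```python
# Hypothetical sketch: semantic descriptor scores (e.g. "calm", "enjoyable",
# "establishing") -> low-level camera trajectory parameters.
# All names and dimensions below are assumptions for illustration only.
import torch
import torch.nn as nn

SEMANTIC_DESCRIPTORS = ["calm", "enjoyable", "establishing", "interesting"]
N_TRAJECTORY_PARAMS = 8  # e.g. shot distance, height, speed, angle, ... (assumed)

class SemanticToTrajectory(nn.Module):
    """Toy conditional generator: semantic scores + noise -> trajectory params."""
    def __init__(self, n_sem=len(SEMANTIC_DESCRIPTORS), n_noise=4,
                 n_out=N_TRAJECTORY_PARAMS):
        super().__init__()
        self.n_noise = n_noise
        self.net = nn.Sequential(
            nn.Linear(n_sem + n_noise, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_out),
        )

    def forward(self, semantic_scores):
        # Noise allows several distinct shots for the same semantic request.
        noise = torch.randn(semantic_scores.shape[0], self.n_noise)
        return self.net(torch.cat([semantic_scores, noise], dim=-1))

# Usage: request a calm, establishing-style shot (scores in [0, 1], assumed scale).
model = SemanticToTrajectory()
scores = torch.tensor([[0.9, 0.4, 0.8, 0.5]])  # calm, enjoyable, establishing, interesting
camera_params = model(scores)                  # low-level trajectory parameters
print(camera_params.shape)                     # torch.Size([1, 8])
```

In the paper, such a model would be trained on the crowd-sourced descriptor scores paired with the trajectory parameters used to render each simulated clip; the sketch above only illustrates the input/output interface.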

Updated: 2020-11-23