Coherent video generation for multiple hand-held cameras with dynamic foreground

Zhang, Fang-Lue; Barnes, Connelly; Zhang, Hao-Tian; Zhao, Junhong; Salas, Gabriel

doi:10.1007/s41095-020-0187-3

Research Article
Open access
Published: 03 September 2020

Volume 6, pages 291–306, (2020)

Computational Visual Media

Fang-Lue Zhang¹, Connelly Barnes², Hao-Tian Zhang³, Junhong Zhao¹ & Gabriel Salas¹

Abstract

For many social events such as public performances, multiple hand-held cameras may capture the same event. This footage is often collected by amateur cinematographers who typically have little control over the scene and may not pay close attention to the camera. For these reasons, each individually captured video may fail to cover the entire duration of the event, or may lose track of interesting foreground content such as a performer. We introduce a new algorithm that can synthesize a single smooth video sequence of moving foreground objects captured by multiple hand-held cameras. This allows later viewers to gain a cohesive narrative experience that can transition between different cameras, even though the input footage may be less than ideal. We first introduce a graph-based method for selecting a good transition route. This allows us to automatically select good cut points for the hand-held videos, so that smooth transitions can be created between the resulting video shots. We also propose a method to synthesize a smooth photorealistic transition video between each pair of hand-held cameras, which preserves dynamic foreground content during this transition. Our experiments demonstrate that our method outperforms previous state-of-the-art methods, which struggle to preserve dynamic foreground content.
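
The abstract does not describe how the transition graph is built, so the following is only an illustrative sketch of what graph-based route selection over multiple cameras could look like, not the authors' method. It treats each (camera, frame) pair as a node and finds the cheapest path through time by dynamic programming. The function name `select_route`, the per-frame cost table `frame_cost`, and the constant `switch_cost` are all hypothetical; in practice such costs might reflect foreground visibility or how well a transition can be synthesized.

```python
from typing import List, Tuple


def select_route(frame_cost: List[List[float]],
                 switch_cost: float) -> Tuple[float, List[int]]:
    """Pick one camera per frame, minimizing total cost.

    frame_cost[c][t] is a hypothetical cost of showing camera c at frame t
    (lower is better). switch_cost is a hypothetical fixed penalty paid
    whenever the route cuts from one camera to another.
    """
    n_cams = len(frame_cost)
    n_frames = len(frame_cost[0])

    # best[c] = cost of the cheapest route that ends on camera c at the current frame
    best = [frame_cost[c][0] for c in range(n_cams)]
    # back[t - 1][c] = camera used at frame t - 1 on the cheapest route ending at (c, t)
    back: List[List[int]] = []

    for t in range(1, n_frames):
        new_best, choices = [], []
        for c in range(n_cams):
            # Cheapest predecessor camera, counting the cut penalty if it differs from c
            prev = min(range(n_cams),
                       key=lambda p: best[p] + (0.0 if p == c else switch_cost))
            penalty = 0.0 if prev == c else switch_cost
            new_best.append(best[prev] + penalty + frame_cost[c][t])
            choices.append(prev)
        best = new_best
        back.append(choices)

    # Trace the cheapest route backwards from the best final camera
    cam = min(range(n_cams), key=lambda c: best[c])
    total = best[cam]
    route = [cam]
    for choices in reversed(back):
        cam = choices[cam]
        route.append(cam)
    route.reverse()
    return total, route


# Toy example: camera 0 is good early, camera 1 is good late.
costs = [[0.0, 0.0, 5.0, 5.0],
         [5.0, 5.0, 0.0, 0.0]]
total, route = select_route(costs, switch_cost=1.0)
```

With these toy costs the cheapest route stays on camera 0 for the first two frames and then cuts once to camera 1. A larger `switch_cost` discourages cuts, which is one simple way to trade coverage of the foreground against the number of transitions that must be synthesized.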

Acknowledgements

This work was supported by a Research Establishment Grant of Victoria University of Wellington (Project No. 8-1620-216786-3744) and a Victoria Research Excellence Award.

Author information

Authors and Affiliations

Victoria University of Wellington, Wellington, 6012, New Zealand
Fang-Lue Zhang, Junhong Zhao & Gabriel Salas
Adobe Research, Seattle, USA
Connelly Barnes
Stanford University, San Francisco, USA
Hao-Tian Zhang

Corresponding author

Correspondence to Fang-Lue Zhang.

Additional information

Fang-Lue Zhang is currently a lecturer with Victoria University of Wellington, New Zealand. He received his bachelor's degree from Zhejiang University, Hangzhou, China, in 2009, and his doctoral degree from Tsinghua University, Beijing, China, in 2015. His research interests include image and video editing, computer vision, and computer graphics. He is a member of IEEE and ACM. He received the Victoria Early-Career Research Excellence Award in 2019.

Connelly Barnes is a senior researcher at Adobe Research. Previously, he was an assistant professor at the University of Virginia. He received his Ph.D. degree from Princeton University in 2011. He develops techniques for efficiently manipulating visual data in computer graphics by using semantic information from computer vision, with applications in computational photography, image editing, art, and hiding visual information. Many computer graphics algorithms are more useful if they are interactive; therefore, he also focuses on efficiency and optimization, including some compiler technologies.

Hao-Tian Zhang is currently a Ph.D. student at Stanford University. He received his B.S. degree from Tsinghua University in 2017. His research interests include image and video editing, and physically-based simulation.

Junhong Zhao is a postdoctoral research fellow of the Computational Media Innovation Centre (CMIC) at Victoria University of Wellington. She completed her doctoral degree in 2015 at the Institute of Electronics, Chinese Academy of Sciences. She worked in the Human-Computer Speech Interaction Lab of Tsinghua University, on computer-assisted language learning using speech recognition and computer graphics techniques (2011–2015). She then moved to CAS Institution of Information Engineering, where her research focused on machine learning and its applications to image understanding and audio signal processing (2015–2017).

Gabriel Salas is a research assistant and undergraduate student at the School of Engineering and Computer Science at Victoria University of Wellington. His research focuses on computer graphics and image processing.

Electronic supplementary material

Supplementary material, approximately 111 MB.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

About this article

Cite this article

Zhang, FL., Barnes, C., Zhang, HT. et al. Coherent video generation for multiple hand-held cameras with dynamic foreground. Comp. Visual Media 6, 291–306 (2020). https://doi.org/10.1007/s41095-020-0187-3

Received: 30 June 2020
Accepted: 16 July 2020
Published: 03 September 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s41095-020-0187-3
