
Vid2Player: Controllable Video Sprites That Behave and Appear Like Professional Tennis Players

Published: 05 May 2021

Abstract

We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point play. Most importantly, we use points from the video collection to model a player’s court positioning and shot selection decisions during points. We use these behavioral models to select video clips that reflect actions the real-life player is likely to take in a given match-play situation, yielding sprites that behave realistically at the macro level of full points, not just individual tennis motions. Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences, such as the creation of matchups between players that have not competed in real life or interactive control of players in the Wimbledon final. According to expert tennis players, the rallies generated using our approach are significantly more realistic in terms of player behavior than video sprite methods that only consider the quality of motion transitions during video synthesis.

The supplementary material and video are available on our project website: https://cs.stanford.edu/~haotianz/research/vid2player/.
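The abstract does not detail how the behavioral models are implemented, but the general idea of modeling a player's shot tendencies from historical point data can be illustrated with a density estimate over observed shot placements. The sketch below is a minimal illustration, not the paper's method: it assumes 2D court coordinates, uses SciPy's `gaussian_kde`, and all names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical historical shot placements for one player:
# (x, y) landing positions on the opponent's side of the court, in meters.
shots = np.column_stack([
    rng.normal(loc=2.0, scale=1.0, size=200),   # x: lateral placement
    rng.normal(loc=9.0, scale=1.5, size=200),   # y: depth from the net
])

# Fit a 2D kernel density estimate of where this player tends to hit.
placement_model = gaussian_kde(shots.T)

def sample_shots(n=1):
    """Draw n plausible shot placements from the fitted model."""
    return placement_model.resample(n).T

def likelihood(xy):
    """Density of a candidate placement under the model."""
    return placement_model(np.asarray(xy, dtype=float).reshape(2, 1))[0]

# Sample a few candidate placements and keep the most typical one;
# a full system would instead condition on the match-play situation.
candidates = sample_shots(5)
best = max(candidates, key=likelihood)
```

A system like the one described would then retrieve a video clip whose real shot placement matches the sampled target, rather than rendering the shot directly.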




• Published in

  ACM Transactions on Graphics, Volume 40, Issue 3
  June 2021, 264 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3463476

        Copyright © 2021 Association for Computing Machinery.


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 5 May 2021
        • Accepted: 1 February 2021
        • Revised: 1 December 2020
        • Received: 1 August 2020


        Qualifiers

        • research-article
        • Refereed
