Abstract
We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point play. Most importantly, we use points from the video collection to model a player’s court positioning and shot selection decisions during points. We use these behavioral models to select video clips that reflect actions the real-life player is likely to take in a given match-play situation, yielding sprites that behave realistically at the macro level of full points, not just individual tennis motions. Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences, such as the creation of matchups between players that have not competed in real life or interactive control of players in the Wimbledon final. According to expert tennis players, the rallies generated using our approach are significantly more realistic in terms of player behavior than video sprite methods that only consider the quality of motion transitions during video synthesis.
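The behavioral clip selection described above can be sketched in miniature: estimate a density over the player's historically observed court positions, then pick the candidate video clip whose resulting action is both likely under that model and cheap to transition to visually. This is a minimal illustrative sketch, not the paper's implementation; the clip schema (`id`, `end_pos`, `transition_cost`), the sample coordinates, and the fixed-bandwidth Gaussian KDE are all assumptions made for the example.

```python
import math

# Hypothetical data: (x, y) recovery positions (in meters, court coordinates)
# observed for one player across annotated broadcast points.
observed_positions = [(0.2, -11.5), (0.5, -11.8), (-1.0, -11.2),
                      (2.5, -11.9), (0.0, -12.1)]

def kde_score(pos, samples, bandwidth=1.0):
    """Gaussian kernel density estimate of how likely the real player
    is to end up at `pos`, given positions seen in the video collection."""
    x, y = pos
    total = 0.0
    for sx, sy in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return total / (len(samples) * 2 * math.pi * bandwidth ** 2)

def select_clip(candidate_clips, samples, w_behavior=1.0, w_transition=1.0):
    """Score each candidate clip by (a) how consistent its end position is
    with the behavioral model and (b) its visual transition cost, and
    return the highest-scoring clip. Clips use a hypothetical schema:
    {'id', 'end_pos', 'transition_cost'}."""
    def score(clip):
        behavior = math.log(kde_score(clip['end_pos'], samples) + 1e-9)
        return w_behavior * behavior - w_transition * clip['transition_cost']
    return max(candidate_clips, key=score)

clips = [
    {'id': 'backhand_cross', 'end_pos': (0.3, -11.6), 'transition_cost': 0.2},
    {'id': 'forehand_line',  'end_pos': (6.0, -9.0),  'transition_cost': 0.1},
]
best = select_clip(clips, observed_positions)
print(best['id'])  # the clip consistent with the player's typical positioning
```

In the full system, the behavioral term would come from learned models of positioning and shot selection rather than a fixed-bandwidth KDE, and the transition term from motion-continuity costs as in video-texture methods; the point here is only the structure of the trade-off.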
Supplementary material and video are available on our project website: https://cs.stanford.edu/~haotianz/research/vid2player/.
Supplemental Material
Supplemental movie, appendix, image, and software files for "Vid2Player: Controllable Video Sprites That Behave and Appear Like Professional Tennis Players" are available for download.