Abstract
The NFL collects detailed tracking data capturing the location of all players and the ball during each play. Although the raw form of this data is not publicly available, the NFL releases a set of aggregated statistics via its Next Gen Stats (NGS) platform. It also provides charts showing the locations of pass attempts and outcomes for individual quarterbacks. Our work aims to partially close the gap between the data available privately (to NFL teams) and publicly, and our contribution is twofold. First, we introduce an image processing tool designed specifically for extracting the raw data from the NGS pass charts. We extract the pass outcome, coordinates, and other metadata. Second, we analyze the resulting dataset, examining the spatial tendencies and performances of individual quarterbacks and defenses. We use a generalized additive model for completion percentages by field location. We introduce a naive Bayes approach for estimating the 2-D completion percentage surfaces of individual teams and quarterbacks, and we provide a one-number summary, completion percentage above expectation (CPAE), for evaluating quarterbacks and team defenses. We find that our pass location data closely matches the NFL’s tracking data, and that our CPAE metric closely matches the NFL’s proprietary CPAE metric.
A Data scraped from Next Gen Stats
Variable | Description |
---|---|
completions | number of completions thrown |
touchdowns | number of touchdowns thrown |
attempts | number of passes thrown |
interceptions | number of interceptions thrown |
extraLargeImg | URL of extra-large-sized image (1200 × 1200) |
week | week of game |
gameId | 10-digit game identification number |
season | NFL season |
firstName | first name of player |
lastName | last name of player |
team | team name of player |
position | position of player |
seasonType | regular (“reg”) or postseason (“post”) |
B Example subset of data
game_id | team | week | name | pass_type | x_coord | y_coord | type | home_team | away_team | season |
---|---|---|---|---|---|---|---|---|---|---|
2018020400 | PHI | super-bowl | Nick Foles | COMPLETE | −3.6 | 16.9 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | COMPLETE | 16.2 | −3.0 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | COMPLETE | 11.5 | −6.4 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | TOUCHDOWN | −8.5 | 5.7 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | TOUCHDOWN | −18.8 | 30.1 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | TOUCHDOWN | −19.3 | 41.2 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | INTERCEPTION | 21.8 | 37.9 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | INCOMPLETE | 5.1 | 7.9 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | INCOMPLETE | −12.9 | 39.6 | post | NE | PHI | 2017 |
2018020400 | PHI | super-bowl | Nick Foles | INCOMPLETE | 26.1 | 8.0 | post | NE | PHI | 2017 |
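
Rows in this layout can be read with a few lines of Python. The sketch below is illustrative only: it assumes the data are stored as pipe-delimited text, it treats a touchdown pass as a completed pass, and the helper names (`to_float`, `load_passes`, `completion_rate`) are hypothetical rather than part of the authors' tool. Negative coordinates in the table use the typographic minus sign (U+2212), which `float()` rejects, so the parser normalizes it first.

```python
import csv
import io
from collections import Counter

# Four rows copied from the example subset above. Storing them as
# pipe-delimited text is an assumption made for this sketch.
SAMPLE = """game_id|team|week|name|pass_type|x_coord|y_coord|type|home_team|away_team|season
2018020400|PHI|super-bowl|Nick Foles|COMPLETE|−3.6|16.9|post|NE|PHI|2017
2018020400|PHI|super-bowl|Nick Foles|TOUCHDOWN|−8.5|5.7|post|NE|PHI|2017
2018020400|PHI|super-bowl|Nick Foles|INTERCEPTION|21.8|37.9|post|NE|PHI|2017
2018020400|PHI|super-bowl|Nick Foles|INCOMPLETE|5.1|7.9|post|NE|PHI|2017
"""


def to_float(text):
    """Parse a coordinate, accepting the typographic minus sign (U+2212)."""
    return float(text.strip().replace("\u2212", "-"))


def load_passes(handle):
    """Read pass rows and convert the two coordinate columns to floats."""
    rows = []
    for row in csv.DictReader(handle, delimiter="|"):
        row["x_coord"] = to_float(row["x_coord"])
        row["y_coord"] = to_float(row["y_coord"])
        rows.append(row)
    return rows


# Assumption: a touchdown pass also counts as a completed pass.
CAUGHT = {"COMPLETE", "TOUCHDOWN"}


def completion_rate(rows):
    """Fraction of passes whose outcome is in CAUGHT."""
    caught = sum(1 for r in rows if r["pass_type"] in CAUGHT)
    return caught / len(rows)


passes = load_passes(io.StringIO(SAMPLE))
outcomes = Counter(r["pass_type"] for r in passes)
rate = completion_rate(passes)
```

For these four sample rows, two of the four passes fall in `CAUGHT`, so `rate` is 0.5.
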
C QB CPAE
QB | CPAE17 | npasses_2017 | CPAE18 | npasses_2018 |
---|---|---|---|---|
Drew Brees | 4.21 | 439 | 6.14 | 473 |
Ryan Fitzpatrick | 0.47 | 112 | 3.42 | 157 |
Nick Foles | −3.64 | 152 | 3.42 | 229 |
Russell Wilson | 5.77 | 309 | 3.39 | 295 |
Matthew Ryan | 2.77 | 524 | 3.22 | 552 |
Carson Wentz | 0.07 | 333 | 3.08 | 313 |
Derek Carr | 0.27 | 300 | 2.96 | 429 |
Kirk Cousins | −0.15 | 394 | 2.53 | 467 |
Derrick Watson | 2.53 | 110 | 2.43 | 492 |
Cameron Newton | −1.18 | 352 | 2.14 | 392 |
Marcus Mariota | 0.95 | 495 | 1.75 | 275 |
Jared Goff | −0.41 | 428 | 1.7 | 553 |
Ben Roethlisberger | 2.46 | 394 | 1.29 | 518 |
Patrick Mahomes | | | 1.27 | 445 |
Philip Rivers | 0.34 | 416 | 1.15 | 560 |
Rayne Prescott | −0.14 | 408 | 1.11 | 434 |
Jameis Winston | 2.55 | 268 | 0.44 | 295 |
Andrew Luck | | | 0.33 | 559 |
Mitchell Trubisky | −1.36 | 262 | 0.27 | 323 |
Ryan Tannehill | | | 0.08 | 191 |
Brock Osweiler | | | 0.06 | 163 |
John Stafford | 3.14 | 384 | −0.04 | 480 |
Aaron Rodgers | | | −0.15 | 573 |
Baker Mayfield | | | −0.38 | 269 |
Alexander Smith | 4.31 | 418 | −0.88 | 254 |
Tom Brady | 3.23 | 524 | −0.89 | 519 |
Elisha Manning | −2.22 | 369 | −1 | 536 |
Sam Darnold | | | −1.05 | 289 |
Casey Keenum | 0.33 | 382 | −1.38 | 509 |
Joseph Flacco | −0.23 | 438 | −1.67 | 367 |
Nicholas Mullens | | | −1.87 | 118 |
Andrew Dalton | −1.25 | 307 | −1.89 | 195 |
Lamar Jackson | | | −2.07 | 112 |
Joshua Allen | | | −3.44 | 237 |
Casey Beathard | −4.94 | 185 | −4.37 | 168 |
Joshua Rosen | | | −4.54 | 260 |
Jeffrey Driskel | | | −4.83 | 110 |
Robby Bortles | −1.9 | 399 | −5.04 | 336 |
D Defense CPAE
Team | CPAE17 | npasses_2017 | CPAE18 | npasses_2018 |
---|---|---|---|---|
TB | 3.54 | 380 | 6.89 | 452 |
ATL | 2.36 | 524 | 4.31 | 552 |
NO | −0.68 | 439 | 4.07 | 495 |
DAL | 2.81 | 408 | 3.82 | 434 |
IND | 1.91 | 388 | 2.94 | 559 |
CIN | −2.37 | 307 | 2.52 | 305 |
DET | 5.45 | 384 | 2.46 | 480 |
MIN | −4.93 | 382 | 2.39 | 467 |
MIA | 1.86 | 355 | 2.38 | 354 |
WAS | −4.5 | 394 | 2.17 | 403 |
SEA | −3.89 | 309 | 1.33 | 295 |
CAR | 1.59 | 352 | 1.19 | 443 |
PHI | −0.59 | 485 | 0.99 | 542 |
HOU | 7 | 410 | 0.27 | 492 |
ARI | −1.93 | 462 | 0.22 | 354 |
SF | 0.95 | 500 | −0.33 | 344 |
JAX | −3.99 | 399 | −0.7 | 401 |
NE | 1.56 | 524 | −1 | 519 |
GB | 7.1 | 363 | −1.06 | 573 |
NYG | 1.19 | 402 | −1.08 | 536 |
LAC | 0.81 | 416 | −1.11 | 560 |
TEN | −0.85 | 526 | −1.13 | 323 |
DEN | −1.27 | 386 | −1.13 | 509 |
CLE | 3.97 | 427 | −1.27 | 339 |
BUF | 2.41 | 212 | −1.48 | 327 |
PIT | −1.75 | 421 | −1.65 | 526 |
NYJ | −3.45 | 370 | −1.93 | 352 |
KC | −1.64 | 452 | −1.98 | 445 |
OAK | 1.71 | 324 | −2.12 | 429 |
LA | −2.81 | 462 | −2.35 | 553 |
CHI | 3.32 | 368 | −2.43 | 360 |
BAL | −3.36 | 438 | −5.24 | 479 |
A lower CPAE indicates a better defense.
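
The CPAE figures in Appendices C and D are one-number summaries per quarterback or defense. A minimal sketch of how such a number can be formed from pass-level data follows. It assumes that each pass comes with an expected completion probability from some league-wide model (the fitted model is not reproduced here) and that CPAE is reported in percentage points. The function name `cpae` and the example probabilities are hypothetical.

```python
def cpae(outcomes, expected_probs):
    """Completion percentage above expectation, in percentage points.

    outcomes: 1 for a completed pass, 0 otherwise.
    expected_probs: modelled completion probability for each pass
        (hypothetical values below; the paper's fitted model is not shown).
    """
    if len(outcomes) != len(expected_probs):
        raise ValueError("outcomes and expected_probs must have equal length")
    if not outcomes:
        raise ValueError("at least one pass is required")
    n = len(outcomes)
    observed = sum(outcomes) / n        # observed completion rate
    expected = sum(expected_probs) / n  # average modelled completion rate
    return 100.0 * (observed - expected)


# Illustrative inputs only: four passes, three of them completed.
outcomes = [1, 1, 0, 1]
expected = [0.75, 0.5, 0.25, 0.5]
value = cpae(outcomes, expected)
```

Here the observed rate is 0.75 and the average expected rate is 0.5, so `value` is 25.0. The same calculation applies to a defense by passing in the passes thrown against it, in which case a lower value is better.
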
© 2020 Walter de Gruyter GmbH, Berlin/Boston