Abstract
The deployment of sensors that characterize the trajectory of pitches and batted balls in three dimensions provides the opportunity to assign an intrinsic value to a pitch that depends on its physical properties and not on its observed outcome. We exploit this opportunity by using a Bayesian framework to learn a set of mappings from five-dimensional velocity, movement, and location vectors to intrinsic pitch values. A kernel method generates nonparametric estimates for the component probability density functions in Bayes’ theorem, while nonparametric regression is used to derive a batted ball weight function that is invariant to the defense, ballpark, and atmospheric conditions. Cross-validation is used to determine the parameters of the model. We use Cronbach’s alpha to show that intrinsic pitch values have a significantly higher reliability than outcome-based pitch values. We also develop a method to combine intrinsic values at the individual pitch level into a statistic that captures the value of a pitcher’s collection of pitches over a period of time. We use this statistic to show that pitchers who outperform their intrinsic values during a season tend to perform worse the following year. We also show that this statistic predicts future Earned Run Average (ERA) better than either current ERA or Fielding Independent Pitching (FIP).
Appendix I: strike zone transformation
For each pitch recorded by the PITCHf/x system, the height of the top and bottom of the batter’s strike zone is specified manually. Since these specifications are error-prone and can vary over time, we use the average of the specified top and bottom for each individual batter over the full season to represent his strike zone in the vertical dimension. For a given season, let Bt and Bb denote the average height of the top and bottom of the strike zone specified for batter B and let Lt and Lb denote the average height of the top and bottom of the strike zone for the league. For the 2014 data set, we used Lt = 3.403 feet and Lb = 1.571 feet. For a pitch with a measured vertical height of z thrown to batter B, the transformed height lz is obtained by linearly rescaling the batter’s zone onto the league zone:

$$l_z = L_b + (z - B_b)\,\frac{L_t - L_b}{B_t - B_b}$$
Thus, a pitch at the bottom of any batter’s strike zone (z = Bb) maps to lz = Lb and a pitch at the top of any batter’s strike zone (z = Bt) maps to lz = Lt.
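As an illustration, the following minimal sketch implements a linear rescaling that satisfies the two endpoint conditions above. The function name, argument names, and the example batter are hypothetical; only the league values Lt = 3.403 and Lb = 1.571 come from the text, and the sketch assumes the map is linear between the endpoints.

```python
def to_league_height(z, batter_top, batter_bottom,
                     league_top=3.403, league_bottom=1.571):
    """Linearly rescale a pitch height z (feet) from a batter's averaged
    strike zone onto the league-average strike zone.

    Assumption: the map is linear, so it is fixed by the two conditions
    z = batter_bottom -> league_bottom and z = batter_top -> league_top.
    """
    if batter_top <= batter_bottom:
        raise ValueError("batter_top must exceed batter_bottom")
    scale = (league_top - league_bottom) / (batter_top - batter_bottom)
    return league_bottom + (z - batter_bottom) * scale


# Hypothetical batter whose averaged zone runs from 1.50 ft to 3.30 ft.
print(to_league_height(1.50, batter_top=3.30, batter_bottom=1.50))
print(to_league_height(3.30, batter_top=3.30, batter_bottom=1.50))
```

Pitches outside the batter's zone are extrapolated with the same slope, so a pitch below the zone for one batter remains below the league zone after the transformation.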
Appendix II: distribution of pitches across counts and outcomes
Table 4 presents the distribution of pitches thrown in the RHP vs. RHB configuration in 2014 for each count and each outcome. As described in Section 2.2, the outcomes are R0 = ball in play, R1 = called ball, R2 = called strike, R3 = swinging strike, R4 = foul ball, and R5 = batter hit-by-pitch. Foul tips that are caught for strikeouts are classified as R3 rather than R4.
Count | R0 | R1 | R2 | R3 | R4 | R5 |
---|---|---|---|---|---|---|
0–0 | 7090 | 25,233 | 22,682 | 4683 | 6994 | 151 |
1–0 | 4315 | 7920 | 6023 | 2484 | 4145 | 41 |
2–0 | 1468 | 2296 | 2064 | 639 | 1315 | 14 |
3–0 | 118 | 656 | 1328 | 40 | 117 | 1 |
0–1 | 6250 | 13,552 | 4054 | 4215 | 6079 | 125 |
1–1 | 5581 | 8481 | 3085 | 3317 | 5390 | 78 |
2–1 | 3102 | 3417 | 1491 | 1494 | 2825 | 39 |
3–1 | 1208 | 1235 | 808 | 446 | 1125 | 11 |
0–2 | 3141 | 7817 | 739 | 2438 | 3394 | 70 |
1–2 | 5265 | 9057 | 1106 | 3705 | 5464 | 98 |
2–2 | 5141 | 5517 | 964 | 2763 | 5153 | 82 |
3–2 | 3485 | 2260 | 582 | 21 | 3243 | 38 |
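
The counts in Table 4 can be turned into relative outcome frequencies for each count. The sketch below does this for two rows copied from the table. Treating these frequencies as empirical estimates of the outcome probabilities for a given count is an assumption made for illustration, and the names `OUTCOMES`, `COUNTS`, and `outcome_frequencies` are hypothetical.

```python
OUTCOMES = ["R0", "R1", "R2", "R3", "R4", "R5"]

# Two rows copied from Table 4 (RHP vs. RHB, 2014).
COUNTS = {
    "0-0": [7090, 25233, 22682, 4683, 6994, 151],
    "3-0": [118, 656, 1328, 40, 117, 1],
}


def outcome_frequencies(row):
    """Return the relative frequency of each outcome for one count."""
    total = sum(row)
    return {name: n / total for name, n in zip(OUTCOMES, row)}


for count, row in COUNTS.items():
    freqs = outcome_frequencies(row)
    print(count, {name: round(value, 3) for name, value in freqs.items()})
```

In these two rows the most frequent outcome is a called ball on 0–0 and a called strike on 3–0, which shows how strongly the outcome distribution depends on the count.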
Appendix III: optimal bandwidths for kernel density estimates
In Tables 5–10, we present the optimal bandwidths derived for different configurations using the process described in Section 2.5.
Outcome Rj | |||||
---|---|---|---|---|---|
ball in play | 1.45 | 1.30 | 1.30 | 0.215 | 0.255 |
called ball | 1.55 | 1.30 | 1.35 | 0.260 | 0.290 |
called strike | 1.45 | 1.20 | 1.20 | 0.180 | 0.185 |
swinging strike | 1.55 | 1.30 | 1.40 | 0.280 | 0.345 |
foul ball | 1.45 | 1.30 | 1.20 | 0.255 | 0.285 |
hit-by-pitch | 2.50 | 1.60 | 2.45 | 0.365 | 0.360 |
Outcome Rj | |||||
---|---|---|---|---|---|
ball in play | 1.55 | 1.25 | 1.30 | 0.255 | 0.280 |
called ball | 1.60 | 1.40 | 1.40 | 0.295 | 0.375 |
called strike | 1.75 | 1.55 | 1.55 | 0.190 | 0.225 |
swinging strike | 1.75 | 1.40 | 1.40 | 0.280 | 0.345 |
foul ball | 1.55 | 1.25 | 1.40 | 0.260 | 0.300 |
hit-by-pitch | 2.45 | 2.65 | 2.00 | 0.285 | 0.350 |
Outcome Rj | |||||
---|---|---|---|---|---|
ball in play | 1.55 | 1.25 | 1.25 | 0.240 | 0.270 |
called ball | 1.80 | 1.45 | 1.50 | 0.280 | 0.335 |
called strike | 1.65 | 1.50 | 1.45 | 0.220 | 0.235 |
swinging strike | 1.65 | 1.40 | 1.45 | 0.330 | 0.360 |
foul ball | 1.40 | 1.45 | 1.35 | 0.275 | 0.285 |
hit-by-pitch | 2.15 | 2.70 | 2.00 | 0.315 | 0.255 |
Outcome Rj | |||||
---|---|---|---|---|---|
ball in play | 1.60 | 1.25 | 1.30 | 0.220 | 0.260 |
called ball | 1.55 | 1.30 | 1.35 | 0.250 | 0.300 |
called strike | 1.50 | 1.25 | 1.25 | 0.170 | 0.190 |
swinging strike | 1.85 | 1.50 | 1.40 | 0.290 | 0.315 |
foul ball | 1.50 | 1.30 | 1.35 | 0.240 | 0.285 |
hit-by-pitch | 2.85 | 2.95 | 2.75 | 0.520 | 0.465 |
Outcome Rj | |||||
---|---|---|---|---|---|
ball in play | 1.35 | 1.30 | 1.55 | 0.245 | 0.290 |
called ball | 1.75 | 1.45 | 1.50 | 0.270 | 0.355 |
called strike | 1.95 | 1.55 | 1.70 | 0.175 | 0.240 |
swinging strike | 1.75 | 1.45 | 1.50 | 0.305 | 0.360 |
foul ball | 1.45 | 1.35 | 1.35 | 0.275 | 0.300 |
hit-by-pitch | 2.80 | 2.65 | 2.90 | 0.260 | 1.035 |
Outcome Rj | |||||
---|---|---|---|---|---|
ball in play | 1.60 | 1.35 | 1.25 | 0.260 | 0.280 |
called ball | 1.75 | 1.50 | 1.50 | 0.275 | 0.325 |
called strike | 1.60 | 1.40 | 1.55 | 0.195 | 0.235 |
swinging strike | 1.85 | 1.45 | 1.55 | 0.300 | 0.370 |
foul ball | 1.55 | 1.35 | 1.35 | 0.260 | 0.290 |
hit-by-pitch | 1.55 | 1.80 | 1.55 | 0.455 | 0.795 |
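
To indicate how a bandwidth vector of this kind enters a kernel density estimate, the following sketch evaluates a product-kernel estimate with one bandwidth per dimension. Several choices here are assumptions rather than details taken from the text: the Gaussian kernel, the reading of the five columns as one bandwidth per component of the five-dimensional pitch vector, and the synthetic sample, which stands in for real pitch data. The function name is hypothetical.

```python
import numpy as np


def product_kernel_density(x, samples, bandwidths):
    """Evaluate a Gaussian product-kernel density estimate at point x.

    samples:    (n, d) array of observed vectors
    bandwidths: length-d sequence, one bandwidth per dimension
    """
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    u = (x - samples) / h  # scaled offsets, shape (n, d)
    kernels = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
    # Multiply across dimensions, then average across samples.
    return float(np.mean(np.prod(kernels, axis=1)))


# Synthetic stand-in data: 500 five-dimensional vectors (not real pitches).
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 5))

# First row of the first bandwidth table above (ball in play).
h = [1.45, 1.30, 1.30, 0.215, 0.255]
density = product_kernel_density(np.zeros(5), samples, h)
print(density)
```

Under this reading, a larger bandwidth in a given dimension smooths the estimate more heavily along that dimension, which would be consistent with the larger values tabulated for the sparse hit-by-pitch outcome.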
Appendix IV: data used to evaluate intrinsic pitch statistics
In Tables 11 and 12, we present the data for the 34 pitchers who were used to evaluate the intrinsic pitch statistics as described in Section 4.
Pitcher | 2014 OMI ⋅ 1000 | 2014 ERA | 2015 ERA | ERA Difference |
---|---|---|---|---|
Chris Archer | −11.1 | 3.33 | 3.23 | −0.10 |
A.J. Burnett | 8.8 | 4.59 | 3.18 | −1.41 |
Bartolo Colon | 8.0 | 4.09 | 4.16 | 0.07 |
Johnny Cueto | −6.1 | 2.25 | 3.44 | 1.19 |
R.A. Dickey | 4.7 | 3.71 | 3.91 | 0.20 |
Yovani Gallardo | 16.1 | 3.51 | 3.42 | −0.09 |
Kyle Gibson | 7.0 | 4.47 | 3.84 | −0.63 |
Sonny Gray | −10.8 | 3.08 | 2.73 | −0.35 |
Zack Greinke | 10.7 | 2.71 | 1.66 | −1.05 |
Jason Hammel | 8.0 | 3.47 | 3.74 | 0.27 |
Aaron Harang | 6.3 | 3.57 | 4.86 | 1.29 |
Dan Haren | 13.2 | 4.02 | 3.60 | −0.42 |
Felix Hernandez | −13.3 | 2.14 | 3.53 | 1.39 |
Ian Kennedy | 3.9 | 3.63 | 4.28 | 0.65 |
Corey Kluber | 3.5 | 2.44 | 3.49 | 1.05 |
Tom Koehler | −2.8 | 3.81 | 4.08 | 0.27 |
John Lackey | 17.3 | 3.82 | 2.77 | −1.05 |
Mike Leake | 16.3 | 3.70 | 3.70 | 0.00 |
Colby Lewis | 25.1 | 5.18 | 4.66 | −0.52 |
Lance Lynn | −4.9 | 2.74 | 3.03 | 0.29 |
Shelby Miller | 7.9 | 3.74 | 3.02 | −0.72 |
Jake Odorizzi | −1.3 | 4.13 | 3.35 | −0.78 |
Rick Porcello | 0.5 | 3.43 | 4.92 | 1.49 |
Garrett Richards | −7.3 | 2.61 | 3.65 | 1.04 |
Tyson Ross | −5.8 | 2.81 | 3.26 | 0.45 |
Jeff Samardzija | 6.1 | 2.99 | 4.96 | 1.97 |
Max Scherzer | 2.1 | 3.15 | 2.79 | −0.36 |
James Shields | −0.2 | 3.21 | 3.91 | 0.70 |
Alfredo Simon | −3.0 | 3.44 | 5.05 | 1.61 |
Julio Teheran | −5.4 | 2.89 | 4.04 | 1.15 |
Chris Tillman | 3.2 | 3.34 | 4.99 | 1.65 |
Yordano Ventura | −8.4 | 3.20 | 4.08 | 0.88 |
Edinson Volquez | −13.3 | 3.04 | 3.55 | 0.51 |
Jordan Zimmermann | −4.9 | 2.66 | 3.66 | 1.00 |
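
The relationship between 2014 OMI and the subsequent change in ERA can be examined informally from the table above. The sketch below computes the Pearson correlation between the OMI ⋅ 1000 column and the ERA Difference column. It is an illustrative calculation only and is not claimed to be the evaluation procedure of Section 4; the variable names are hypothetical.

```python
import numpy as np

# (2014 OMI * 1000, 2015 ERA minus 2014 ERA), copied row by row from the table.
PITCHERS = [
    (-11.1, -0.10), (8.8, -1.41), (8.0, 0.07), (-6.1, 1.19), (4.7, 0.20),
    (16.1, -0.09), (7.0, -0.63), (-10.8, -0.35), (10.7, -1.05), (8.0, 0.27),
    (6.3, 1.29), (13.2, -0.42), (-13.3, 1.39), (3.9, 0.65), (3.5, 1.05),
    (-2.8, 0.27), (17.3, -1.05), (16.3, 0.00), (25.1, -0.52), (-4.9, 0.29),
    (7.9, -0.72), (-1.3, -0.78), (0.5, 1.49), (-7.3, 1.04), (-5.8, 0.45),
    (6.1, 1.97), (2.1, -0.36), (-0.2, 0.70), (-3.0, 1.61), (-5.4, 1.15),
    (3.2, 1.65), (-8.4, 0.88), (-13.3, 0.51), (-4.9, 1.00),
]

omi = np.array([row[0] for row in PITCHERS])
era_change = np.array([row[1] for row in PITCHERS])

# Pearson correlation between 2014 OMI and the year-over-year ERA change.
r = float(np.corrcoef(omi, era_change)[0, 1])
print(f"n = {len(PITCHERS)}, r = {r:.3f}")
```

A negative coefficient means that pitchers with lower 2014 OMI values tended to see their ERA rise in 2015, and pitchers with higher values tended to see it fall.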
Pitcher | 2014 ERA | 2014 FIP | 2014 ERA – FIP | ERA Difference |
---|---|---|---|---|
Chris Archer | 3.33 | 3.39 | −0.06 | −0.10 |
A.J. Burnett | 4.59 | 4.14 | 0.45 | −1.41 |
Bartolo Colon | 4.09 | 3.57 | 0.52 | 0.07 |
Johnny Cueto | 2.25 | 3.30 | −1.05 | 1.19 |
R.A. Dickey | 3.71 | 4.32 | −0.61 | 0.20 |
Yovani Gallardo | 3.51 | 3.94 | −0.43 | −0.09 |
Kyle Gibson | 4.47 | 3.80 | 0.67 | −0.63 |
Sonny Gray | 3.08 | 3.46 | −0.38 | −0.35 |
Zack Greinke | 2.71 | 2.97 | −0.26 | −1.05 |
Jason Hammel | 3.47 | 3.92 | −0.45 | 0.27 |
Aaron Harang | 3.57 | 3.57 | 0.00 | 1.29 |
Dan Haren | 4.02 | 4.09 | −0.07 | −0.42 |
Felix Hernandez | 2.14 | 2.56 | −0.42 | 1.39 |
Ian Kennedy | 3.63 | 3.21 | 0.42 | 0.65 |
Corey Kluber | 2.44 | 2.35 | 0.09 | 1.05 |
Tom Koehler | 3.81 | 3.84 | −0.03 | 0.27 |
John Lackey | 3.82 | 3.78 | 0.04 | −1.05 |
Mike Leake | 3.70 | 3.88 | −0.18 | 0.00 |
Colby Lewis | 5.18 | 4.46 | 0.71 | −0.52 |
Lance Lynn | 2.74 | 3.35 | −0.61 | 0.29 |
Shelby Miller | 3.74 | 4.54 | −0.80 | −0.72 |
Jake Odorizzi | 4.13 | 3.75 | 0.38 | −0.78 |
Rick Porcello | 3.43 | 3.67 | −0.24 | 1.49 |
Garrett Richards | 2.61 | 2.60 | 0.01 | 1.04 |
Tyson Ross | 2.81 | 3.24 | −0.43 | 0.45 |
Jeff Samardzija | 2.99 | 3.20 | −0.21 | 1.97 |
Max Scherzer | 3.15 | 2.85 | 0.30 | −0.36 |
James Shields | 3.21 | 3.59 | −0.38 | 0.70 |
Alfredo Simon | 3.44 | 4.33 | −0.89 | 1.61 |
Julio Teheran | 2.89 | 3.49 | −0.60 | 1.15 |
Chris Tillman | 3.34 | 4.01 | −0.67 | 1.65 |
Yordano Ventura | 3.20 | 3.60 | −0.40 | 0.88 |
Edinson Volquez | 3.04 | 4.15 | −1.11 | 0.51 |
Jordan Zimmermann | 2.66 | 2.68 | −0.02 | 1.00 |
Acknowledgement
I am grateful to Sportvision and MLB Advanced Media for providing the HITf/x data which made this work possible. I am also happy to acknowledge the assistance of Qi Shi and Jason Wang in the preparation of this document.
©2019 Walter de Gruyter GmbH, Berlin/Boston