A Bayesian method for computing intrinsic pitch values using kernel density and nonparametric regression estimates

Glenn Healey

doi:10.1515/jqas-2017-0058

Published by De Gruyter December 20, 2018

From the journal Journal of Quantitative Analysis in Sports

https://doi.org/10.1515/jqas-2017-0058

Abstract

The deployment of sensors that characterize the trajectory of pitches and batted balls in three dimensions provides the opportunity to assign an intrinsic value to a pitch that depends on its physical properties and not on its observed outcome. We exploit this opportunity by using a Bayesian framework to learn a set of mappings from five-dimensional velocity, movement, and location vectors to intrinsic pitch values. A kernel method generates nonparametric estimates for the component probability density functions in Bayes’ theorem, while nonparametric regression is used to derive a batted ball weight function that is invariant to the defense, ballpark, and atmospheric conditions. Cross-validation is used to determine the parameters of the model. We use Cronbach’s alpha to show that intrinsic pitch values have a significantly higher reliability than outcome-based pitch values. We also develop a method to combine intrinsic values at the individual pitch level into a statistic that captures the value of a pitcher’s collection of pitches over a period of time. We use this statistic to show that pitchers who outperform their intrinsic values during a season tend to perform worse the following year. We also show that this statistic provides better predictive value for future Earned Run Average (ERA) than either current ERA or Fielding Independent Pitching (FIP).

Keywords: baseball; Bayesian; kernel density estimates; machine learning; pitch value; reliability

Appendix I: strike zone transformation

For each pitch recorded by the PITCHf/x system, the heights of the top and bottom of the batter’s strike zone are specified manually. Since these specifications are error-prone and can vary over time, we use the average of the specified top and bottom for each individual batter over the full season to represent his strike zone in the vertical dimension. For a given season, let B_t and B_b denote the average heights of the top and bottom of the strike zone specified for batter B, and let L_t and L_b denote the average heights of the top and bottom of the strike zone for the league. For the 2014 data set, we used L_t = 3.403 feet and L_b = 1.571 feet. For a pitch with a measured vertical height of z = l′_z to batter B, the normalized z-coordinate l_z is computed as

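$$
l_z =
\begin{cases}
L_b - (B_b - z), & z \le B_b \\
L_b + (z - B_b)\,\dfrac{L_t - L_b}{B_t - B_b}, & B_b \le z \le B_t \\
L_t + (z - B_t), & z \ge B_t
\end{cases}
$$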

Thus, a pitch at the bottom of any batter’s strike zone (z = B_b) maps to l_z = L_b and a pitch at the top of any batter’s strike zone (z = B_t) maps to l_z = L_t.
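
The piecewise mapping can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `normalize_height` and the example batter zone (top 3.5 feet, bottom 1.5 feet) are hypothetical, while the league defaults are the 2014 values L_t = 3.403 and L_b = 1.571 quoted above.

```python
import math


def normalize_height(z, b_top, b_bottom, l_top=3.403, l_bottom=1.571):
    """Map a measured pitch height z (feet) to the normalized coordinate l_z.

    b_top, b_bottom: season-average top and bottom of this batter's zone (B_t, B_b).
    l_top, l_bottom: league-average top and bottom (L_t, L_b); defaults are the
    2014 values quoted in the text.
    """
    if b_top <= b_bottom:
        raise ValueError("b_top must be greater than b_bottom")
    if z <= b_bottom:
        # Below the zone: keep the distance below the bottom edge unchanged.
        return l_bottom - (b_bottom - z)
    if z >= b_top:
        # Above the zone: keep the distance above the top edge unchanged.
        return l_top + (z - b_top)
    # Inside the zone: rescale linearly onto the league zone.
    scale = (l_top - l_bottom) / (b_top - b_bottom)
    return l_bottom + (z - b_bottom) * scale


# Hypothetical batter whose zone runs from 1.5 to 3.5 feet.
for height in (1.0, 1.5, 2.5, 3.5, 4.0):
    print(height, round(normalize_height(height, b_top=3.5, b_bottom=1.5), 3))
```

Distances outside the zone are preserved in feet, while heights inside the zone are stretched or compressed so that every batter's zone occupies the same normalized interval.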

Appendix II: distribution of pitches across counts and outcomes

Table 4 presents the distribution of pitches thrown in the RHP vs. RHB configuration in 2014 for each count and with each outcome. As described in Section 2.2, the outcomes are R₀ = ball in play, R₁ = called ball, R₂ = called strike, R₃ = swinging strike, R₄ = foul ball, and R₅ = batter hit-by-pitch. Foul tips that are caught for strikeouts are classified as R₃, not R₄.

Table 4:

Number of pitches for each count and with each outcome, RHP vs. RHB, 2014.

Count	R₀	R₁	R₂	R₃	R₄	R₅
0–0	7090	25,233	22,682	4683	6994	151
1–0	4315	7920	6023	2484	4145	41
2–0	1468	2296	2064	639	1315	14
3–0	118	656	1328	40	117	1
0–1	6250	13,552	4054	4215	6079	125
1–1	5581	8481	3085	3317	5390	78
2–1	3102	3417	1491	1494	2825	39
3–1	1208	1235	808	446	1125	11
0–2	3141	7817	739	2438	3394	70
1–2	5265	9057	1106	3705	5464	98
2–2	5141	5517	964	2763	5153	82
3–2	3485	2260	582	21	3243	38
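
As a worked illustration of how the counts in Table 4 can be used, the sketch below converts a row into relative outcome frequencies for that count. Treating these frequencies as the outcome probabilities that enter Bayes’ theorem is an assumption made here for illustration; Section 2 of the paper is not reproduced in this preview. The names `PITCH_COUNTS` and `outcome_frequencies` are hypothetical, and only three of the twelve rows are copied.

```python
OUTCOMES = (
    "ball in play",
    "called ball",
    "called strike",
    "swinging strike",
    "foul ball",
    "hit-by-pitch",
)

# Rows copied from Table 4 (RHP vs. RHB, 2014), ordered R0 through R5.
PITCH_COUNTS = {
    "0-0": (7090, 25233, 22682, 4683, 6994, 151),
    "3-0": (118, 656, 1328, 40, 117, 1),
    "0-2": (3141, 7817, 739, 2438, 3394, 70),
}


def outcome_frequencies(count):
    """Return the relative frequency of each outcome for the given count."""
    row = PITCH_COUNTS[count]
    total = sum(row)
    return {name: n / total for name, n in zip(OUTCOMES, row)}


for count in PITCH_COUNTS:
    freqs = outcome_frequencies(count)
    most_common = max(freqs, key=freqs.get)
    print(count, sum(PITCH_COUNTS[count]), most_common, round(freqs[most_common], 3))
```

The outcome mix shifts strongly with the count: called strikes are the most frequent outcome on 3–0, while they are rare on 0–2.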

Appendix III: optimal bandwidths for kernel density estimates

In Tables 5–10, we present the optimal bandwidths derived for different configurations using the process described in Section 2.5.

Table 5:

Optimal bandwidths for each outcome, RHP vs. RHB, 0–0 count.

Outcome R_j	σ*_s(j)	σ*_bx(j)	σ*_bz(j)	σ*_lx(j)	σ*_lz(j)
ball in play	1.45	1.30	1.30	0.215	0.255
called ball	1.55	1.30	1.35	0.260	0.290
called strike	1.45	1.20	1.20	0.180	0.185
swinging strike	1.55	1.30	1.40	0.280	0.345
foul ball	1.45	1.30	1.20	0.255	0.285
hit-by-pitch	2.50	1.60	2.45	0.365	0.360

Table 6:

Optimal bandwidths for each outcome, RHP vs. RHB, 0–1 count.

Outcome R_j	σ*_s(j)	σ*_bx(j)	σ*_bz(j)	σ*_lx(j)	σ*_lz(j)
ball in play	1.55	1.25	1.30	0.255	0.280
called ball	1.60	1.40	1.40	0.295	0.375
called strike	1.75	1.55	1.55	0.190	0.225
swinging strike	1.75	1.40	1.40	0.280	0.345
foul ball	1.55	1.25	1.40	0.260	0.300
hit-by-pitch	2.45	2.65	2.00	0.285	0.350

Table 7:

Optimal bandwidths for each outcome, RHP vs. RHB, 1–0 count.

Outcome R_j	σ*_s(j)	σ*_bx(j)	σ*_bz(j)	σ*_lx(j)	σ*_lz(j)
ball in play	1.55	1.25	1.25	0.240	0.270
called ball	1.80	1.45	1.50	0.280	0.335
called strike	1.65	1.50	1.45	0.220	0.235
swinging strike	1.65	1.40	1.45	0.330	0.360
foul ball	1.40	1.45	1.35	0.275	0.285
hit-by-pitch	2.15	2.70	2.00	0.315	0.255

Table 8:

Optimal bandwidths for each outcome, RHP vs. LHB, 0–0 count.

Outcome R_j	σ*_s(j)	σ*_bx(j)	σ*_bz(j)	σ*_lx(j)	σ*_lz(j)
ball in play	1.60	1.25	1.30	0.220	0.260
called ball	1.55	1.30	1.35	0.250	0.300
called strike	1.50	1.25	1.25	0.170	0.190
swinging strike	1.85	1.50	1.40	0.290	0.315
foul ball	1.50	1.30	1.35	0.240	0.285
hit-by-pitch	2.85	2.95	2.75	0.520	0.465

Table 9:

Optimal bandwidths for each outcome, RHP vs. LHB, 0–1 count.

Outcome R_j	σ*_s(j)	σ*_bx(j)	σ*_bz(j)	σ*_lx(j)	σ*_lz(j)
ball in play	1.35	1.30	1.55	0.245	0.290
called ball	1.75	1.45	1.50	0.270	0.355
called strike	1.95	1.55	1.70	0.175	0.240
swinging strike	1.75	1.45	1.50	0.305	0.360
foul ball	1.45	1.35	1.35	0.275	0.300
hit-by-pitch	2.80	2.65	2.90	0.260	1.035

Table 10:

Optimal bandwidths for each outcome, RHP vs. LHB, 1–0 count.

Outcome R_j	σ*_s(j)	σ*_bx(j)	σ*_bz(j)	σ*_lx(j)	σ*_lz(j)
ball in play	1.60	1.35	1.25	0.260	0.280
called ball	1.75	1.50	1.50	0.275	0.325
called strike	1.60	1.40	1.55	0.195	0.235
swinging strike	1.85	1.45	1.55	0.300	0.370
foul ball	1.55	1.35	1.35	0.260	0.290
hit-by-pitch	1.55	1.80	1.55	0.455	0.795
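
To show how a row of bandwidths might be used, the following sketch evaluates a five-dimensional kernel density estimate and a leave-one-out log-likelihood score. Several things here are assumptions rather than details taken from the paper: the kernel is taken to be a product of one-dimensional Gaussians with one bandwidth per dimension, the subscripts s, bx, bz, lx, lz are read as speed, horizontal and vertical movement, and horizontal and vertical location, and leave-one-out likelihood is used as one common cross-validation criterion (Section 2.5 is not reproduced in this preview). The sample pitches and all function names are hypothetical.

```python
import math

# Table 5 bandwidths for "ball in play" (RHP vs. RHB, 0-0 count),
# ordered as (s, bx, bz, lx, lz).
BANDWIDTHS_BALL_IN_PLAY = (1.45, 1.30, 1.30, 0.215, 0.255)


def gaussian_product_kernel(diff, bandwidths):
    """Product of 1-D Gaussian kernels, one per dimension (assumed kernel form)."""
    value = 1.0
    for d, h in zip(diff, bandwidths):
        value *= math.exp(-0.5 * (d / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
    return value


def kde(query, samples, bandwidths, skip=None):
    """Kernel density estimate at `query`; `skip` leaves out one sample index."""
    total = 0.0
    used = 0
    for i, sample in enumerate(samples):
        if i == skip:
            continue
        diff = [q - x for q, x in zip(query, sample)]
        total += gaussian_product_kernel(diff, bandwidths)
        used += 1
    return total / used


def loo_log_likelihood(samples, bandwidths):
    """Leave-one-out log-likelihood: higher values indicate a better bandwidth fit."""
    return sum(
        math.log(kde(x, samples, bandwidths, skip=i)) for i, x in enumerate(samples)
    )


# Hypothetical pitch vectors (s, bx, bz, lx, lz); these are not data from the paper.
pitches = [
    (92.1, -4.0, 8.5, 0.10, 2.40),
    (93.0, -5.2, 9.1, -0.30, 2.10),
    (91.5, -3.5, 7.8, 0.25, 2.75),
    (92.6, -4.4, 8.9, -0.05, 2.55),
]

print(kde(pitches[0], pitches, BANDWIDTHS_BALL_IN_PLAY))
print(loo_log_likelihood(pitches, BANDWIDTHS_BALL_IN_PLAY))
```

In a bandwidth search under these assumptions, `loo_log_likelihood` would be evaluated over a grid of candidate bandwidth vectors for each outcome and the maximizing vector retained.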

Appendix IV: data used to evaluate intrinsic pitch statistics

In Tables 11 and 12, we present the data for the 34 pitchers who were used to evaluate the intrinsic pitch statistics as described in Section 4.

Table 11:

RHP with at least 162 innings pitched in 2014 and 2015.

Pitcher	2014 OMI ⋅ 1000	2014 ERA	2015 ERA	ERA Difference
Chris Archer	−11.1	3.33	3.23	−0.10
A.J. Burnett	8.8	4.59	3.18	−1.41
Bartolo Colon	8.0	4.09	4.16	0.07
Johnny Cueto	−6.1	2.25	3.44	1.19
R.A. Dickey	4.7	3.71	3.91	0.20
Yovani Gallardo	16.1	3.51	3.42	−0.09
Kyle Gibson	7.0	4.47	3.84	−0.63
Sonny Gray	−10.8	3.08	2.73	−0.35
Zack Greinke	10.7	2.71	1.66	−1.05
Jason Hammel	8.0	3.47	3.74	0.27
Aaron Harang	6.3	3.57	4.86	1.29
Dan Haren	13.2	4.02	3.60	−0.42
Felix Hernandez	−13.3	2.14	3.53	1.39
Ian Kennedy	3.9	3.63	4.28	0.65
Corey Kluber	3.5	2.44	3.49	1.05
Tom Koehler	−2.8	3.81	4.08	0.27
John Lackey	17.3	3.82	2.77	−1.05
Mike Leake	16.3	3.70	3.70	0.00
Colby Lewis	25.1	5.18	4.66	−0.52
Lance Lynn	−4.9	2.74	3.03	0.29
Shelby Miller	7.9	3.74	3.02	−0.72
Jake Odorizzi	−1.3	4.13	3.35	−0.78
Rick Porcello	0.5	3.43	4.92	1.49
Garrett Richards	−7.3	2.61	3.65	1.04
Tyson Ross	−5.8	2.81	3.26	0.45
Jeff Samardzija	6.1	2.99	4.96	1.97
Max Scherzer	2.1	3.15	2.79	−0.36
James Shields	−0.2	3.21	3.91	0.70
Alfredo Simon	−3.0	3.44	5.05	1.61
Julio Teheran	−5.4	2.89	4.04	1.15
Chris Tillman	3.2	3.34	4.99	1.65
Yordano Ventura	−8.4	3.20	4.08	0.88
Edinson Volquez	−13.3	3.04	3.55	0.51
Jordan Zimmermann	−4.9	2.66	3.66	1.00
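
One simple way to examine the data in Table 11 is to compute the Pearson correlation between the 2014 OMI ⋅ 1000 column and the ERA difference (2015 ERA minus 2014 ERA). The sketch below does this with the 34 rows copied from the table. It is offered only as an illustration; the evaluation actually carried out in Section 4 is not reproduced in this preview and may differ. The names `TABLE_11` and `pearson` are hypothetical.

```python
import math

# (pitcher, 2014 OMI * 1000, ERA difference) copied from Table 11.
TABLE_11 = [
    ("Chris Archer", -11.1, -0.10), ("A.J. Burnett", 8.8, -1.41),
    ("Bartolo Colon", 8.0, 0.07), ("Johnny Cueto", -6.1, 1.19),
    ("R.A. Dickey", 4.7, 0.20), ("Yovani Gallardo", 16.1, -0.09),
    ("Kyle Gibson", 7.0, -0.63), ("Sonny Gray", -10.8, -0.35),
    ("Zack Greinke", 10.7, -1.05), ("Jason Hammel", 8.0, 0.27),
    ("Aaron Harang", 6.3, 1.29), ("Dan Haren", 13.2, -0.42),
    ("Felix Hernandez", -13.3, 1.39), ("Ian Kennedy", 3.9, 0.65),
    ("Corey Kluber", 3.5, 1.05), ("Tom Koehler", -2.8, 0.27),
    ("John Lackey", 17.3, -1.05), ("Mike Leake", 16.3, 0.00),
    ("Colby Lewis", 25.1, -0.52), ("Lance Lynn", -4.9, 0.29),
    ("Shelby Miller", 7.9, -0.72), ("Jake Odorizzi", -1.3, -0.78),
    ("Rick Porcello", 0.5, 1.49), ("Garrett Richards", -7.3, 1.04),
    ("Tyson Ross", -5.8, 0.45), ("Jeff Samardzija", 6.1, 1.97),
    ("Max Scherzer", 2.1, -0.36), ("James Shields", -0.2, 0.70),
    ("Alfredo Simon", -3.0, 1.61), ("Julio Teheran", -5.4, 1.15),
    ("Chris Tillman", 3.2, 1.65), ("Yordano Ventura", -8.4, 0.88),
    ("Edinson Volquez", -13.3, 0.51), ("Jordan Zimmermann", -4.9, 1.00),
]


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


omi = [row[1] for row in TABLE_11]
era_diff = [row[2] for row in TABLE_11]
r = pearson(omi, era_diff)
print(f"n = {len(TABLE_11)}, Pearson r = {r:.3f}")
```

The coefficient is negative for these rows: pitchers with lower 2014 OMI values tended to see larger ERA increases in 2015.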

Table 12:

RHP with at least 162 innings pitched in 2014 and 2015.

Pitcher	2014 ERA	2014 FIP	2014 ERA – FIP	ERA Difference
Chris Archer	3.33	3.39	−0.06	−0.10
A.J. Burnett	4.59	4.14	0.45	−1.41
Bartolo Colon	4.09	3.57	0.52	0.07
Johnny Cueto	2.25	3.30	−1.05	1.19
R.A. Dickey	3.71	4.32	−0.61	0.20
Yovani Gallardo	3.51	3.94	−0.43	−0.09
Kyle Gibson	4.47	3.80	0.67	−0.63
Sonny Gray	3.08	3.46	−0.38	−0.35
Zack Greinke	2.71	2.97	−0.26	−1.05
Jason Hammel	3.47	3.92	−0.45	0.27
Aaron Harang	3.57	3.57	0.00	1.29
Dan Haren	4.02	4.09	−0.07	−0.42
Felix Hernandez	2.14	2.56	−0.42	1.39
Ian Kennedy	3.63	3.21	0.42	0.65
Corey Kluber	2.44	2.35	0.09	1.05
Tom Koehler	3.81	3.84	−0.03	0.27
John Lackey	3.82	3.78	0.04	−1.05
Mike Leake	3.70	3.88	−0.18	0.00
Colby Lewis	5.18	4.46	0.71	−0.52
Lance Lynn	2.74	3.35	−0.61	0.29
Shelby Miller	3.74	4.54	−0.80	−0.72
Jake Odorizzi	4.13	3.75	0.38	−0.78
Rick Porcello	3.43	3.67	−0.24	1.49
Garrett Richards	2.61	2.60	0.01	1.04
Tyson Ross	2.81	3.24	−0.43	0.45
Jeff Samardzija	2.99	3.20	−0.21	1.97
Max Scherzer	3.15	2.85	0.30	−0.36
James Shields	3.21	3.59	−0.38	0.70
Alfredo Simon	3.44	4.33	−0.89	1.61
Julio Teheran	2.89	3.49	−0.60	1.15
Chris Tillman	3.34	4.01	−0.67	1.65
Yordano Ventura	3.20	3.60	−0.40	0.88
Edinson Volquez	3.04	4.15	−1.11	0.51
Jordan Zimmermann	2.66	2.68	−0.02	1.00

Acknowledgement

I am grateful to Sportvision and MLB Advanced Media for providing the HITf/x data which made this work possible. I am also happy to acknowledge the assistance of Qi Shi and Jason Wang in the preparation of this document.

Published Online: 2018-12-20

Published in Print: 2019-02-25
