Published by De Gruyter, April 8, 2020

The relative roles of skill and luck within 11 different golfer populations

  • Richard J. Rendleman Jr.

Abstract

Drawing on the golf-related example of regression to the mean as presented by Kahneman in his best-selling book, Thinking Fast and Slow, this study shows how the regression-to-the-mean phenomenon is revealed in first- and second-round scoring in 11 different golfer populations, ranging from golfers with the highest level of skill (professional golfers on the PGA TOUR) to amateur groups of much lower skill. Using the mathematics of truncated normal distributions, the study introduces a new method for estimating the mix between variation in scoring due to differences in player skill and that due to luck. Estimates of the skill/luck mix are very close to those obtained using the regression-based methodology of Morrison and are nearly identical to those implied by fixed effects regression models where fixed player and round effects are estimated simultaneously. The study also sheds light on the “paradox of skill,” originally suggested by Gould and developed further by Mauboussin, as it relates to golf by showing that luck plays a more important role in determining player scores in higher-skilled golfer groups compared with lower-skilled groups.

Acknowledgement

The author thanks the PGA TOUR for providing the ShotLink data used in connection with this study. He is especially thankful to Philip Howard, who provided invaluable assistance with respect to the mathematical derivations shown in Appendix B and to Mark Broadie, who provided comments in the early stages of this work. The author also thanks Robert Connolly, Paul Danos, David Dicks, Bob Hansen, Jon Lewellen, Shijie Lu, Michael Mauboussin, and Kent Womack for helpful comments.

Appendix A

A.1 Course yardages

Course yardages for each of the 11 golfer populations were determined as follows.

A.1.1 PGA TOUR

The ShotLink archives provide total 18-hole yardages as reported on the scorecards for all courses played in connection with each PGA TOUR event, 2002–2014. For each year, 2002–2014, I computed the mean scorecard yardage per event and then averaged the mean yardage per year to arrive at an overall 2002–2014 yardage figure.

A.1.2 Web.com Tour

The ShotLink archives provide total 18-hole yardages as reported on the scorecards for all courses played in connection with each Web.com event, but only for years 2013 and 2014. For each of the 2 years I computed the mean yardage per event and then averaged the mean yardage per year to arrive at an overall 2013–2014 yardage figure. To account for a general increase in course yardage over the 2002–2014 period, I divided this figure by 1.00672, the ratio of mean PGA TOUR yardage over the 2013–2014 period to mean PGA TOUR yardage over the 2002–2014 period, to obtain an estimate of mean yardage for the entire 2002–2014 period.
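
The two-step procedure described above (a mean of yearly means, followed by division by the PGA TOUR yardage ratio) can be sketched as follows. This is a minimal illustration, not the author's code: the function names and the sample yardages are hypothetical, and only the ratio 1.00672 comes from the text.

```python
from statistics import mean


def mean_of_yearly_means(yardages_by_year):
    """Average event yardages within each year, then average the yearly means."""
    return mean(mean(events) for events in yardages_by_year.values())


def adjust_to_full_period(recent_mean, pga_ratio=1.00672):
    """Scale a 2013-2014 mean back to a 2002-2014 basis.

    pga_ratio is the ratio of mean PGA TOUR yardage over 2013-2014 to mean
    PGA TOUR yardage over 2002-2014, as reported in the text.
    """
    return recent_mean / pga_ratio


# Hypothetical scorecard yardages (NOT real data), keyed by year.
sample = {
    2013: [7000, 7100, 7200],
    2014: [7150, 7250],
}

recent = mean_of_yearly_means(sample)     # yearly means are 7100 and 7200
estimate = adjust_to_full_period(recent)  # deflated to the 2002-2014 basis
print(recent, round(estimate, 2))
```

Averaging the yearly means weights each year equally regardless of how many events it contains, so the result can differ from a pooled mean over all events (7140 for the hypothetical values above, against 7150 for the mean of yearly means).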

A.1.3 PGA Senior Tour

The ShotLink archives provide total 18-hole yardages as reported on the scorecards for all courses played in connection with each PGA Senior Tour event, but only for years 2013 and 2014. For each of the 2 years I computed the mean yardage per event and then averaged the mean yardage per year to arrive at an overall 2013–2014 yardage figure. To account for a general increase in course yardage over the 2002–2014 period, I divided this figure by 1.00672, the ratio of mean PGA TOUR yardage over the 2013–2014 period to mean PGA TOUR yardage over the 2002–2014 period, to obtain an estimate of mean yardage for the entire 2002–2014 period.

A.1.4 USGA events

For each event, 2002–2014, the USGA provides course yardages in its archives. (See http://www.usga.org/articles/championship-archives.html, last accessed March 11, 2016. For each annual event, click on “Results,” and the courses and corresponding yardages will be shown along with tournament results.) In all USGA men’s championships, the two 18-hole qualifying rounds were conducted on two courses. For these events, I averaged the yardages for the two courses in each year and then computed an overall average of the annual yardages to arrive at a 2002–2014 mean yardage figure. With a few exceptions, all other USGA qualifying competitions were conducted on a single course (per year). In these cases, I simply averaged the annual yardage figures to arrive at an overall average for each respective USGA golfer population. The 2009 yardage figures for boys and girls were determined by averaging the yardages of two courses, Trump National Old, played by boys in round 1 and girls in round 2, and Trump National New, played by boys in round 2 and girls in round 1. The 2010 and 2011 girls’ championships were conducted on a single set of 18 holes distributed over two courses. For each of these 2 years, I computed the average yardage for the two courses as the yardage for the event. (No yardage figure was provided by the USGA for the 18 holes that were actually played.) Similarly, each of the 2012–2014 senior women’s competitions was conducted on a single set of 18 holes distributed over two courses. For each of these 3 years, I computed the average yardage for the two courses as the yardage for the event.

A.1.5 New Hampshire events

For each event, 2010–2014, the New Hampshire Golf Association provides course yardages in its archives but no yardage figures prior to 2010. (See http://www.nhgolfassociation.org/tournament-results-archive, last accessed March 4, 2015.) I divided the average yardage for 2010–2014 by 1.006252, the ratio of mean PGA TOUR yardage over the 2010–2014 period to mean PGA TOUR yardage over the 2002–2014 period, to obtain an estimate of mean yardage for the entire 2002–2014 period.

Appendix B

B.1 Derivation of equations (3) and (8) by Philip Howard[6]

B.1.1 Derivation of equation (3)

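Define the multivariate normal distribution $X = [M, X_1, X_2]$

$$
X \sim N\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
\sigma_M^2 & 0 & 0 \\
0 & \sigma_X^2 & \sigma_{X_1,X_2} \\
0 & \sigma_{X_1,X_2} & \sigma_X^2
\end{bmatrix}
\right)
$$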

with $M$ denoting a mean score per event drawn from a normal distribution centered on zero and $X_1$ and $X_2$ denoting residual scores in rounds 1 and 2, respectively. With this specification, the variances of round-1 and round-2 residual scoring are assumed to be equal. Moreover, there is no correlation between mean scores per event, $M$, and residual scores, $X_1$ and $X_2$, but round-1 and round-2 residual scores can be correlated with covariance $\sigma_{X_1,X_2}$.

Actual scores in rounds 1 and 2 are defined as follows:

$$
S_1 = M + X_1, \qquad S_2 = M + X_2.
$$

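Here, $S = [S_1, S_2]$ has a bivariate normal distribution

$$
S \sim N\left(
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
\sigma_S^2 & \sigma_{Cov} \\
\sigma_{Cov} & \sigma_S^2
\end{bmatrix}
\right),
\qquad
\sigma_S^2 = \sigma_M^2 + \sigma_X^2,
\qquad
\sigma_{Cov} = \sigma_M^2 + \sigma_{X_1,X_2},
\qquad
\rho_S = \frac{\sigma_{Cov}}{\sigma_S^2}.
$$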
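
As a brief check on these moments, the following sketch expands the variance and covariance of the round scores. It uses only the assumptions already stated, namely that $M$ is uncorrelated with $X_1$ and $X_2$ and that the two residuals share a common variance; the $\operatorname{Var}$ and $\operatorname{Cov}$ notation is introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(S_1)
  &= \operatorname{Var}(M) + \operatorname{Var}(X_1) + 2\operatorname{Cov}(M, X_1)
   = \sigma_M^2 + \sigma_X^2 = \sigma_S^2, \\
\operatorname{Cov}(S_1, S_2)
  &= \operatorname{Var}(M) + \operatorname{Cov}(M, X_2)
     + \operatorname{Cov}(X_1, M) + \operatorname{Cov}(X_1, X_2)
   = \sigma_M^2 + \sigma_{X_1,X_2} = \sigma_{Cov}.
\end{aligned}
```

The same expansion applied to $S_2$ gives $\operatorname{Var}(S_2) = \sigma_S^2$, which is why both diagonal entries of the covariance matrix are equal.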

The conditional expectation of $S_2$ given $S_1 = s_1$ is

$$
E[S_2 \mid S_1 = s_1] = \rho_S s_1 = \frac{\sigma_{Cov}}{\sigma_S^2} E[S_1 \mid S_1 = s_1].
$$

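Thus, averaging over the region $S_1 > s_1$ and applying the mean of a truncated normal distribution, the conditional expectation of $S_2$ given $S_1 > s_1$ is

$$
E[S_2 \mid S_1 > s_1]
= \frac{\sigma_{Cov}}{\sigma_S^2} E[S_1 \mid S_1 > s_1]
= \frac{\sigma_{Cov}}{\sigma_S} \, \frac{\phi\left(s_1/\sigma_S\right)}{1 - \Phi\left(s_1/\sigma_S\right)},
$$

where $\phi$ and $\Phi$ denote the standard normal density and cumulative distribution functions.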

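Finally, the conditional expectation of $S_2$ given $S_1 > 0$ is

$$
E[S_2 \mid S_1 > 0]
= \frac{2}{\sqrt{2\pi}} \, \frac{\sigma_{Cov}}{\sigma_S}
= \frac{2}{\sqrt{2\pi}} \, \frac{\sigma_M^2 + \sigma_{X_1,X_2}}{\sigma_S}.
$$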
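
The truncated-normal result can be checked numerically. The sketch below is illustrative only: the function names, the parameter values, and the Monte Carlo setup are assumptions made here, not part of the original study. It evaluates the closed-form expectation of the second-round score given a first-round score above a threshold, then compares it with a simulation of the model S1 = M + X1, S2 = M + X2.

```python
import math
import random


def expected_s2_given_s1_above(threshold, var_m, var_x, cov_x):
    """Closed-form E[S2 | S1 > threshold] under the bivariate normal model."""
    sigma_s = math.sqrt(var_m + var_x)   # standard deviation of each round score
    sigma_cov = var_m + cov_x            # covariance between S1 and S2
    z = threshold / sigma_s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))         # 1 - Phi(z)
    return (sigma_cov / sigma_s) * pdf / upper_tail


def simulate_s2_given_s1_above(threshold, var_m, var_x, cov_x,
                               n=200_000, seed=0):
    """Monte Carlo estimate of the same conditional expectation."""
    rng = random.Random(seed)
    sd_m, sd_x = math.sqrt(var_m), math.sqrt(var_x)
    r = cov_x / var_x  # correlation between the two residuals
    total, count = 0.0, 0
    for _ in range(n):
        m = rng.gauss(0.0, sd_m)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = sd_x * z1
        x2 = sd_x * (r * z1 + math.sqrt(1.0 - r * r) * z2)
        if m + x1 > threshold:   # keep only draws with S1 above the threshold
            total += m + x2      # accumulate S2
            count += 1
    return total / count


# Hypothetical variances chosen so that sigma_S = 3 and sigma_Cov = 5.
closed = expected_s2_given_s1_above(0.0, var_m=4.0, var_x=5.0, cov_x=1.0)
simulated = simulate_s2_given_s1_above(0.0, var_m=4.0, var_x=5.0, cov_x=1.0)
print(closed, simulated)
```

With a threshold of zero the closed form reduces to 2/sqrt(2π) times the ratio of the covariance to the score standard deviation, and the simulated value should agree with it up to sampling error.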

B.1.2 Derivation of equation (8)

Maintaining the same assumptions, and for clarity, expressing the variance of first-round scores as $\sigma_{S_1}^2$, the conditional expectation of $S_2$ given $S_1 = s_1$ is:

$$
E[S_2 \mid S_1 = s_1] = \rho_S s_1 = \frac{\sigma_{Cov}}{\sigma_{S_1}^2} E[S_1 \mid S_1 = s_1].
$$

This implies a slope coefficient $\beta_{S_2,S_1}$ of

$$
\beta_{S_2,S_1} = \frac{\sigma_{Cov}}{\sigma_{S_1}^2} = \frac{\sigma_M^2 + \sigma_{X_1,X_2}}{\sigma_{S_1}^2}.
$$

Rearranging in terms of $\sigma_M$,

$$
\sigma_M = \sqrt{\beta_{S_2,S_1} \, \sigma_{S_1}^2 - \sigma_{X_1,X_2}}.
$$
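
A small numerical sketch of this step follows. It assumes the slope relation β = (σ_M² + σ_{X1,X2}) / σ_{S1}², which solves to σ_M² = β σ_{S1}² − σ_{X1,X2}. The helper names and the four paired scores are hypothetical, and the residual covariance is set to zero purely for illustration.

```python
import math
from statistics import mean, variance


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def sigma_m_from_slope(beta, var_s1, cov_x=0.0):
    """Solve beta = (sigma_M^2 + cov_x) / var_s1 for sigma_M."""
    var_m = beta * var_s1 - cov_x
    if var_m < 0:
        raise ValueError("implied skill variance is negative")
    return math.sqrt(var_m)


# Hypothetical round-1 and round-2 scores for four players (NOT real data).
round1 = [1.0, 2.0, 3.0, 4.0]
round2 = [2.0, 1.0, 4.0, 3.0]

beta = ols_slope(round1, round2)                       # slope of S2 on S1
sigma_m = sigma_m_from_slope(beta, variance(round1))   # assumes cov_x = 0
print(beta, sigma_m)
```

When the residual covariance is zero, the implied skill variance equals the sample covariance between the two rounds, so the estimate depends on the data only through that covariance.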

References

Broadie, M. and S. Ko. 2014. “A Golf Simulation Model and Analysis of Golf Skill on Golf Scores.” Working paper, Columbia University, January 2.

Broadie, M. and R. J. Rendleman, Jr. 2013. “Are the Official World Golf Rankings Biased?” Journal of Quantitative Analysis in Sports 9:127–140. doi:10.1515/jqas-2012-0013.

Connolly, R. and R. J. Rendleman, Jr. 2008. “Skill, Luck and Streaky Play on the PGA Tour.” Journal of the American Statistical Association 103:74–88. doi:10.1198/016214507000000310.

Gould, S. J. 2003. Triumph and Tragedy in Mudville. New York, NY, USA: W.W. Norton & Company.

Greene, W. H. 2003. Econometric Analysis. Upper Saddle River, NJ, USA: Prentice Hall.

Kahneman, D. 2011. Thinking Fast and Slow. New York, NY, USA: Straus and Giroux.

Mauboussin, M. 2012. The Success Equation. Boston, MA, USA: Harvard Business School Publishing.

Morrison, D. 1973. “Reliability of Tests: A Technique for using the ‘Regression to the Mean’ Fallacy.” Journal of Marketing Research 20:91–93.

Smith, G. and J. Smith. 2005. “Regression to the Mean in Average Test Scores.” Educational Assessment 10:377–399. doi:10.1207/s15326977ea1004_4.

Stigler, S. M. 1997. “Regression towards the Mean, Historically Considered.” Statistical Methods in Medical Research 6:103–114. doi:10.1177/096228029700600202.

Storey, J. D. 2002. “A Direct Approach to False Discovery Rates.” Journal of the Royal Statistical Society B 64:479–498. doi:10.1111/1467-9868.00346.

Wang, Y. 1998. “Smoothing Spline Models with Correlated Random Errors.” The Journal of The American Statistical Association 93:341–348. doi:10.1080/01621459.1998.10474115.

Published Online: 2020-04-08
Published in Print: 2020-09-25

©2020 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 25.4.2024 from https://www.degruyter.com/document/doi/10.1515/jqas-2019-0028/html