Bayesian statistics meets sports: a comprehensive review

Edgar Santos-Fernandez; Paul Wu; Kerrie L. Mengersen

doi:10.1515/jqas-2018-0106

Published by De Gruyter, June 27, 2019

Journal of Quantitative Analysis in Sports

Abstract

Bayesian methods are becoming increasingly popular in sports analytics. Identified advantages of the Bayesian approach include the ability to model complex problems, obtain probabilistic estimates and predictions that account for uncertainty, combine information sources and update learning as new data become available. The volume and variety of data produced in sports activities over recent years and the availability of software packages for Bayesian computation have contributed significantly to this growth. This comprehensive survey reviews and characterizes the latest advances in Bayesian statistics in sports, including methods and applications. We found that a large proportion of these articles focus on modeling/predicting the outcome of sports games and on developing statistics that provide a better picture of athletes’ performance. We describe some of the advances in basketball, football and baseball. We also summarise the sources of data used for the analysis and the most commonly used software for Bayesian computation. We found a similar number of publications between 2013 and 2018 as in the three previous decades, an indication of the growing adoption of Bayesian methods in sports.

Keywords: Bayesian modelling; Bayesian regression; sports science; sports statistics

1 Introduction

Statistical techniques generally fall within the “Bayesian” category when they rely on Bayes’ theorem, treat unknown parameters probabilistically and give a subjective interpretation to probabilities (Bernardo and Smith 2009). Bayesian statistics has been rapidly gaining traction in sports science in recent years. Motivated by the large and growing number of Bayesian articles in the sports literature, we review some of the most commonly used techniques and methods. The main questions we address are: (1) What are the main developments? (2) What are the most popular techniques? (3) In which sports are they applied? (4) What are the main challenges? The purpose of the article is not, however, to establish a direct comparison between frequentist and Bayesian methods.

A wide range of Bayesian techniques can be found in the sports literature, including Bayesian hierarchical models (e.g. Reich et al. 2006; Albert 2008; Baio and Blangiardo 2010; Miller et al. 2014), Bayesian regression (BR) (Jensen, Shirley, and Wyner 2009b; Albert 2016; Deshpande and Jensen 2016; Silva and Swartz 2016; Boys and Philipson 2018), spatial and spatio-temporal analysis (Jensen et al. 2009b; Yousefi and Swartz 2013; Miller et al. 2014) and hidden Markov models (HMM) (Franks et al. 2015).

Modern sports science is both characterized and challenged by the volume and variety of available data. Examples include the STATS SportVU tracking technology in basketball, the PITCHf/x system in Major League Baseball (MLB) and the ShotLink system in golf. While traditional statistical analyses focused on points scored, averages and number of goals, recent advances in sports analytics consider more complex issues, such as the interaction of players in offensive and defensive actions (see, for example, Gudmundsson and Horton 2017).

There are several theoretical and computational advantages to choosing Bayesian techniques for modelling (Bernardo and Smith 2009; Berger 2013). More specifically, in the context of sports, a growing number of scientists are adopting Bayesian methods because they make it possible to:

incorporate expert information or prior beliefs,
use Bayesian learning, in which the current posterior distribution becomes the prior for future data,
provide probabilistic rather than point estimates,
obtain posterior distributions for the parameters of interest,
include latent variables,
model complex problems,
efficiently integrate and combine data from different sources,
regularly update the model as new data become available,
handle missing data effectively,
deal more effectively with small datasets by using prior information to improve parameter estimates,
use non-standard distributions,
obtain probabilistic rankings of players or teams from MCMC samples,
make predictions that account for uncertainty,
capture spatial neighbouring information through prior distributions and incorporate spatial dependency.
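Bayesian learning, the second item in the list, is easiest to see in a conjugate model. The following minimal Python sketch assumes a player’s shooting success probability θ with a Beta(2, 2) prior and binomial data from three games; the prior and the game counts are illustrative assumptions, not data from any reviewed study. Updating game by game gives exactly the same posterior as a single update on the pooled data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(a, b) distribution over a success probability theta."""
    a: float
    b: float

    def update(self, successes: int, attempts: int) -> "BetaPrior":
        # Conjugate update: after binomial data the posterior is again Beta.
        return BetaPrior(self.a + successes, self.b + attempts - successes)

    def mean(self) -> float:
        return self.a / (self.a + self.b)


# Hypothetical (made, attempted) shots for three consecutive games.
games = [(5, 10), (7, 12), (3, 9)]

belief = BetaPrior(2.0, 2.0)  # weakly informative prior (an assumption)
for made, attempted in games:
    belief = belief.update(made, attempted)  # today's posterior is tomorrow's prior
    print(f"after {made}/{attempted}: posterior mean = {belief.mean():.3f}")

# A single update on the pooled data gives the same posterior.
pooled = BetaPrior(2.0, 2.0).update(sum(m for m, _ in games), sum(n for _, n in games))
assert pooled == belief
```

The same idea carries over to non-conjugate models, where the posterior from one batch of data is usually approximated (for example, by MCMC samples) before serving as the next prior.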

The rest of the article is organised as follows. Section 2 describes the method adopted to carry out the review. Section 3 presents the results: it first examines the main Bayesian techniques and then discusses the main developments in the relevant sports.

2 Materials and methods for the comprehensive review process

The literature review was conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Liberati et al. 2009), with the aim of reducing publication bias as much as possible. We searched the Google Scholar, Scopus and PubMed databases using the keyword “sport*” combined with “Bayesian regression,” “Bayesian statistics,” “Gibbs sampler,” “Bayesian Hierarchical model,” “Empirical Bayes methods,” “Hidden Markov model,” “Markov chain Monte Carlo” or “MCMC,” “posterior” and “prior distribution,” “spatial analysis” and “spatio-temporal modeling.” Other related techniques were not included because they are not fully Bayesian, i.e. they do not give a subjective interpretation to probabilities. For example, naïve Bayes is mentioned across sports papers, but although it uses Bayes’ rule, it gives no subjective interpretation of probability (Hand and Yu 2001) and is therefore not considered fully Bayesian. Similarly, empirical Bayes (EBA) does not qualify as fully Bayesian because the prior distribution is generally estimated from the observed data. EBA methods were nevertheless included in this review, since they have enjoyed great popularity among practitioners in sports statistics for decades.

The search started on 05-Jan-2018 and ended on 31-Aug-2018. We focused on relevant English-language, peer-reviewed journal articles and books published since 1985 that use Bayesian statistical methods to model and analyse team and individual sports, including basketball, baseball, football, marathon, swimming and triathlon. We also reviewed papers on relevant issues or technologies concerning these sports, including wearable technologies and doping. Gambling, computer vision and video analytics, however, are beyond the scope of this review.

For each article, we focused on (1) the statistical method, (2) the most relevant findings and conclusions, (3) the area of application and type of sport, (4) the data sources, including the season or competition and country, and (5) the software used for the analysis (if mentioned). Article metadata (e.g. authors’ affiliations) were extracted from the PDF documents using R (R Core Team 2017).

3 Results

In total, n = 96 articles were initially identified from the database search, and a further 31 were found during the review process or identified by the authors in previous research. A total of n = 42 articles were excluded because they were off-topic or non-Bayesian. Figure 1 depicts the review process.

Figure 1: Flow chart of the comprehensive review process based on the PRISMA (Liberati et al. 2009) methodology.

The authors of these publications were from the United States (36%), Australia (9%), the United Kingdom (8%), Canada (8%), Sweden (8%), Switzerland (7%), Brazil (4%), Germany (3%), the Netherlands (3%), Japan (3%) and Hong Kong (3%).

3.1 Bayesian statistical methods

Bayesian statistical models are based on Bayes’ theorem. The posterior distribution of the parameter of interest θ is obtained as

$$ f(\theta \mid z) = \frac{f(z \mid \theta)\, f(\theta)}{\int f(z \mid \theta)\, f(\theta)\, d\theta} \tag{1} $$

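where z denotes the observed data, and f(θ) and f(z|θ) are the prior distribution and the likelihood, respectively. The denominator, the marginal likelihood of the data, is a normalizing constant that does not depend on θ.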
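When the integral in Eq. (1) has no closed form, a simple approximation for a single parameter is to evaluate prior times likelihood on a grid and normalize numerically. The sketch below assumes a scoring probability θ with a Beta(2, 2) prior and binomial data (12 successes in 20 attempts, illustrative numbers). Because this pair is conjugate, the grid result can be checked against the exact Beta(14, 10) posterior mean, 14/24 ≈ 0.583.

```python
def prior_unnorm(theta: float, a: float, b: float) -> float:
    # Unnormalized Beta(a, b) density f(theta); constants cancel in Eq. (1).
    return theta ** (a - 1) * (1 - theta) ** (b - 1)


def likelihood(theta: float, successes: int, attempts: int) -> float:
    # Binomial likelihood f(z | theta), up to a constant.
    return theta ** successes * (1 - theta) ** (attempts - successes)


def grid_posterior(successes, attempts, a=2.0, b=2.0, n_grid=2000):
    """Posterior density of theta on a midpoint grid over (0, 1)."""
    width = 1.0 / n_grid
    thetas = [(i + 0.5) * width for i in range(n_grid)]
    unnorm = [prior_unnorm(t, a, b) * likelihood(t, successes, attempts) for t in thetas]
    evidence = sum(unnorm) * width  # numerical version of the integral in Eq. (1)
    density = [u / evidence for u in unnorm]
    return thetas, density, width


thetas, density, width = grid_posterior(successes=12, attempts=20)
post_mean = sum(t * d for t, d in zip(thetas, density)) * width
print(f"grid posterior mean = {post_mean:.4f}")  # exact conjugate answer: 14/24
```

Grid methods scale poorly beyond a handful of parameters, which is why the models discussed below typically rely on MCMC instead.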

The following subsections describe papers grouped by technique, including Bayesian regression models and methods that account for time, and for space and time.

3.1.1 Bayesian regression (BR)

Bayesian linear regression is the most common choice when assessing the association between a response variable y and k predictors $x = (x_1, x_2, \dots, x_k)$. In its simplest formulation,

$$ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \tag{2} $$

where i = 1, 2, …, n indexes the observations and the residual ε_i is assumed to be normally distributed with zero mean and constant variance σ². Priors are then placed on the parameter vector β and on σ². See Gelman et al. (2014) for more details. Some examples of the use of this modeling paradigm are given below.
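When the priors on β and σ² are conditionally conjugate, posterior draws can be obtained with a Gibbs sampler that alternates between the full conditional distributions of β and σ². The sketch below is a minimal version for a single predictor with no intercept; the N(0, τ²) prior on β, the inverse-gamma IG(a₀, b₀) prior on σ², the hyperparameter values and the simulated data are all assumptions for illustration.

```python
import math
import random


def gibbs_simple_regression(x, y, n_iter=5000, burn_in=1000, tau2=100.0, a0=2.0, b0=1.0, seed=1):
    """Gibbs sampler for y_i = beta * x_i + eps_i, eps_i ~ N(0, sigma2).

    Assumed priors: beta ~ N(0, tau2), sigma2 ~ InvGamma(a0, b0).
    Returns the post-burn-in draws of beta and sigma2.
    """
    rng = random.Random(seed)
    n = len(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sigma2 = 1.0  # starting value
    beta_draws, sigma2_draws = [], []
    for it in range(n_iter):
        # beta | sigma2, y is normal under the conjugate normal prior.
        v = 1.0 / (sxx / sigma2 + 1.0 / tau2)
        m = v * sxy / sigma2
        beta = rng.gauss(m, math.sqrt(v))
        # sigma2 | beta, y is inverse-gamma: draw its reciprocal from a gamma
        # distribution (gammavariate takes shape and scale = 1 / rate).
        ssr = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
        shape = a0 + n / 2.0
        rate = b0 + ssr / 2.0
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:
            beta_draws.append(beta)
            sigma2_draws.append(sigma2)
    return beta_draws, sigma2_draws


# Simulated data with true beta = 2 and sigma = 0.5 (illustrative only).
data_rng = random.Random(42)
x = [data_rng.uniform(0, 5) for _ in range(100)]
y = [2.0 * xi + data_rng.gauss(0, 0.5) for xi in x]

beta_draws, sigma2_draws = gibbs_simple_regression(x, y)
beta_mean = sum(beta_draws) / len(beta_draws)
sigma2_mean = sum(sigma2_draws) / len(sigma2_draws)
print(f"posterior means: beta = {beta_mean:.3f}, sigma2 = {sigma2_mean:.3f}")
```

With several predictors the same scheme applies, with the scalar sums Σx² and Σxy replaced by XᵀX and Xᵀy and the normal draw for β becoming multivariate.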

Mengersen et al. (2016) proposed a Bayesian regression approach for the small treatment effects commonly encountered in sports performance studies, as an alternative to the magnitude-based inference suggested by Batterham and Hopkins (2006). They examined the effect of altitude training regimens on running performance and blood parameters (hemoglobin mass and maximum blood lactate concentration) in triathletes. They considered G = 3 treatments, live high-train low (LHTL), intermittent hypoxic exposure (IHE) and placebo, with 8 participants per group. An additional predictor in the model, X, was each participant’s percentage change in training load between the periods before and after the intervention.

Letting I₁ and I₂ be indicator variables for treatments 1 and 2, respectively, this model can be written in the form of Eq. (2) or, equivalently, in hierarchical form as:

$$ y_{ij} \sim \mathcal{N}(\mu_{ij}, \sigma_j^2), \qquad \mu_{ij} = \beta_0 + \beta_1 X + \sum_{g=1}^{2} \beta_{g+1} I_g \tag{3} $$

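where observation i belongs to group j, and each group has a potentially different variance σ_j². The summation index g runs over the two treatment indicators, with the group that has no indicator acting as the baseline.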

In another example, Deshpande and Jensen (2016) estimated basketball players’ contributions to their teams’ winning probabilities using a high-dimensional regression. Let y_i be the win probability of the home team in the i-th shift, where shifts are the periods between substitutions. They used the following regression equation:

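$$ y_i = \mu + \theta_{h_{i1}} + \cdots + \theta_{h_{i5}} - \theta_{a_{i1}} - \cdots - \theta_{a_{i5}} + \tau_{H_i} - \tau_{A_i} + \sigma \varepsilon_i \tag{4} $$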

where μ is the home-court advantage and the subscripts h and a denote the home and away teams, respectively. The θ’s are player effects; hence $\theta_{h_{i1}}$ and $\theta_{a_{i1}}$ are the effects of home player 1 and away player 1 on court during shift i. θ estimates were obtained for each of the 488 players in the league. The parameters $\tau_{H_i}$ and $\tau_{A_i}$ are the partial effects associated with the home and away teams, and σ scales the residual variability. The marginal posterior densities of the θ’s give a good picture of each player’s contribution to winning.
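Given posterior draws of the θ’s (for example, from an MCMC run), both the mean of Eq. (4) for a shift and probabilistic player rankings follow directly. In this sketch the dictionaries and the player and team identifiers are hypothetical, and the draws are simulated stand-ins rather than output from Deshpande and Jensen (2016).

```python
import random


def shift_mean(mu, theta, home_players, away_players, tau, home_team, away_team):
    """Mean of y_i in Eq. (4) for one shift (the noise term sigma * eps_i is omitted).

    theta maps player -> effect and tau maps team -> effect; home_players and
    away_players list the five players on court for each side during the shift.
    """
    return (mu
            + sum(theta[p] for p in home_players)
            - sum(theta[p] for p in away_players)
            + tau[home_team] - tau[away_team])


def prob_best(draws):
    """Share of posterior draws in which each player has the largest effect.

    draws maps player -> list of posterior draws; all lists have the same length.
    """
    players = list(draws)
    n_draws = len(draws[players[0]])
    wins = dict.fromkeys(players, 0)
    for s in range(n_draws):
        best = max(players, key=lambda p: draws[p][s])
        wins[best] += 1
    return {p: wins[p] / n_draws for p in players}


# Stand-in posterior draws for three hypothetical players (not real estimates).
rng = random.Random(0)
draws = {name: [rng.gauss(mean, 0.01) for _ in range(4000)]
         for name, mean in [("player_A", 0.02), ("player_B", 0.0), ("player_C", -0.01)]}
sim_probs = prob_best(draws)
print(sim_probs)
```

Ranking within each draw, rather than ranking posterior means, is what lets the ranking itself carry uncertainty.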

Often, the response y_i is a binary variable following a Bernoulli distribution, for instance whether the player scored or not. This case is frequently approached using a logistic regression model, which can be defined as follows:

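$$ y_i \sim \mathrm{Bern}(p_i), \tag{5} $$

$$ \operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i. \tag{6} $$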

Here, ε_i represents extra-Bernoulli variation; see Besag et al. (1995) for details.
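Posterior inference for Eqs. (5) and (6) has no closed form, so it is usually carried out by MCMC. The following is a minimal random-walk Metropolis sketch for an intercept and one covariate; it assumes independent N(0, 2.5²) priors, omits the extra-Bernoulli term ε_i, and uses simulated data, all of which are illustrative choices.

```python
import math
import random


def log1pexp(t: float) -> float:
    # Numerically stable log(1 + exp(t)).
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def log_posterior(beta, x, y, prior_sd=2.5):
    """Log posterior (up to a constant) for logit(p_i) = b0 + b1 * x_i.

    Bernoulli log-likelihood y * eta - log(1 + exp(eta)) plus N(0, prior_sd^2) priors.
    """
    b0, b1 = beta
    log_prior = -(b0 ** 2 + b1 ** 2) / (2 * prior_sd ** 2)
    log_lik = sum(yi * (b0 + b1 * xi) - log1pexp(b0 + b1 * xi) for xi, yi in zip(x, y))
    return log_prior + log_lik


def metropolis(x, y, n_iter=4000, burn_in=1000, step=0.25, seed=0):
    """Random-walk Metropolis draws of (b0, b1), kept after burn-in."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(beta, x, y)
    draws = []
    for it in range(n_iter):
        proposal = (beta[0] + rng.gauss(0, step), beta[1] + rng.gauss(0, step))
        candidate = log_posterior(proposal, x, y)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if it >= burn_in:
            draws.append(beta)
    return draws


# Simulated binary outcomes with a standardized covariate; true b0 = -0.5, b1 = 1.5.
data_rng = random.Random(7)
x = [data_rng.gauss(0, 1) for _ in range(200)]
y = [1 if data_rng.random() < 1 / (1 + math.exp(-(-0.5 + 1.5 * xi))) else 0 for xi in x]

draws = metropolis(x, y)
b1_mean = sum(b1 for _, b1 in draws) / len(draws)
prob_positive = sum(b1 > 0 for _, b1 in draws) / len(draws)
print(f"posterior mean of b1 = {b1_mean:.2f}, P(b1 > 0) = {prob_positive:.3f}")
```

The posterior probability that a coefficient is positive, computed directly from the draws, is one of the probabilistic summaries listed in the introduction.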

Deshpande and Wyner (2017) approached the issue of pitch “framing” in baseball from a Bayesian hierarchical perspective using logistic regression. “Framing” refers to a catcher’s attempts to make a pitch look more like a strike. To assess the catcher’s influence on the call, they modeled the probability that a pitch is called a strike as a function of the umpire and other covariates such as the pitch location (x, z) and the pitcher.

They employed the following logistic model:

$$ \operatorname{logit}(p_i) = \Theta_{b}^{u,B} + \Theta_{ca}^{u,CA} + \Theta_{p}^{u,P} + \Theta_{co}^{u,CO} + f_u(x, z) \tag{7} $$

where b, ca, co, p and u index the batter, catcher, count, pitcher and umpire, and the Θ terms are the corresponding partial effects. The effect of pitch location is given by $f_u(x, z)$. Other factors such as the pitch type (fastball, curveball, etc.) and speed could also be included in the model in a straightforward manner. In other examples, Miskin, Fellingham, and Florence (2010) used logistic regression to assess the importance of several skills in volleyball, and Cafarelli, Rigdon, and Rigdon (2012) used it to estimate the probability of converting a third down in the National Football League (NFL) given the number of yards to go.

Another common model for binary response variables is probit regression, which uses the probit link function:

$$ p_i = \Phi(\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i) \tag{8} $$

where Φ is the standard normal cumulative distribution function.

Jensen et al. (2009b) used probit regression to construct a baseball defensive model that predicts the catching probability given the location of the fielder, the velocity of the ball and the direction in which the fielder must move. They defined their model as:

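$$ p_{ij} = \Phi(\beta_{i0} + \beta_{i1} D_{ij} + \beta_{i2} D_{ij} F_{ij} + \beta_{i3} D_{ij} V_{ij} + \beta_{i4} D_{ij} V_{ij} F_{ij} + \varepsilon) \tag{9} $$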

This model gives the probability that ball j is caught by player i. Here, D_ij is the distance the player has to travel, V_ij is the velocity of the ball and F_ij is 1 when the player moves forward and 0 otherwise. Note that the model includes interactions between predictors as well as a binary indicator. It also allows the player’s contribution to the defense to be computed in terms of runs saved, relative to other players in the same position.
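Eq. (9) can be evaluated with the standard normal CDF, which Python’s standard library exposes through the error function. In this sketch the coefficient values are hypothetical, the error term ε is omitted, and distance and velocity are in unspecified units; it only illustrates how the interaction terms let the forward indicator F_ij change the effect of distance and velocity.

```python
import math


def std_normal_cdf(z: float) -> float:
    # Phi(z) written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def catch_probability(beta, distance, velocity, forward):
    """Mean structure of Eq. (9) for one fielder (epsilon omitted).

    beta = (b0, b1, b2, b3, b4) for that fielder; forward is 1 or 0.
    """
    b0, b1, b2, b3, b4 = beta
    eta = (b0 + b1 * distance + b2 * distance * forward
           + b3 * distance * velocity + b4 * distance * velocity * forward)
    return std_normal_cdf(eta)


# Hypothetical coefficients: catching gets harder with distance, especially for
# fast balls, and the penalty is smaller when moving forward.
beta_fielder = (2.0, -0.05, 0.01, -0.0005, 0.0002)
p_back = catch_probability(beta_fielder, distance=30.0, velocity=90.0, forward=0)
p_fwd = catch_probability(beta_fielder, distance=30.0, velocity=90.0, forward=1)
print(f"moving back: {p_back:.3f}, moving forward: {p_fwd:.3f}")
```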

The multinomial logistic regression model (McFadden 1973) generalizes logistic regression to categorical responses that take more than two values, for example the possible outcomes of a shot in basketball, y ∈ {0, 1, 2, 3}. Reich et al. (2006) used this model to assess the relationship between predictors, such as the defensive strength of the opposing team, playing at home or away and first or second half, and three responses: (1) shot location, (2) shooting frequency and (3) shooting efficiency. They let y_i be the court region of shot i, which follows a multinomial distribution with parameter θ(η_i), and defined the linear predictor

$$ \eta_i = \log(A) + x_i \beta \tag{10} $$

where A is the vector of section areas, with A_j the area of section j; this offset is needed because the sections have different areas. Letting p denote the number of sections, the model is then defined as:

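$$ \theta_j(\eta_i) = \frac{\exp\left(\log(A_j) + x_i' \beta_{\cdot j}\right)}{\sum_{l=1}^{p} \exp\left(\log(A_l) + x_i' \beta_{\cdot l}\right)}. \tag{11} $$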
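Eq. (11) is a softmax over court regions in which log(A_j) enters as a fixed offset. A minimal sketch with hypothetical areas and coefficients follows. Fixing one region’s coefficients at zero as a reference, as done here, is the usual way to make the multinomial logit identifiable; it is an assumption of the sketch rather than a detail taken from Reich et al. (2006).

```python
import math


def region_probs(x, beta, areas):
    """theta_j(eta_i) of Eq. (11): softmax over regions with a log-area offset.

    x: covariate vector for shot i; beta[j]: coefficient vector for region j;
    areas[j]: area of region j (any consistent unit).
    """
    scores = [math.log(a_j) + sum(xk * bk for xk, bk in zip(x, b_j))
              for a_j, b_j in zip(areas, beta)]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


areas = [1.0, 2.0, 5.0]  # hypothetical region areas
# With all coefficients at zero, shot probabilities are proportional to area.
baseline = region_probs([1.0, 0.0], [[0.0, 0.0]] * 3, areas)
# Hypothetical coefficients (region 0 is the reference): the second covariate,
# e.g. a home-game indicator, raises the odds of a shot from region 1.
shifted = region_probs([1.0, 1.0], [[0.0, 0.0], [0.0, 0.7], [0.0, 0.0]], areas)
print([round(p, 3) for p in baseline], [round(p, 3) for p in shifted])
```

Without the offset, a large region would look “preferred” simply because it covers more of the court.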

See also Glickman and Hennessy (2015) for another example of the use of a multinomial logit model for competitors’ rank ordering in Alpine skiing competitions.

Bayesian linear mixed models are also increasingly used. Revie et al. (2017) used mixed models to show that players’ perceptions (collected via surveys) can predict fitness levels when direct fitness measurement is inconvenient or not possible.

Bayesian nonparametric and semi-parametric regression models have also been employed in sports analysis, albeit less commonly. For example, Wimmer et al. (2011) used a semi-parametric latent variable approach to model points performance in decathlon, assuming four latent abilities (sprint, jumping, throwing and endurance). Pradier, Ruiz, and Perez-Cruz (2016) proposed another Bayesian nonparametric model, based on a Dirichlet process mixture, to model the effect of covariates such as age, gender and environment on marathon runners’ performance.

Log-linear models are another popular regression technique. For instance, Boys and Philipson (2018) employed an additive log-linear model for ranking cricketers, accounting for factors such as the year and the player’s age. Several other log-linear models are discussed in the football subsection (Section 3.2.2).

3.1.2 Accounting for time

Time is a critical factor in the modelling and analysis of many sports (Kovalchik and Albert 2017). The performance of athletes and teams is known to change during the season, and even during the course of a game, due to factors such as fatigue. Often it is of interest to analyse factors such as fatigue or momentum that are difficult or impractical to measure directly. A common approach to such analyses over time is to employ a hidden Markov model (HMM) or a state space model (SSM). Both assume an underlying latent variable, z say, that governs the value of the observed variable, y. The form of z determines the type of model: if z is categorical or ordinal, an HMM is appropriate, whereas a continuous z leads to an SSM.

HMMs have been used by several authors (e.g. Albert 1993; Jensen, McShane, and Wyner 2009a; Dadashi et al. 2013; Koulis, Muthukumarana, and Briercliffe 2014). Dadashi et al. (2013) proposed an HMM to estimate the timing coordination between hands and feet by estimating the temporal phases of breaststroke swimming. The model uses three-axis data from wearable inertial measurement units (IMUs) worn on the arms and legs, and predicts three hidden states, Q = (q₁, q₂, q₃), corresponding to the glide, propulsion and recovery phases of the leg and arm movements.

Typically, an HMM is defined as λ = (A, B, π), where A is the state transition probability matrix, B is the emission probability matrix that relates hidden states to observations (here, from the wearable sensors) and π is the initial state distribution. The HMM was trained by supervised learning from expert-annotated video, and the authors reported that the model correctly detected the phases 93.5% of the time for arm strokes and 94.4% for leg strokes. See Dadashi et al. (2013) for a detailed description.
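For a model λ = (A, B, π), the likelihood of an observation sequence and the filtered probabilities of the current hidden state can be computed with the forward algorithm. The sketch below assumes discrete (symbolized) observations and a hypothetical two-state model; with continuous sensor signals, the rows of B would be replaced by emission densities. Each step is rescaled so that long sequences do not underflow.

```python
import math


def hmm_forward(A, B, pi, obs):
    """Forward algorithm for lambda = (A, B, pi) with discrete observations.

    A[i][j] = P(state j at t+1 | state i at t); B[i][k] = P(symbol k | state i);
    pi[i] = P(initial state i); obs is a list of symbol indices.
    Returns (log-likelihood of obs, filtered P(final state | obs)).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    c = sum(alpha)
    log_lik = math.log(c)
    alpha = [a / c for a in alpha]  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        c = sum(alpha)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik, alpha


# Hypothetical two-state model with two observation symbols.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
pi = [0.5, 0.5]
log_lik, filtered = hmm_forward(A, B, pi, [0, 1])
print(math.exp(log_lik), filtered)
```

In a supervised setting such as the one described above, A, B and π would be estimated from the annotated sequences before the forward recursion (or the related Viterbi algorithm) is used to label new data.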

Koulis et al. (2014) used a Bayesian Poisson HMM to model batting performance in cricket, with multiple hidden states related to the batsman’s performance and the number of runs scored per game as the observed variable. Franks et al. (2015) also used a Bayesian HMM to model basketball defensive placements, where the hidden states are the offensive players being guarded by each defender.

Glickman and Stern (1998) proposed a Bayesian state-space approach to predict the strengths of American football teams, using a first-order autoregressive process. They found that this model predicted game outcomes slightly better than the Las Vegas betting line. A nonlinear version of this model was proposed by Glickman (2001) to evaluate paired comparisons in NFL football and chess.
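Rather than reproduce Glickman and Stern’s model, the following sketch shows a simplified linear-Gaussian special case in which the filtering distribution of a latent strength can be computed exactly with a Kalman filter. It assumes a single team whose observed point margin (against an average opponent, with no home advantage) equals its AR(1) strength plus noise; φ, the variances and the margins are hypothetical.

```python
def kalman_ar1(margins, phi=0.9, w=4.0, v=100.0, m0=0.0, p0=50.0):
    """Filtered mean and variance of a team's latent strength after each game.

    Assumed model: strength_t = phi * strength_{t-1} + N(0, w) and
    margin_t = strength_t + N(0, v), with prior strength_0 ~ N(m0, p0).
    """
    m, p = m0, p0
    means, variances = [], []
    for y in margins:
        # Predict: propagate the AR(1) evolution of strength.
        m_pred = phi * m
        p_pred = phi * phi * p + w
        # Update: weight the new margin by the Kalman gain.
        gain = p_pred / (p_pred + v)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


# Hypothetical point margins for one team over six games.
means, variances = kalman_ar1([7, -3, 14, 10, -6, 21])
print([round(s, 2) for s in means])
```

With φ < 1, strengths are pulled back toward the mean between games, and the innovation variance w keeps the filter responsive to changes in form. Handling two teams per game and a home advantage would require a vector state, which is not shown here.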

Other approaches that account for time were proposed by Stephenson and Tawn (2013) and Kovalchik and Albert (2017). Stephenson and Tawn (2013) applied extreme value theory to model annual best racing times in athletics with an exponentially decreasing trend, which facilitates the comparison of athletes who competed in different decades. In tennis, Kovalchik and Albert (2017) modeled time-to-serve using a Bayesian hierarchical model with point importance and the length of the previous rally as covariates.

3.1.3 Accounting for space and time

As discussed above, modern tracking technology provides the locations of players and the ball at small, regular time intervals. This is opening the door to spatio-temporal analysis, in which the court (basketball), the course (golf) or the field (baseball) is often discretized using a grid. The grid cells are often squares in Cartesian coordinates, e.g. one-square-foot quadrats (Miller et al. 2014), or slices in polar coordinates (e.g. Reich et al. 2006; Yousefi and Swartz 2013).
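The two discretizations can be sketched as follows. The cell size, ring width, number of slices and origin (for example, the basket or the pin) are hypothetical parameters, and coordinates are assumed to be in consistent units.

```python
import math


def square_cell(x, y, cell_size=1.0):
    """(column, row) index of the square quadrat containing the point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def polar_cell(x, y, origin=(0.0, 0.0), n_slices=8, ring_width=3.0):
    """(distance ring, angular slice) of (x, y) relative to an origin."""
    dx, dy = x - origin[0], y - origin[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2 * math.pi)  # map the angle to [0, 2*pi)
    slice_idx = min(int(angle / (2 * math.pi / n_slices)), n_slices - 1)
    ring_idx = int(distance // ring_width)
    return ring_idx, slice_idx


print(square_cell(2.5, 3.7), polar_cell(0.0, 5.0))
```

Counts or success rates aggregated per cell then become the areal data to which the spatial priors discussed below are applied.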

In basketball, a common task is to compute the probability of scoring as a function of location, which generally yields a heat map of scoring probabilities. For instance, Reich et al. (2006) used multinomial logit Bayesian regression to assess the relationship between shot location and covariates such as the presence of key teammates on the court, defensive strength and playing at home or away. They also assessed the significance of these predictors for shooting frequency and efficiency in different regions defined in polar coordinates (distance to the basket and angle). They used conditionally autoregressive (CAR) priors, including a CAR prior with two neighbor relations, to obtain a smoother surface by borrowing information from neighboring regions.

Shortridge, Goldsberry, and Adams (2014), for example, extended this idea by computing the spatial variability in scoring within an empirical Bayes framework, using shrinkage to obtain a smoother scoring probability surface. Spatial shooting patterns have also been modeled using a log-Gaussian Cox process (LGCP) (Miller et al. 2014; Franks et al. 2015). Also in basketball, Cervone et al. (2016) used a conditionally autoregressive model to compute the expected score as a function of factors such as the player in possession of the ball and the defensive stance. Here the CAR prior accounts for spatial autocorrelation by adding a random effect for the player. See Cervone et al. (2016) for more details.
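The way a CAR prior borrows information from neighbors can be seen in its full conditional distributions. The sketch assumes a proper CAR prior with precision matrix τ(D − ρW), binary rook-adjacency weights on a square grid and hypothetical values of ρ and τ; it does not reproduce the polar regions or the two-neighbor-relation prior of Reich et al. (2006). Under these assumptions, each cell’s conditional mean is ρ times the average of its neighbors and its conditional variance is 1/(τ d_i), where d_i is the number of neighbors of cell i.

```python
def grid_adjacency(n_rows, n_cols):
    """Rook (shared-edge) neighbors for every cell of an n_rows x n_cols grid."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs[(r, c)] = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
    return nbrs


def car_conditional(phi, nbrs, cell, rho=0.9, tau=1.0):
    """Conditional mean and variance of phi[cell] given its neighbors.

    Assumes a proper CAR prior with precision tau * (D - rho * W) and w_ij = 1
    for neighboring cells, so the mean is rho times the neighbor average.
    """
    neighbors = nbrs[cell]
    d = len(neighbors)
    mean = rho * sum(phi[n] for n in neighbors) / d
    var = 1.0 / (tau * d)
    return mean, var


nbrs = grid_adjacency(3, 3)
# Hypothetical spatial effects: one "hot" cell surrounded by average cells.
phi = {cell: 0.0 for cell in nbrs}
phi[(0, 1)] = 1.0
print(car_conditional(phi, nbrs, (1, 1)))  # the centre cell borrows from (0, 1)
```

Cells with more neighbors get smaller conditional variances, so interior regions are smoothed more strongly than edge regions.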

In baseball, Jensen et al. (2009b) used a hierarchical model to estimate the probability of a fielder catching a ball, based on, among other factors, the location of the player. Deshpande and Wyner (2017) used the horizontal and vertical coordinates of the pitch around the strike zone to estimate the probability of a called strike. Yousefi and Swartz (2013) introduced a metric for golf putting performance that accounts for the distance to the pin and the angle. This approach splits the green into eight slices centred at the pin, and the within-slice probability of holing the putt depends on the distance.

3.1.4 Other methods

Variations and extensions of the above general classes of models have also been used in sports analytics. For instance, Swartz, Gill, and Muthukumarana (2009) developed a simulator for predicting the outcomes of one-day cricket matches using a latent variable approach. Ofoghi et al. (2013) used Bayesian networks to address the selection of athletes for the multi-event cycling omnium.

See also Stenling et al. (2015) for a Bayesian structural equation model (SEM) applied to sport psychology. The authors estimated several latent factors associated with athletes’ behavioral regulation, measured with the Sport Motivation Scale, and found that a Bayesian approach fitted the data better than the traditional maximum likelihood method.

Empirical Bayes is a very popular statistical technique in which the prior distribution f(θ) in Eq. (1) is estimated from observed data, e.g. from previous games or from players with similar characteristics or in the same position. For this reason, EBA is considered pseudo-Bayesian, or not fully Bayesian.

EBA models are generally hierarchical, with the parameters of interest assumed to come from a common pooled distribution. EBA is popular in part because it is computationally fast. For decades, EBA has been used consistently in baseball for modeling batting averages (Efron and Morris 1973; Brown 2008; Neal et al. 2010; Jiang et al. 2010). For instance, the batting averages of players with few at-bats can be estimated using the league-wide distribution of averages as the prior, which shrinks their estimates toward the league average. See an extended discussion in Robinson (2017). In basketball, spatial models of shooting effectiveness are commonly built using EBA because it yields a smooth scoring intensity surface (e.g. Shortridge et al. 2014). Another example is Baker and McHale (2017), who recently proposed an empirical Bayes approach for estimating player strengths.
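The batting-average example can be sketched with a beta-binomial model whose Beta prior is fitted by the method of moments to the observed averages of established players. This is one simple way of estimating the prior, chosen here for illustration; it ignores the binomial noise in those averages, and the league averages and player records below are hypothetical.

```python
def fit_beta_moments(rates):
    """Method-of-moments Beta(a, b) fit to observed rates (the 'empirical' prior).

    Requires the sample variance to be smaller than mean * (1 - mean).
    """
    n = len(rates)
    mean = sum(rates) / n
    var = sum((r - mean) ** 2 for r in rates) / (n - 1)
    total = mean * (1 - mean) / var - 1  # moment estimate of a + b
    if total <= 0:
        raise ValueError("rates are too dispersed for a Beta fit")
    return mean * total, (1 - mean) * total


def eb_average(hits, at_bats, a, b):
    """Posterior mean batting average under the estimated Beta(a, b) prior."""
    return (hits + a) / (at_bats + a + b)


# Hypothetical season averages of established players (the source of the prior).
league_rates = [0.250, 0.270, 0.230, 0.290, 0.260, 0.240, 0.280, 0.265]
a, b = fit_beta_moments(league_rates)

rookie = eb_average(3, 4, a, b)       # raw 0.750 on 4 at-bats: heavily shrunk
regular = eb_average(200, 600, a, b)  # raw 0.333 on 600 at-bats: shrunk less
print(f"prior Beta({a:.1f}, {b:.1f}); rookie {rookie:.3f}; regular {regular:.3f}")
```

The amount of shrinkage depends on at-bats relative to a + b, which acts as the prior’s equivalent sample size.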

Finally, another area that is of interest in sports analytics is experimental design. See, for example, Glickman (2008) who developed a Bayesian locally optimal design approach for knockout-based competitions.

3.2 Sports

Whereas the previous section focused on the methods and gave examples of the sports in which they have been employed, it is also of interest to focus on the sports and review the methods used in each. This section presents such a discussion for the three sports most common in the reviewed literature, namely basketball, football and baseball, and also covers other team sports and related issues such as streakiness and doping.

3.2.1 Basketball

Basketball is one of the most popular, dynamic and competitive sports worldwide. In this game, two teams of players interact in different locations of the court according to a given set of rules with the aim of scoring in the opponent team’s basket.

An early paper by Reich et al. (2006), described above, used multinomial logit Bayesian regression to relate shot location on the court to covariates such as the presence of key teammates on the court, defensive strength and playing at home or away. The inference in this work was limited to a single NBA player (Sam Cassell) during the 2003–2004 season.

The adoption of SportVU player-tracking technology by the NBA after 2010 marked a milestone in basketball analytics. It greatly enriched individual statistics by capturing, at 25 frames per second, the coordinates of each player (x, y) and of the ball (x, y, z). As a result, detailed statistics such as the distance covered by each player in a game and the speed developed when approaching the basket became available.

The success of the paper by Goldsberry (2012) on spatial modeling of shooting effectiveness motivated a large number of publications in this area. See, e.g., Shortridge et al. (2014), who suggested metrics such as the expected number of points per shot at a given location in the offensive court and points above league average for each player.

Models that explicitly incorporate spatial structure have also been proposed. For example, Miller et al. (2014) employed a log-Gaussian Cox process as a spatial prior and combined it with dimension reduction to estimate players’ shooting intensities and identify shooting habits.

Cervone et al. (2016) introduced a statistic called expected possession value (EPV): the expected number of points a team will score in a given offensive possession, which can be plotted against time (from 0 to 25 seconds). This metric depends on the player in possession of the ball, the ball’s location, the defensive placement and other factors. Deshpande and Jensen (2016) assessed the contribution of players to the team’s win probability using a Bayesian linear regression model. See also Lam (2018), who used Bayesian regression to predict the outcome of NBA basketball games.

3.2.2 Football

Numerous studies have attempted to model the outcome of football matches (Rue and Salvesen 2000; Karlis and Ntzoufras 2008; Baio and Blangiardo 2010). For instance, Karlis and Ntzoufras (2008) used a Poisson difference distribution to model the goal difference in football games, using data from the English Premier League. Let X_i and Y_i be the scores of the home and away teams in the i-th game. They modeled the goal difference Z_i as follows:

$$ Z_i = X_i - Y_i \sim \mathrm{PD}(\lambda_{1i}, \lambda_{2i}) \tag{12} $$

where PD is the Poisson difference (Skellam) distribution with rates λ_1i and λ_2i, which are obtained using the following log-linear link functions:

$$ \log(\lambda_{1i}) = \mu + H + A_{HT_i} + D_{AT_i} \tag{13} $$

$$ \log(\lambda_{2i}) = \mu + A_{AT_i} + D_{HT_i} \tag{14} $$

where μ is a constant parameter, H is the home advantage coefficient, A and D are the team attack and defense parameters, and HT_i and AT_i denote the home and away teams in game i.
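The Poisson difference probabilities in Eq. (12) can be computed by summing over the away score, since P(Z = z) = Σ_y P(X = y + z) P(Y = y) for independent Poisson counts X and Y. The sketch combines this with the log-linear rates of Eqs. (13) and (14); all parameter values and team names are hypothetical, and the infinite sum is truncated at a large number of goals.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Computed on the log scale to avoid overflow in lam**k and k!.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def log_linear_rates(mu, home_adv, attack, defense, home, away):
    """Rates lambda_1 (home) and lambda_2 (away) from Eqs. (13)-(14)."""
    lam1 = math.exp(mu + home_adv + attack[home] + defense[away])
    lam2 = math.exp(mu + attack[away] + defense[home])
    return lam1, lam2


def poisson_difference_pmf(z: int, lam1: float, lam2: float, max_goals: int = 60) -> float:
    """P(Z = z) for Z = X - Y, with independent X ~ Pois(lam1) and Y ~ Pois(lam2).

    Sums P(X = y + z) * P(Y = y) over y, truncated at max_goals.
    """
    start = max(0, -z)
    return sum(poisson_pmf(y + z, lam1) * poisson_pmf(y, lam2) for y in range(start, max_goals + 1))


# Hypothetical parameter values for a single fixture.
attack = {"Team1": 0.2, "Team2": -0.1}
defense = {"Team1": -0.1, "Team2": 0.05}
lam1, lam2 = log_linear_rates(0.1, 0.25, attack, defense, home="Team1", away="Team2")
p_home_win = sum(poisson_difference_pmf(z, lam1, lam2) for z in range(1, 30))
p_draw = poisson_difference_pmf(0, lam1, lam2)
print(f"P(home win) = {p_home_win:.3f}, P(draw) = {p_draw:.3f}")
```

In a Bayesian fit, these probabilities would be averaged over posterior draws of μ, H, A and D rather than computed at a single set of values.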

Baio and Blangiardo (2010) suggested improvements to the Karlis and Ntzoufras (2008) model to predict football results in the Italian Serie A championship. They modeled the number of goals scored by each team with a Poisson distribution, rather than modeling the goal difference as in Karlis and Ntzoufras (2008). The model yields posterior distributions of each team’s attack and defense parameters.
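With independent Poisson goal counts, match outcome probabilities follow by summing the joint probabilities of the scorelines, and averaging them over posterior draws of the two rates gives posterior predictive probabilities. A sketch in which the rates and draws are hypothetical stand-ins for posterior output:

```python
import math


def poisson_pmf(k, lam):
    # Computed on the log scale for numerical stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def outcome_probs(lam_home, lam_away, max_goals=15):
    """(home win, draw, away win) probabilities for independent Poisson scores."""
    home = [poisson_pmf(k, lam_home) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, lam_away) for k in range(max_goals + 1)]
    p_home = p_draw = p_away = 0.0
    for i, ph in enumerate(home):
        for j, pa in enumerate(away):
            if i > j:
                p_home += ph * pa
            elif i == j:
                p_draw += ph * pa
            else:
                p_away += ph * pa
    return p_home, p_draw, p_away


def predictive_outcome_probs(draws):
    """Average the outcome probabilities over posterior draws of (lam_home, lam_away)."""
    totals = [0.0, 0.0, 0.0]
    for lam_h, lam_a in draws:
        for k, p in enumerate(outcome_probs(lam_h, lam_a)):
            totals[k] += p
    return tuple(t / len(draws) for t in totals)


# Two hypothetical posterior draws of the scoring rates for one fixture.
print(predictive_outcome_probs([(1.5, 1.0), (1.7, 0.9)]))
```

Averaging over draws propagates parameter uncertainty into the forecast, which plugging in posterior means would ignore.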

Suzuki et al. (2010) proposed a Bayesian model for forecasting the results of the 2006 World Cup that takes the FIFA World Ranking into account. In this approach, the number of goals scored by each team is modeled with a Poisson distribution:

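$$ X_{AB} \mid \lambda_A \sim \mathrm{Pois}\left(\lambda_A \frac{R_A}{R_B}\right) \tag{15} $$

$$ X_{BA} \mid \lambda_B \sim \mathrm{Pois}\left(\lambda_B \frac{R_B}{R_A}\right) \tag{16} $$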

Here X_AB is the number of goals scored by team A against team B, and R_A and R_B are the ratings of teams A and B, respectively. The prior distributions for λ_A and λ_B were set as Gamma distributions, and expert knowledge was incorporated via elicitation. This approach does not consider other relevant variables such as offensive and defensive skills.
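Because the Gamma prior is conjugate to the Poisson likelihood in Eq. (15), the posterior of λ_A is available in closed form: observing goals x_g in games with rating ratios c_g = R_A/R_B gives a Gamma(α + Σx_g, β + Σc_g) posterior in the shape-rate parameterization. The sketch below uses hypothetical prior values, goals and ratios; it is not the elicited prior of Suzuki et al. (2010).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPrior:
    """Gamma(shape, rate) distribution for a team's scoring rate lambda."""
    shape: float
    rate: float

    def update(self, goals, rating_ratios):
        """Conjugate update for X_g ~ Pois(lambda * c_g), where c_g = R_A / R_B in game g."""
        return GammaPrior(self.shape + sum(goals), self.rate + sum(rating_ratios))

    def mean(self):
        return self.shape / self.rate


prior = GammaPrior(3.0, 2.0)  # hypothetical elicited prior, mean 1.5
posterior = prior.update(goals=[2, 0, 1], rating_ratios=[1.2, 0.8, 1.0])
# Posterior expected goals against a new opponent with R_A / R_B = 1.1.
expected_goals = posterior.mean() * 1.1
print(posterior, round(expected_goals, 3))
```

The rating ratio behaves like an exposure term: goals scored against stronger opponents (small c_g) raise the estimate of λ_A more than the same goals against weaker ones.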

Other papers used Bayesian methods to assess the importance of player skills (Thomas, Fellingham, and Vehrs 2009), functional performance (Carvalho et al. 2017) and optimal substitution times (Silva and Swartz 2016).

3.2.3 Baseball

A wide range of Bayesian techniques has been used in baseball. Predicting batting averages has fascinated researchers and statisticians for a long time, and “several authors recently have taken a swing at the subject” (Neal et al. 2010). For instance, Efron and Morris (1973), Brown (2008), Neal et al. (2010) and Jiang et al. (2010) predicted batting averages using empirical Bayes approaches. In other examples, Albert (1993) used hidden Markov models and Albert (2008) used Bayes factors to assess streakiness among batters.

Jensen et al. (2009a) used a log-linear hierarchical model to predict players’ numbers of home runs per season. Age, player position and home ballpark are included as predictors, along with performance in previous seasons. A mixture model on the intercept term is used to create groups of home run hitters (elite and non-elite), and the probability that a hitter belongs to the elite group is determined using a hidden Markov model. This model was shown to have better predictive accuracy than competing methods. McShane et al. (2011) also used a hierarchical Bayesian model to select the performance variables that best describe offensive ability.

A good defense is critical for winning games. However, defense is difficult to quantify because its traditional assessment is rather subjective, making it hard to compare players’ contributions. Jensen et al. (2009b) addressed this issue using a Bayesian probit regression model to assess fielders’ effectiveness. They computed each player’s contribution to the defense in terms of runs saved, relative to other players in the same position. The model predicts the catching probability given the location of the fielder, the velocity of the ball and the direction in which the fielder has to move (forward or backward). It would be interesting to see an extension of this analysis that accounts for the constraints of individual ballparks.

Healey (2017) proposed new player performance statistics based on batted-ball parameters (speed and vertical and horizontal angles) within a Bayesian framework, with probability densities estimated using a nonparametric approach. In another example, Bendtsen (2017) proposed Bayesian networks for modeling career regimes.

3.2.4 Other team sports

In ice hockey, Thomas (2006) modeled the scoring probability as a continuous-time Markov process, and Gramacy, Jensen, and Taddy (2013) introduced a Bayesian logistic regression model for evaluating the impact of players on their team’s scoring. The latter model is an alternative to the plus-minus approach and places a Laplace prior on the regression coefficients to facilitate variable selection, which is required in these high-dimensional regression problems involving a large number of players. See also Thomas et al. (2013), who introduced a method for determining players’ abilities by modeling the team’s scoring rate as a semi-Markov process using hazard functions.

Giles et al. (2017) used a Bayesian regression model to assess the association between mental toughness and behavioral perseverance in Australian rules footballers, accounting for physical fitness. They found an association between these variables, except in the presence of fatigue. Multiple examples can also be found in cricket (e.g. Damodaran 2006; Brewer 2008; Swartz et al. 2009; Boys and Philipson 2018).

3.2.5 Other related issues

Streakiness: Another cluster of publications addressed streakiness (also known as the hot hand phenomenon). A baseball player is considered streaky, for example, when they show a substantially larger-than-average proportion of hits (successes) over a period of time. This apocryphal phenomenon is supposedly experienced by athletes during the season, and it has been studied extensively by, among others, Gilovich, Vallone, and Tversky (1985); Albright (1993); Bar-Eli, Avugos, and Raab (2006).

Albert (1993), for example, inspired by the thesis of Albright (1993), used two-state hidden Markov chains, while Albert (2008) employed the Bayes factor to detect non-random changes in batting performance. A similar approach was followed by Wetzels et al. (2016), who analyzed streakiness rates in basketball. Yang (2004) suggested a Bayesian binary segmentation method for analyzing runs of consecutive successes or failures, which relies on the Bayes factor to assess changes in the success rate. The author analyzed several popular events considered to be the result of streaky performance in basketball, baseball and golf. Reich et al. (2006) considered whether a basketball player had missed the previous shot as a predictor of shooting frequency and location on the court, but found no relationship.
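
A single change-point Bayes factor of the kind underlying these streakiness analyses can be sketched as follows. Under H0 a player has one success probability p with a Beta(a, b) prior; under H1 the probability changes after an unknown trial k, with independent Beta(a, b) priors on the two segments and a uniform prior on k. Both marginal likelihoods are available in closed form through Beta functions. This is a simplified illustration under these assumptions, not the recursive binary segmentation of Yang (2004) or the exact model of Albert (2008); the function names are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(successes, failures, a=1.0, b=1.0):
    """log of the integral of p^s (1 - p)^f under a Beta(a, b) prior on p."""
    return log_beta(a + successes, b + failures) - log_beta(a, b)


def changepoint_bayes_factor(seq, a=1.0, b=1.0):
    """Bayes factor of H1 (one change in success rate at an unknown position)
    against H0 (constant success rate) for a 0/1 sequence of trials."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two trials")
    s = sum(seq)
    log_m0 = log_marginal(s, n - s, a, b)
    log_terms = []
    for k in range(1, n):  # change after trial k; uniform prior over k
        s1 = sum(seq[:k])
        s2 = s - s1
        log_terms.append(log_marginal(s1, k - s1, a, b)
                         + log_marginal(s2, (n - k) - s2, a, b))
    # average the marginal likelihoods over change points (log-mean-exp)
    mx = max(log_terms)
    log_m1 = mx + log(sum(exp(t - mx) for t in log_terms)) - log(n - 1)
    return exp(log_m1 - log_m0)


streaky = [1] * 10 + [0] * 10
alternating = [1, 0] * 10
print(changepoint_bayes_factor(streaky), changepoint_bayes_factor(alternating))
```

Values well above 1 favor a change in the underlying success rate. The alternating sequence has the same overall success rate as the streaky one but yields a Bayes factor below 1, because splitting it buys no improvement in fit.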

Doping: Anti-doping studies of biological markers generally take a longitudinal approach within a Bayesian framework to account for within- and between-athlete variation. Bayesian inference is useful across the range of approaches, from the established population-based reference approach to an individual passport system.

Sottas et al. (2006), for example, suggested a method for detecting abnormal T/E (testosterone glucuronide/epitestosterone glucuronide) ratio values. This approach compares test results against a cutoff threshold obtained by Bayesian inference from the estimated population and intra-individual means and coefficients of variation. Robinson et al. (2007) extended this approach to the detection of another banned substance (recombinant human erythropoietin), and Schulze et al. (2009) added genotype (UGT2B17) as a predictor to the Bayesian framework of Sottas et al. (2006) to increase test sensitivity. Van Renterghem et al. (2011) also used Bayesian inference to detect testosterone doping based on new biomarkers.
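
The idea of an individual cutoff that is updated as an athlete's own results accumulate can be illustrated with a deliberately simplified normal-normal model. Before any test, the athlete's mean is drawn from a population distribution; each result scatters around that mean with a within-athlete SD, here approximated as the coefficient of variation times the population mean. After each test the individual mean is updated, and the threshold is a high quantile of the posterior predictive distribution of the next result. This is a sketch under these assumptions (including approximate normality on the chosen scale), not the model of Sottas et al. (2006); the parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def individual_threshold(previous, pop_mean, between_sd, within_cv, specificity=0.99):
    """Upper cutoff for an athlete's next marker value (e.g. a T/E ratio).

    previous    -- this athlete's earlier results (may be empty)
    pop_mean    -- population mean of the marker
    between_sd  -- SD of individual means across athletes
    within_cv   -- within-athlete coefficient of variation (assumed constant)
    """
    within_sd = within_cv * pop_mean  # simplifying assumption
    n = len(previous)
    prior_prec = 1.0 / between_sd ** 2
    data_prec = n / within_sd ** 2
    # conjugate normal update of the athlete's individual mean
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * pop_mean + sum(previous) / within_sd ** 2)
    # posterior predictive of the next result: N(post_mean, post_var + within_sd^2)
    pred_sd = math.sqrt(post_var + within_sd ** 2)
    return NormalDist(post_mean, pred_sd).inv_cdf(specificity)


# Hypothetical values: the cutoff tightens as the athlete's own history accumulates
for history in ([], [1.0], [1.0] * 5, [1.0] * 20):
    print(len(history), round(individual_threshold(history, 2.0, 1.0, 0.3), 2))
```

With no history the cutoff is a population-based reference limit; as results accumulate it moves toward a limit centered on the athlete's own values, which mirrors the transition from population-based references to an individual passport.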

Relative age effect: The Relative Age Effect (RAE) refers to the observation that children and athletes born in the first months after the school-year cutoff date have greater chances of success. Ishigami (2016) used a Bayesian Poisson regression model to investigate the impact of the RAE and birthplace on the chances of becoming a professional athlete in Japan. The author reported that those born in the first month after the cutoff were three times more likely to become professional athletes.
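
As a simplified sketch of this kind of comparison, suppose the numbers of professional players born in two months are independent Poisson counts with equal exposure (that is, births assumed evenly spread over the year) and conjugate Gamma priors on the rates. The posterior of the rate ratio can then be simulated directly. This omits the covariates, such as birthplace, in the regression of Ishigami (2016), and the counts are hypothetical.

```python
import random


def rate_ratio_draws(count_early, count_late, a=1.0, b=1.0, exposure=1.0,
                     n_draws=20000, seed=1):
    """Posterior draws of rate_early / rate_late for two independent Poisson
    counts with Gamma(shape=a, rate=b) priors and equal exposure."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Conjugate update: Gamma(a + count, rate = b + exposure).
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        r_early = rng.gammavariate(a + count_early, 1.0 / (b + exposure))
        r_late = rng.gammavariate(a + count_late, 1.0 / (b + exposure))
        draws.append(r_early / r_late)
    return draws


# Hypothetical counts of professional players born in the first and the last
# month after the cutoff
draws = sorted(rate_ratio_draws(60, 20))
n = len(draws)
print("posterior mean ratio:", round(sum(draws) / n, 2))
print("95% interval:", round(draws[int(0.025 * n)], 2), round(draws[int(0.975 * n) - 1], 2))
print("P(ratio > 1):", sum(d > 1 for d in draws) / n)
```

A ratio near 3 would correspond to the "three times more likely" statement above; a regression formulation generalizes this by letting the log rate depend on birth month and other covariates.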

3.3 Software for Bayesian computation

Bayesian computational techniques are included in most statistical software packages. We present a summary of the most popular software for conducting Bayesian analysis in sports data science, according to the papers we reviewed (Figure 2). For inclusion, the author(s) had to clearly state the name of the software package employed.

Figure 2:

Most popular software used for Bayesian analyses in sports science.

In the papers we reviewed, 𝖱 was by far the most popular software, accounting for approximately half of all mentions. MATLAB (2017), 𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Lunn et al. 2000) and 𝖲𝗍𝖺𝗇 (Stan Development Team 2017) completed the top four. Curiously, Python (Python Software Foundation 2017) does not yet seem to be popular among sports scientists. The most commonly mentioned packages within the 𝖱 environment were 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄 (Martin, Quinn, and Park 2011) and 𝗋𝗃𝖺𝗀𝗌 (Plummer 2016), followed by 𝖽𝖾𝗉𝗆𝗂𝗑𝖲𝟦 (Visser and Speekenbrink 2010), 𝖱𝟤𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Sturtz, Ligges, and Gelman 2005), 𝗋𝗌𝗍𝖺𝗇 (Stan Development Team 2018), 𝖧𝗂𝖽𝖽𝖾𝗇𝖬𝖺𝗋𝗄𝗈𝗏 (Harte 2017) and 𝖻𝗋𝗆𝗌 (Bürkner 2017).

3.4 Summary of methods and applications

The variety and complexity of Bayesian statistical techniques applied to sports science problems have increased substantially over the last 15 years. To give a better overview, we summarize the research articles in the Appendix (Table 3). We grouped them by sport (basketball, football, etc.) or category (doping, streakiness, etc.), listing team sports first, followed by individual sports.

The Method column refers to the classification in Table 1. We also include the statistical software/package used for the computations. Because some authors did not report the version of the R packages they used, we cite the most recent version. The last column describes the data sources, specifying the competition, the season(s) and the sample size where reported.

Table 1:

Symbols and definitions.

Symbol	Technique
BHM	Bayesian hierarchical modeling
BR	Bayesian regression (e.g. logistic, multiple, etc)
LS	Longitudinal studies
EBA	Empirical Bayesian approach
SASTA	Spatial and/or spatio-temporal analysis
TS	Time series
BN	Bayesian networks
HMM	Hidden Markov model
MC	Markov chain
BNP	Bayesian nonparametric, Bayesian survival models, etc
Other	Other techniques, including Bayesian structural equation modeling

Table 2 cross-tabulates the statistical techniques against the publication period. The column Accounting for time (AT) includes time series, longitudinal studies and HMM, while Accounting for space and time (AST) comprises the articles considering spatial and temporal association. The evolution per year is shown in Figure 3.

Table 2:

Cross-tabulation of the Bayesian technique vs. the publication period.

	AT	AST	BHM	BR	EBA	Other	Sum
(1985, 2005]	2	0	1	1	0	4	8
(2005, 2009]	4	5	8	5	2	1	25
(2009, 2013]	1	3	5	5	2	4	20
(2013, 2018]	4	8	12	13	3	9	49
Sum	11	16	26	24	7	18	102

AT, Accounting for time; AST, accounting for space and time; BHM, Bayesian hierarchical models; BR, Bayesian regression; EBA, empirical Bayesian approach

Figure 3:

Evolution of the Bayesian techniques per year.

Bayesian hierarchical models (BHM) and Bayesian regression (BR) are the most popular techniques, followed (excluding the heterogeneous Other category) by methods accounting for space and time. The number of articles published during 2013–2018 (49) is similar to the number published from 1985 to 2013 (53), which shows a tremendous rise in the use of Bayesian methods. Note, however, that we do not know the growth rate of scientific articles in sports statistics overall (frequentist and Bayesian).
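
The period comparison can be checked directly from Table 2. The short Python script below transcribes the counts, recomputes the row and column totals, and compares the articles from 2013–2018 with those published between 1985 and 2013; it reproduces the margins of the table (row totals 8, 25, 20 and 49; grand total 102).

```python
# Counts transcribed from Table 2 (rows: publication period; columns: technique)
columns = ["AT", "AST", "BHM", "BR", "EBA", "Other"]
table = {
    "(1985, 2005]": [2, 0, 1, 1, 0, 4],
    "(2005, 2009]": [4, 5, 8, 5, 2, 1],
    "(2009, 2013]": [1, 3, 5, 5, 2, 4],
    "(2013, 2018]": [4, 8, 12, 13, 3, 9],
}

row_sums = {period: sum(counts) for period, counts in table.items()}
col_sums = [sum(col) for col in zip(*table.values())]

recent = row_sums["(2013, 2018]"]
earlier = sum(v for k, v in row_sums.items() if k != "(2013, 2018]")

print(row_sums)
print(dict(zip(columns, col_sums)), "total:", sum(col_sums))
print(f"1985-2013: {earlier}, 2013-2018: {recent}, ratio: {recent / earlier:.2f}")
```

The ratio of about 0.92 supports describing the two periods as similar in output, with the caveat noted above that the overall growth of the sports statistics literature is unknown.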

Figure 4 shows the number of publications in each sport. Approximately 50% of the publications were on three team sports (basketball, baseball and football).

Figure 4:

Number of papers on each sport including the doping category. The category others includes one mention of the following sports: American football, athletics, Australian rules football, ball-games, decathlon, free-weight, frisbee, marathon, multiple sports, Paralympic sports, rowing, rugby union, running, skiing, triathlon and wrestling.

4 Discussion

In recent years a growing number of scientific publications have shown the benefits, potential and limitations of the Bayesian philosophy in sports statistics (Ivarsson et al. 2015; Gucciardi and Zyphur 2016; Gucciardi et al. 2016; Mengersen et al. 2016). These benefits include the capacity to model complex sports problems and to make predictions that take uncertainty into account. In conducting this review, we found a substantial number of publications covering multiple areas and applications, in sports such as golf, rugby, basketball and cricket.

This study was designed to provide an overall characterization of the state of the art of Bayesian sports statistics as a rapidly maturing discipline. To the best of our knowledge, this is the most comprehensive review undertaken on Bayesian methods in sports statistics. Most of the reviewed articles can be grouped according to the problem they address; they focus on:

identification of factors or covariates that contribute to scoring, winning or better performance,
forecasting and prediction,
between- and within-season spatial and temporal effectiveness,
player interactions and dynamics,
unusual streaky outcomes (streakiness),
development of new metrics,
home-court advantage, ball possession, assessment of bias in referees’ judgments, and tournament design,
players’ abilities, rankings, paired comparisons and contributions to their teams in attacking and defensive settings,
optimization of resources such as substitutions, rosters, athlete selection, batting order and player placement on the field,
effectiveness of training regimes, endurance and mental toughness,
visualizations and model comparison,
wearable technology, activity identification and pattern recognition,
doping.

We found substantial development, with a large proportion of the papers dealing with data from the major professional sports leagues in the United States (MLB and NBA), in part because these leagues have been generating high-resolution data for many years.

Similarly, far more research was identified on team sports than on individual ones, possibly because team sports are more complex and statistically richer. Although the articles considered represent almost every continent, they were mostly concentrated in the United States, Australia, Canada, the United Kingdom and Sweden. A question in our minds before conducting this research was whether these contributions were made by sports scientists or by statisticians. We found that most of them have been contributed by statisticians and data scientists. As pointed out by Bernards et al. (2017), “most current sports scientists are not trained in Bayesian methods” (yet). A large number of these publications meet Swartz’s (2018) second criterion for a good sports paper, “they address a real sporting problem”, and are therefore considered applied research.

We identified some well-established niches where specific Bayesian models are used intensively. These include Bayesian longitudinal models in anti-doping studies, log-linear models for football game outcomes and empirical Bayes approaches (EBA) for estimating baseball batting averages. We also found great interest in the search for the greatest athletes in multiple sports, e.g. athletics (Stephenson and Tawn 2013), golf (Baker and McHale 2015), tennis (Baker and McHale 2017) and chess (Glickman 1999).
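
The empirical Bayes idea behind these batting-average studies can be sketched with the positive-part James-Stein estimator illustrated by Efron and Morris (1973): each player's raw average is shrunk toward the grand mean by a common factor estimated from the data themselves. For simplicity this sketch works on the raw proportion scale and assumes all players have the same number of at-bats, rather than reproducing the original analysis; the hit counts are hypothetical.

```python
def james_stein(hits, at_bats):
    """Positive-part James-Stein shrinkage of batting averages toward their mean.

    Assumes every player has the same number of at-bats, so all raw averages
    share the binomial sampling variance pbar * (1 - pbar) / at_bats.
    """
    k = len(hits)
    if k < 4:
        raise ValueError("shrinkage toward the grand mean needs at least 4 players")
    raw = [h / at_bats for h in hits]
    pbar = sum(raw) / k
    sigma2 = pbar * (1 - pbar) / at_bats
    spread = sum((p - pbar) ** 2 for p in raw)
    # common shrinkage factor, truncated at 0 (the "positive part")
    c = max(0.0, 1 - (k - 3) * sigma2 / spread) if spread > 0 else 0.0
    return [pbar + c * (p - pbar) for p in raw]


# Hypothetical hits in the first 45 at-bats for seven players
hits = [18, 15, 13, 12, 10, 9, 8]
for h, est in zip(hits, james_stein(hits, 45)):
    print(f"raw {h / 45:.3f} -> shrunk {est:.3f}")
```

Extreme early-season averages are pulled most strongly toward the mean, which is why such estimates tend to predict later performance better than the raw averages.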

Bayesian methods are not a panacea for every data analysis problem. For instance, analyses based on poor data or poor models will have limited success in a Bayesian context, although a Bayesian approach can provide some compensation. Many MCMC-based methods can be computationally intensive. However, recent approaches such as variational Bayes provide a substantial computational speed-up (Ruiz and Perez-Cruz 2015; Blei, Kucukelbir, and McAuliffe 2017). Another limitation is the scalability of models to big data problems, although recent statistical advances are making it possible to take advantage of modern parallel computing (Angelino, Johnson, and Adams 2016; Minsker et al. 2017).

Some challenges for future research are (1) dealing with increasingly complex datasets while presenting the methods and applications to an audience without a deep statistical background, (2) creating ready-to-use tools, e.g. Shiny apps, that allow practitioners and sports enthusiasts to implement and run analyses easily, and (3) possibly embracing principles of open science, e.g. open-source code, data and methodology. Good examples of the third point are Cervone et al. (2016) and Mengersen et al. (2016).

As Figure 4 suggests, many sports remain largely unexplored to date, and the advent of high-resolution data in the near future will undoubtedly attract further research and collaboration. These high-dimensional, large datasets will both motivate and benefit from the development of more efficient Bayesian MCMC methods in this area.

5 Conclusion

The Bayesian revolution has arrived in sports analytics. Since 2005 there has been a substantial increase in Bayesian modeling in sports. We found that the number of papers published between 2013 and 2018 was similar to the number published in the previous three decades (1985–2013). Based on the review, Bayesian regression and Bayesian hierarchical models emerged as the most popular techniques, but other methods such as HMMs and Bayesian spatial analysis are on the rise. More and more sports scientists are incorporating prior beliefs into their models and using posterior distributions to make inferences about parameters within a Bayesian paradigm. New data sources have motivated the exploration of new methodologies and insights. Similarly, recent research advances have enhanced the way we summarize and make inferences about sports by introducing new metrics and methods. These advances will continue to be complemented by the growing confidence of sports scientists to look beyond traditional analytics boundaries and explore methods used in other fields.

Acknowledgments

This research was supported by the Australian Research Council (ARC) Laureate Fellowship Program and the Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS) and by the project “Bayesian Learning for Decision Making in the Big Data Era” (ID: FL150100150). First Investigator: D. Prof. Kerrie Mengersen. Thanks to Jacinta Holloway who helped in the selection of the papers. We also thank Dr. Richard Boys for his insightful comments during the early stages of the article. The authors declare no competing interests.

Appendix

Summary of methods and applications

Table 3:

Summary of publications included in the review.

Author(s)	Method	Description	Software/package	Sport	Data
Reich et al. (2006)	BR, BHM, SASTA	Logit multinomial regression to assess the relationship between the shot location and frequency (response variables) and predictors such as defensive blocks, home advantage, etc. Conditionally autoregressive (CAR) and two neighbor relation CAR (2NRCAR) priors are used to achieve a smoother surface	𝖱	Basketball	shot chart data of the NBA Season 2003–2004 of the player Sam Cassell (Minnesota Timberwolves)
Shortridge et al. (2014)	EBA, SASTA	It computes the spatial effectiveness of shooting using empirical Bayesian smoothing rate estimates. It provides estimates of the expected number of points per shots per area in the court	𝖱, 𝖢𝗅𝖺𝗌𝗌𝖨𝗇𝗍 (Bivand 2017), 𝗌𝗉 (Bivand, Pebesma, and Gomez-Rubio 2013), 𝖶𝖾𝗂𝗀𝗁𝗍𝗌 (Pasek et al. 2016)	Basketball	ESPN data from NBA (2011–2012) from made and missed field shots. They used the locations in Cartesian coordinates of the field goals
Miller et al. (2014)	BHM, SASTA	Shooting intensity modeling and identification of players’ shooting habits using a log Gaussian Cox process (LGCP). They suggested a dimensionality reduction approach via non-negative matrix factorization	𝖱, 𝖭𝖬𝖥 (Gaujoux and Seoighe 2018)	Basketball	Made and missed shots obtained from optical player tracking data. NBA 2012–2013 regular season
Franks et al. (2015)	SASTA, BR, HMM	Defensive effectiveness assessment using spatial and spatio-temporal analysis and HMM	𝖲𝗍𝖺𝗇	Basketball	Optical player tracking data from the NBA season 2013–2014. The locations of the players are obtained from cameras and recorded at 25 frames per second
Lamas et al. (2015)	BHM	Bayesian inference to compute the outcome probabilities based on offensive-defensive actions	𝖱, 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄 (Martin et al. 2011)	Basketball	Data obtained using video from six play-off games of Liga ACB in Spain (2010–2011). n = 1548 space creation dynamics (SCD) and space protection dynamics (SPD)
Deshpande and Jensen (2016)	BR	Bayesian linear regression model for assessing the contribution of players to the team’s win probability	𝖱, 𝗌𝗈𝗇𝗈𝗌𝗏𝗇 (Gramacy 2017a)	Basketball	Play-by-play ESPN data from the NBA (2006–2014)
Cervone et al. (2016)	SASTA, MC	Computation of the expected possession value (EPV) using a Markov model	𝖱, 𝖱-𝖨𝖭𝖫𝖠	Basketball	Optical player tracking data from the NBA season 2013–2014 from STATS LLC. They provide a sample game dataset (Miami Heat vs. Brooklyn Nets)
Lam (2018)	BR	Bayesian regression to predict the teams winning probabilities based on past games and the players’ performance	𝖯𝗒𝗍𝗁𝗈𝗇	Basketball	NBA seasons 2013–2015. Data from Basketball-Reference consisting of 17 metrics e.g. field goals per minute, 3-point field goals per minute, etc.
Bar-Eli and Tenenbaum (1988)	BF	Psychological crisis assessment using the Bayesian likelihood ratio	–	Basketball	Questionnaire from 28 basketball experts
Wetzels et al. (2016)	HMM	Hidden Markov model for analyzing the streakiness rate	𝖱 and 𝖧𝗂𝖽𝖽𝖾𝗇𝖬𝖺𝗋𝗄𝗈𝗏 (Harte 2017)	Basketball and psychology	(1) basketball free-throw shooting from the NBA seasons 2005–2010 and (2) a visual discrimination task (4 participants)
Efron and Morris (1973)	EBA	Prediction of baseball batting averages to illustrate the use of the James-Stein estimator	–	Baseball	Proportion of hits in the first 45 at-bats of fourteen baseball players in the MLB season 1970
Albert (1993)	BHM, MC	Hitting streak probability estimation using a two-state Markov model and a Bayesian hierarchical model	–	Baseball	MLB 1988–1989 seasons. n = 200 players, 100 from each season (1988 and 1989), 50 from each league per year
Albert (2008)	BHM	Assessment of the hitting streakiness by means of Bayesian inference	–	Baseball	Hits and outs from 287 players from the MLB season 2005
Brown (2008)	BHM, EBA	Prediction of baseball batting averages using empirical and hierarchical Bayes approaches	–	Baseball	First and second half of season 2005 in the MLB (number of hits and outs)
Jensen et al. (2009b)	BR, SASTA	Player’s defensive performance assessment using empirical Bayes approach and probit regression	–	Baseball	High-resolution data of locations of batted balls (MLB seasons 2002–2005) from Baseball Info Solutions. n≈120,000
Jiang et al. (2010)	EBA	Estimation of baseball batting averages using an empirical Bayes approach and linear models	–	Baseball	Number of hits and at-bats from the MLB season 2005. Players with more than 11 at-bats
Neal et al. (2010)	EBA, BR	Empirical Bayes approach to predict baseball averages in the second half of the season based on the first half	–	Baseball	MLB season 2004–2005. Hits and at-bats data obtained from https://www.retrosheet.org/
Jensen et al. (2009a)	BHM, HMM	Home run hitting prediction using a log-linear hierarchical model. They used a hidden Markov model for separating hitters into two categories (elite and non-elite)	–	Baseball	MLB seasons 1990–2005 from Lahman Baseball Database. n = 10,280 player-years
McShane et al. (2011)	BHM, HMM	They implemented a hierarchical Bayesian variable selection model for a better assessment of players’ abilities	–	Baseball	Appelman database MLB 1974–2008 seasons. 50 offensive statistics (singles, doubles, home runs, etc) from n = 8596 player-seasons and 1575 players
Albert (2016)	BR	Random effects model for estimating batting performance	𝖱	Baseball	Lahman MLB Baseball Database from the 2011 season. Strikeouts, home runs, hit-in-plays and out-in-plays
Ishigami (2016)	BR; BHM	Poisson Bayesian regression to estimate the effect of Relative Age Effect (RAE) and place where athletes were born	𝖱, 𝗋𝗃𝖺𝗀𝗌 ; 𝖲𝗍𝖺𝗇	Baseball and football	Season 2012. 12 teams Nippon Professional Baseball Organization (NPB); and 198 players. Japan Professional Football League (J. League); 277 players
Bendtsen (2017)	BN	Bayesian networks for modeling career regimes	𝖱, 𝖽𝖾𝗉𝗆𝗂𝗑𝖲𝟦 (Visser and Speekenbrink 2010)	Baseball	A random sample of 30 players that debuted during 2005 or after obtained from www.retrosheet.org
Deshpande and Wyner (2017)	BHM, BR	Bayesian hierarchical model and Bayesian logistic regression for pitch framing	𝖱, 𝖲𝗍𝖺𝗇 (Stan Development Team 2017), 𝗋𝗌𝗍𝖺𝗇 (Stan Development Team 2018)	Baseball	Horizontal and vertical coordinates obtained from the high-resolution pitch tracking dataset MLB PITCHf/x (seasons 2011–2015)
Healey (2017)	BR	Suggested new statistics for players performance based on batted-ball parameters (speeds, vertical and horizontal angles) within the Bayesian philosophy. Probability density estimates are obtained using a nonparametric approach	𝖱	Baseball	MLB Sportvision’s HIT f/x (Season 2014) comprising measurements from more than 100,000 batted-balls
Rue and Salvesen (2000)	BR, TS	Dynamic log-linear Poisson model to predict game outcomes. This time-dependent approach considers the teams’ attack and defense strengths and a psychological effect	𝖫𝖠𝖯𝖠𝖢𝖪 𝗅𝗂𝖻𝗋𝖺𝗋𝗒 (Anderson et al. 1999)	Football	Premier League and division 1 during 1993–1995 and 1997–1998 seasons
Karlis and Ntzoufras (2008)	BHM	Bayesian modelling of the match differences using the Poisson difference distribution	𝖱, 𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Lunn et al. 2000)	Football	goals/game scored in the English Premiership by the 20 teams in the season 2006–2007
Baio and Blangiardo (2010)	BHM	Bayesian log-linear random effect model to predict football results	𝖱, 𝖶𝗂𝗇𝖡𝖴𝖦𝖲	Football	goals/game scored in the Italian Serie A championship. 1991–1992 and 2007–2008 seasons. 20 teams
Suzuki et al. (2010)	BR	Bayesian log-linear Poisson model for predicting match outcomes based on expert’s opinions and the team’s rankings	–	Football	goals scored by each of the 32 teams competing in the 2006 Soccer World Cup
Shahtahmassebi and Moyeed (2016)	BHM	Generalized Poisson difference distribution (GPDD) for modeling goal differences	𝖱	Football	Goals scored in Italian Serie A (2012–2013) obtained from ESPN. 20 teams and 380 matches
Koopman and Lit (2015)	TS	Time series analysis using bivariate Poisson distribution for modeling teams goal differences	𝖮𝗑𝗆𝖾𝗍𝗋𝗂𝖼𝗌	Football	English football Premier League (2003–2012 seasons). Goals scored in the 3420 matches
Thomas et al. (2009)	BR	Bayesian linear regression for assessing the importance of skills	𝖬𝖠𝖳𝖫𝖠𝖡 (MATLAB 2017)	Football	Video annotation data from Women National Collegiate Athletic Association Division I. n = 10 games
Carvalho et al. (2017)	LS, BHM	Bayesian multilevel model for fitting functional performance and growth curves for body mass and stature	𝖱, 𝖻𝗋𝗆𝗌 (Bürkner 2017), 𝖲𝗍𝖺𝗇 (Stan Development Team 2017)	Football	Growth in body size and functional capacities in n = 33 under-11 youth soccer players from a Spanish first division club
Silva and Swartz (2016)	BR	Bayesian logistic regression to determine optimum substitution times	𝖶𝗂𝗇𝖡𝖴𝖦𝖲	Football	English Premier League (2009–2010), the German Bundesliga (2009–2010), the Spanish La Liga (2009–2010), the Italian Serie A (2009–2010), North America’s Major League Soccer (2010) and the 2010 World Cup
Razali et al. (2017)	BN	Bayesian networks for predicting the matches’ results	𝖶𝖾𝗄𝖺 (Hall et al. 2009)	Football	English Premier League (EPL) (2010–2011, 2011–2012 and 2012–2013). The data from the 20 teams was obtained from http://www.football-data.co.uk
Swartz et al. (2009)	BHM	Developed a simulator of the game outcome based on a Bayesian latent variable model	𝖶𝗂𝗇𝖡𝖴𝖦𝖲	Cricket	472 games comprising 257,922 bowled balls of the ICC from Jan 2001 to Jul 2006
Koulis et al. (2014)	HMM	Poisson HMM to model batting performance in cricket. They used a Bayesian approach with multiple states related to the batsman’s performance, where the observed variable is the number of runs produced per game		Cricket	Historical data from the top 20 ODI batsmen ranked at July 7, 2013, obtained from www.espncricinfo.com
Stevenson and Brewer (2017)	Other, BHM	Using a Bayesian survival approach they assessed the hypothesis that batting is a more difficult task at the beginning of the game. The constructed model allows the estimation of batting abilities during the batting stages	Nested sampling implemented in Julia (Bezanson et al. 2012)	Cricket	Test Match data (batsmen from New Zealand during 1990s and 2000s) from Statsguru and Cricinfo website
Boys and Philipson (2018)	BR	Additive log-linear model for ranking cricketers, accounting for factors such as the year, player’s age, etc.	𝖱, 𝖼𝗈𝖽𝖺 (Plummer et al. 2006)	Cricket	n = 2855 test match cricketers from 1877-August 2017
Thomas (2006)	MC	Modeled the scoring probability as a continuous-time Markov process		Ice hockey	Manual annotation of 18 games from the Harvard Men’s Varsity Hockey team (2004–2005 season)
Gramacy et al. (2013)	BR, BHM	An approach for evaluating the impact of players’ performance on scoring using a logistic regression model	𝖱; 𝗋𝖾𝗀𝗅𝗈𝗀𝗂𝗍 (Gramacy 2017b), 𝗍𝖾𝗑𝗍𝗂𝗋 (Taddy 2013)	Ice hockey	Players on ice of the games from 2007–2011 seasons obtained from www.nhl.com. A total of 1467 players and 18,154 goals recorded
Thomas et al. (2013)	BHM	Team’s scoring rate as a semi-Markov process using hazard functions	𝖱 and 𝖢++	Ice hockey (NHL)	Shifts from season 2007–2008 until 2011–2012. 30 teams
Glickman and Stern (1998)	TS	A Bayesian state-space approach for predicting game score differences based on a first-order autoregressive process	–	American football	Outcomes of the 28 teams in the National Football League (NFL) seasons 1988–1993
Cafarelli et al. (2012)	BHM, BR	Bayesian logistic models for modeling the probability of converting a third down play	𝖶𝗂𝗇𝖡𝖴𝖦𝖲	American football	yards to go, outcome of each first down by team from National Football League (NFL) season 2007
Revie et al. (2017)	BR, BHM	Bayesian linear mixed model and support vector machine (SVM) to model players’ fitness as a function of players’ perceptions when direct fitness measurements are not frequently possible	𝖱	Rugby union	Questionnaire of 38 professional players from Jan-Apr 2012 and data from counter movement jump (CMJ) tests
Miskin et al. (2010)	BR, MC	Volleyball skill importance assessment using Markov chains and Bayesian logistic regression, which yields importance scores. The transition probability matrix of the Markov process was estimated using a Dirichlet prior	–	Volleyball	Serves, passes, digs, and attacks during the 2006 competitive season of a women’s division I
Mendes et al. (2018)	BHM, LS, BR	Longitudinal hierarchical approach for modeling accumulated hours of structured volleyball and other sports practice	𝖱, 𝖻𝗋𝗆𝗌, 𝖲𝗍𝖺𝗇	Volleyball	Questionnaire of n = 78 elite male players from Brazilian volleyball clubs
Bar-Eli et al. (1995)	BF	Bayesian likelihood ratio for assessing the referee’s behavior in competitions		Ball-games	Questionnaire by eighty professional male athletes from Israel
Yang (2004)	BF	Bayesian binary segmentation method for analyzing streakiness (consecutive successes or failures). Bayes factor tests are used to assess the change in the success rate	𝖱𝖭𝖡𝖨𝖭 from the International Mathematics and Statistics Library	Team and individual sports: basketball, baseball and golf	Sequence of Bernoulli trials (win/loss) from Golden State Warriors in the NBA (2000–2001). Tiger Woods’ sequence of wins or losses in major PGA golf championships (1996–2001). Barry Bonds’ home run hitting pattern in the MLB season 2001
Murray (2017)	BHM	Team’s score-augmented win-loss Bayesian model	𝖱	Ultimate (frisbee)	2016 USA Ultimate Club Division results
Mengersen et al. (2016)	BR	Bayesian inference approach for small effects as an alternative to the traditional magnitude-based inference suggested by Batterham and Hopkins (2006)	𝖱, 𝖡𝖱𝗎𝗀𝗌 (Thomas et al. 2006), 𝖱𝟤𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Sturtz et al. 2005), 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄 (Martin et al. 2011)	Triathlon	Three variables were measured (hemoglobin mass, submaximal running economy and maximum blood lactate concentration) in 24 participants in 3 groups (live high-train low, intermittent hypoxic exposure and placebo)
Wimmer et al. (2011)	BR	It uses semi-parametric latent variable models to fit decathlon performance outcomes using age and month of the competition as covariates	𝖱, 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄	Decathlon	3103 competitions from the world’s best performance records (1998–2009)
Pradier et al. (2016)	BNP	Bayesian Nonparametric Models (BNP) approach for modeling the performance of marathon runners	𝖬𝖠𝖳𝖫𝖠𝖡	Marathon	New York City (2006–2011, 249,899 runners), Boston and London (2010–2011, 117,255 runners) marathons
Stephenson and Tawn (2013)	Other	Bayesian inference based on extreme value methods to identify best athlete performance assuming an exponentially decreasing trend		Athletics	Male/female annual best times in Olympic distance track events (100 m, 200 m, etc) from 1908–2010
Dadashi et al. (2013)	HMM	Swimming temporal phases modeled using HMM	–	Swimming (breaststroke)	7 well-trained swimmers (4 males and 3 females) equipped with wearable inertial measurement units
Dadashi, Millet, and Aminian (2015)	BR	Bayesian approach for estimating cycles swimming velocity using data from wearable technology	–	Swimming (breaststroke)	Eight professional and seven recreational swimmers wearing IMU
Kovalchik and Albert (2017)	BHM	Bayesian hierarchical model of serve routine (time-to-serve) considering the covariates point importance and the length of the previous rally	𝖱, 𝗋𝗃𝖺𝗀𝗌 (Plummer 2016)	Tennis	175 matches from the 2016 Australian Open using Hawk-Eye multi-camera
Baker and McHale (2017)	EBA	Empirical Bayes model extension for estimating players’ strengths	–	Tennis	Grand Slams (1968–2016), 21,921 matches and 1123 players
Glickman and Hennessy (2015)	BHM	Multinomial logit model for rank ordering of competitors based on the extreme value distribution	𝖱, 𝗋𝗃𝖺𝗀𝗌	Skiing	Women’s Alpine downhill competitions (2002–2013)
Usami (2017)	LS	Bayesian longitudinal method for paired comparisons based on the Bradley-Terry model	–	Sumo wrestling	10 wrestlers from the Japan Sumo Association (2005–2009)
Ofoghi et al. (2013)	BN	Racing athletes selection using machine learning techniques and Bayesian networks in the multi-event cycling omnium	𝖶𝖾𝗄𝖺	Cycling (omnium)	Australian Championships 2009, World Championships 2007–2010, the UCI World Cups (2010–2011), and the Oceania Championships 2010
Yousefi and Swartz (2013)	SASTA, BHM	Bayesian spatial model for estimating the expected number of putts within the green area according to the distance to the hole and the angle. This approach, based on a truncated-Poisson distribution, allows assessing putting performance accounting for the difficulty of putts	–	Golf	ShotLink data from the PGA Tour 2012
Vetter, Yu, and Foose (2017)	BR, BHM	Bayesian regression to assess the impact of several predictors (age, ability, training and intensity) on the training outcomes in four exercise types (muscular strength, speed, power and cardiorespiratory)	𝖱	Various	Combined data from 34 studies between 1984 and 2015
Percy (2013)	BHM	Bayesian shrinkage method for class handicapping (which allows athletes to compete on equal terms). This method would allow reducing the current large number of handicapping classes (often with just a few competitors) by grouping the competitors into a smaller number of classes	𝖤𝗑𝖼𝖾𝗅	Paralympic sports	Women’s 100m running finals and men’s 100m freestyle swimming. Paralympic Games Beijing 2008
Glickman (2008)	BR	Bayesian model of paired comparisons in knockout-based competitions using the Thurstone-Mosteller approach. This method matches the competitors with the aim of maximizing the probability that the best player advances in the competition as much as possible. Given the strength of each competitor, the model computes the probability of one defeating the other, placing a multivariate normal prior on the strength	𝖢	–	Simulated data
Sottas et al. (2006)	LS	Bayesian longitudinal analysis of blood samples to detect abnormal values of a biomarker	𝖬𝖠𝖳𝖫𝖠𝖡 (MATLAB 2017)	Doping	Two longitudinal studies: (1) double-blind study, 17 athletes and 332 observations. (2) 188 samples from 11 male athletes
Robinson et al. (2007)	LS	Bayesian inference on longitudinal blood samples for doping detection	–	Doping	135 blood profiles comprising 1039 samples from three studies (elite athletes, amateur athletes and volunteers)
Schulze et al. (2009)	LS	Longitudinal Bayesian model incorporating genotype information to derive doping cut-off points	–	Doping	Urine samples from 55 male volunteers carrying zero, one or two alleles of the UGT2B17 gene
Sottas, Saugy, and Saudan (2010)	LS, BHM	Longitudinal study of biomarkers based on Bayesian inference	–	Doping	432 urine samples from 28 participants
Van Renterghem et al. (2011)	LS	Adaptive model based on Bayesian inference for identifying new biomarkers for doping detection	𝖬𝖠𝖳𝖫𝖠𝖡	Doping	42 urine samples from six healthy male volunteers
Stenling et al. (2015)	Other	Discusses the use of Bayesian structural equation modeling in sport psychology, illustrated with data from the Sport Motivation Scale II. The authors reported a better fit to the data with this Bayesian approach than with traditional maximum likelihood estimation	𝖬𝗉𝗅𝗎𝗌 (Muthén and Muthén 1998-2012)	Multiple team and individual sports	380 subjects from high school and sport teams in Sweden
Tamminen et al. (2016)	Other	Emotion regulation assessment using a multilevel Bayesian structural equation modeling approach. Emotion regulation at the personal and team levels was found to be associated with athletes’ enjoyment and commitment	𝖬𝗉𝗅𝗎𝗌	Multiple team sports	n = 451 adolescent athletes from 45 teams in Ontario and British Columbia, Canada
Gucciardi et al. (2016)	Other	Self-reported mental toughness assessment accounting for cultural differences using Bayesian structural equation modeling and approximate measurement invariance	𝖬𝗉𝗅𝗎𝗌	Multiple individual and team sports	Male and female athletes from Australia (n = 353), China (n = 254) and Malaysia (n = 341)
Josefsson et al. (2017)	Other	Bayesian cross-sectional and longitudinal design for modeling the effects of mindfulness on sport-specific coping, mediated by rumination and emotion regulation	𝖬𝗉𝗅𝗎𝗌	Multiple sports	172 male and 69 female elite athletes from Sweden
Giles et al. (2017)	BR	Bayesian regression to assess the association between mental toughness and behavioral perseverance, accounting for physical fitness. Although an association between these variables was found, mental toughness was not a good predictor of behavioral perseverance in the presence of fatigue	𝖬𝗉𝗅𝗎𝗌	Australian rules football	38 male footballers from the West Australian Football League and Western Australian Amateur Football League
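
The Thurstone-Mosteller building block summarized above for Glickman (2008) can be sketched in a few lines. This is a minimal illustration, not the paper's tournament-design procedure: it assumes the common probit form P(i beats j | θ) = Φ(θ_i − θ_j) with unit scale, and a bivariate normal prior on the two strengths. Under that assumption the prior-averaged win probability has the closed form Φ((μ_i − μ_j) / √(1 + Var(θ_i − θ_j))), which the code checks against Monte Carlo draws. The function names and numeric inputs are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def tm_win_prob(theta_i: float, theta_j: float) -> float:
    """Thurstone-Mosteller: P(i beats j | strengths) = Phi(theta_i - theta_j).

    Assumes unit variance for the latent performance difference (hypothetical scaling).
    """
    return PHI(theta_i - theta_j)


def prior_win_prob(mu_i, mu_j, var_i, var_j, cov_ij):
    """Average tm_win_prob over a bivariate normal prior on (theta_i, theta_j).

    theta_i - theta_j ~ N(mu_i - mu_j, var_i + var_j - 2*cov_ij), and for
    D ~ N(m, v) we have E[Phi(D)] = Phi(m / sqrt(1 + v)).
    """
    v = var_i + var_j - 2.0 * cov_ij
    return PHI((mu_i - mu_j) / math.sqrt(1.0 + v))


def mc_win_prob(mu_i, mu_j, var_i, var_j, cov_ij, n=200_000, seed=1):
    """Monte Carlo check: draw strengths from the prior via a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    l11 = math.sqrt(var_i)
    l21 = cov_ij / l11
    l22 = math.sqrt(var_j - l21 ** 2)  # requires a positive-definite covariance
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta_i = mu_i + l11 * z1
        theta_j = mu_j + l21 * z1 + l22 * z2
        total += tm_win_prob(theta_i, theta_j)
    return total / n
```

For example, prior_win_prob(0.5, 0.0, 0.4, 0.3, 0.1) and mc_win_prob with the same arguments should agree to about two decimal places. Prior uncertainty about the strengths pulls the win probability towards 0.5 compared with plugging the prior means directly into Φ(θ_i − θ_j).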
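
Several of the longitudinal doping entries above (e.g. Sottas et al. 2006; Van Renterghem et al. 2011) share the general idea that an athlete's own history should sharpen the limits for what counts as an abnormal biomarker value. The sketch below illustrates that idea with a deliberately simple normal-normal conjugate model, which is an assumption made here and not any of the published models: a population prior on the athlete's mean level, a known within-athlete standard deviation, and a two-sided posterior predictive interval at a chosen specificity. All parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist


def individual_limits(history, prior_mean, prior_sd, within_sd, specificity=0.99):
    """Predictive interval for an athlete's next biomarker value.

    Normal-normal conjugate model (hypothetical simplification):
      athlete mean mu ~ N(prior_mean, prior_sd^2)   # population prior
      value_t | mu   ~ N(mu, within_sd^2)           # within-athlete variation
    """
    n = len(history)
    prec = 1.0 / prior_sd ** 2 + n / within_sd ** 2  # posterior precision of mu
    post_mean = (prior_mean / prior_sd ** 2 + sum(history) / within_sd ** 2) / prec
    pred_sd = math.sqrt(1.0 / prec + within_sd ** 2)  # posterior predictive SD
    z = NormalDist().inv_cdf(0.5 + specificity / 2.0)  # two-sided cut-off
    return post_mean - z * pred_sd, post_mean + z * pred_sd


def is_abnormal(new_value, history, **model):
    """Flag a new value that falls outside the athlete's predictive limits."""
    lo, hi = individual_limits(history, **model)
    return not (lo <= new_value <= hi)
```

With the illustrative settings prior_mean=1.0, prior_sd=0.5 and within_sd=0.2, a new value of 1.6 lies inside the population-based limits when no history is available, but is flagged once four readings near 0.8 have been recorded, because the predictive interval narrows around the athlete's own level.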

References

Albert, J. 1993. “A Statistical Analysis of Hitting Streaks in Baseball: Comment.” Journal of the American Statistical Association 88:1184–1188. doi:10.2307/2291255

Albert, J. 2008. “Streaky Hitting in Baseball.” Journal of Quantitative Analysis in Sports 4:1184–1188. doi:10.2202/1559-0410.1085

Albert, J. 2016. “Improved Component Predictions of Batting and Pitching Measures.” Journal of Quantitative Analysis in Sports 12:73–85. doi:10.1515/jqas-2015-0063

Albright, S. C. 1993. “A Statistical Analysis of Hitting Streaks in Baseball.” Journal of the American Statistical Association 88:1175–1183. doi:10.1080/01621459.1993.10476395

Anderson, E., Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, S. Hammarling, A. Greenbaum, A. McKenney, et al. 1999. LAPACK Users’ Guide (Third Ed.). Philadelphia, PA, USA: Society for Industrial and Applied Mathematics. doi:10.1137/1.9780898719604

Angelino, E., M. J. Johnson, and R. P. Adams. 2016. “Patterns of Scalable Bayesian Inference.” Foundations and Trends® in Machine Learning 9:119–247. doi:10.1561/9781680832198

Baio, G. and M. Blangiardo. 2010. “Bayesian Hierarchical Model for the Prediction of Football Results.” Journal of Applied Statistics 37:253–264. doi:10.1080/02664760802684177

Baker, R. D. and I. G. McHale. 2015. “Deterministic Evolution of Strength in Multiple Comparisons Models: Who is the Greatest Golfer?” Scandinavian Journal of Statistics 42:180–196. http://doi.wiley.com/10.1111/sjos.12101. doi:10.1111/sjos.12101

Baker, R. D. and I. G. McHale. 2017. “An Empirical Bayes Model for Time-Varying Paired Comparisons Ratings: Who is the Greatest Women’s Tennis Player?” European Journal of Operational Research 258:328–333. http://linkinghub.elsevier.com/retrieve/pii/S0377221716306828. doi:10.1016/j.ejor.2016.08.043

Bar-Eli, M. and G. Tenenbaum. 1988. “Time Phases and the Individual Psychological Crisis in Sports Competition: Theory and Research Findings.” Journal of Sports Sciences 6:141–149. http://www.tandfonline.com/doi/abs/10.1080/02640418808729804. doi:10.1080/02640418808729804

Bar-Eli, M., N. Levy-Kolker, J. S. Pie, and G. Tenenbaum. 1995. “A Crisis-Related Analysis of Perceived Referees’ Behavior in Competition.” Journal of Applied Sport Psychology 7:63–80. doi:10.1080/10413209508406301

Bar-Eli, M., S. Avugos, and M. Raab. 2006. “Twenty Years of ‘Hot Hand’ Research: Review and Critique.” Psychology of Sport and Exercise 7:525–553. doi:10.1016/j.psychsport.2006.03.001

Batterham, A. M. and W. G. Hopkins. 2006. “Making Meaningful Inferences about Magnitudes.” International Journal of Sports Physiology and Performance 1:50–57. doi:10.1123/ijspp.1.1.50

Bendtsen, M. 2017. “Regimes in Baseball Players’ Career Data.” Data Mining and Knowledge Discovery 31:1580–1621. http://link.springer.com/10.1007/s10618-017-0510-5. doi:10.1007/s10618-017-0510-5

Berger, J. O. 2013. Statistical Decision Theory and Bayesian Analysis. New York: Springer Science & Business Media.

Bernardo, J. M. and A. F. Smith. 2009. Bayesian Theory. Volume 405, England: John Wiley & Sons.

Bernards, J. R., K. Sato, G. G. Haff, and C. D. Bazyler. 2017. “Current Research and Statistical Practices in Sport Science and a Need for Change.” Sports (Basel) 5(4):87. doi:10.3390/sports5040087

Besag, J., P. Green, D. Higdon, and K. Mengersen. 1995. “Bayesian Computation and Stochastic Systems.” Statistical Science 10:3–41. doi:10.1214/ss/1177010123

Bezanson, J., S. Karpinski, V. B. Shah, and A. Edelman. 2012. “Julia: A Fast Dynamic Language for Technical Computing.” arXiv preprint arXiv:1209.5145.

Bivand, R. 2017. classInt: Choose Univariate Class Intervals. https://CRAN.R-project.org/package=classInt, R package version 0.1-24.

Bivand, R. S., E. Pebesma, and V. Gomez-Rubio. 2013. Applied Spatial Data Analysis with R. Second edition. New York, NY: Springer. http://www.asdar-book.org/. doi:10.1007/978-1-4614-7618-4

Blei, D. M., A. Kucukelbir, and J. D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112(518):859–877. doi:10.1080/01621459.2017.1285773

Boys, R. J. and P. M. Philipson. 2018. “On the Ranking of Test Match Batsmen.” arXiv preprint arXiv:1806.05496. doi:10.1111/rssc.12298

Brewer, B. J. 2008. “Getting Your Eye in: A Bayesian Analysis of Early Dismissals in Cricket.” arXiv preprint arXiv:0801.4408.

Brown, L. D. 2008. “In-Season Prediction of Batting Averages: A Field Test of Empirical Bayes and Bayes Methodologies.” The Annals of Applied Statistics 2:113–152. doi:10.1214/07-AOAS138

Bürkner, P.-C. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80:1–28. doi:10.18637/jss.v080.i01

Cafarelli, R., C. J. Rigdon, and S. E. Rigdon. 2012. “Models for Third Down Conversion in the National Football League.” Journal of Quantitative Analysis in Sports 8. doi:10.1515/1559-0410.1383

Carvalho, H. M., J. A. Lekue, S. M. Gil, and I. Bidaurrazaga-Letona. 2017. “Pubertal Development of Body Size and Soccer-Specific Functional Capacities in Adolescent Players.” Research in Sports Medicine 25:421–436. https://www.tandfonline.com/doi/full/10.1080/15438627.2017.1365301. doi:10.1080/15438627.2017.1365301

Cervone, D., A. D’Amour, L. Bornn, and K. Goldsberry. 2016. “A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes.” Journal of the American Statistical Association 111:585–599. doi:10.1080/01621459.2016.1141685

Dadashi, F., A. Arami, F. Crettenand, G. P. Millet, J. Komar, L. Seifert, and K. Aminian. 2013. “A Hidden Markov Model of the Breaststroke Swimming Temporal Phases Using Wearable Inertial Measurement Units.” in Body Sensor Networks (BSN), 2013 IEEE International Conference on, IEEE, 1–6.

Dadashi, F., G. P. Millet, and K. Aminian. 2015. “A Bayesian Approach for Pervasive Estimation of Breaststroke Velocity Using a Wearable IMU.” Pervasive and Mobile Computing 19:37–46. doi:10.1016/j.pmcj.2014.03.001

Damodaran, U. 2006. “Stochastic Dominance and Analysis of ODI Batting Performance: The Indian Cricket Team, 1989–2005.” Journal of Sports Science & Medicine 5:503.

Deshpande, S. K. and S. T. Jensen. 2016. “Estimating an NBA Player’s Impact on his Team’s Chances of Winning.” Journal of Quantitative Analysis in Sports 12:51–72. https://www.degruyter.com/view/j/jqas.2016.12.issue-2/jqas-2015-0027/jqas-2015-0027.xml. doi:10.1515/jqas-2015-0027

Deshpande, S. K. and A. Wyner. 2017. “A Hierarchical Bayesian Model of Pitch Framing.” Journal of Quantitative Analysis in Sports 13:95–112. doi:10.1515/jqas-2017-0027

Efron, B. and C. Morris. 1973. “Combining Possibly Related Estimation Problems.” Journal of the Royal Statistical Society. Series B (Methodological) 35:379–421. doi:10.1111/j.2517-6161.1973.tb00968.x

Franks, A., A. Miller, L. Bornn, and K. Goldsberry. 2015. “Characterizing the Spatial Structure of Defensive Skill in Professional Basketball.” The Annals of Applied Statistics 9(1):94–121. doi:10.1214/14-AOAS799

Gaujoux, R. and C. Seoighe. 2018. The Package NMF: Manual Pages. https://cran.r-project.org/package=NMF, R package version 0.21.0.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. Third edition. Boca Raton, FL: CRC Press. doi:10.1201/b16018

Giles, B., P. S. Goods, D. R. Warner, D. Quain, P. Peeling, K. J. Ducker, B. Dawson, and D. F. Gucciardi. 2017. “Mental Toughness and Behavioural Perseverance: A Conceptual Replication and Extension.” Journal of Science and Medicine in Sport 21:640–645. doi:10.1016/j.jsams.2017.10.036

Gilovich, T., R. Vallone, and A. Tversky. 1985. “The Hot Hand in Basketball: On the Misperception of Random Sequences.” Cognitive Psychology 17:295–314. doi:10.1017/CBO9780511808098.035

Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48:377–394. doi:10.1111/1467-9876.00159

Glickman, M. E. 2001. “Dynamic Paired Comparison Models with Stochastic Variances.” Journal of Applied Statistics 28:673–689. doi:10.1080/02664760120059219

Glickman, M. E. 2008. “Bayesian Locally Optimal Design of Knockout Tournaments.” Journal of Statistical Planning and Inference 138:2117–2127. doi:10.1016/j.jspi.2007.09.007

Glickman, M. E. and H. S. Stern. 1998. “A State-Space Model for National Football League Scores.” Journal of the American Statistical Association 93:25–35. doi:10.1137/1.9780898718386.ch5

Glickman, M. E. and J. Hennessy. 2015. “A Stochastic Rank Ordered Logit Model for Rating Multi-Competitor Games and Sports.” Journal of Quantitative Analysis in Sports 11. https://www.degruyter.com/view/j/jqas.2015.11.issue-3/jqas-2015-0012/jqas-2015-0012.xml. doi:10.1515/jqas-2015-0012

Goldsberry, K. 2012. “Courtvision: New Visual and Spatial Analytics for the NBA.” in 2012 MIT Sloan Sports Analytics Conference.

Gramacy, R. B. 2017a. monomvn: Estimation for Multivariate Normal and Student-t Data with Monotone Missingness. https://CRAN.R-project.org/package=monomvn, R package version 1.9-7.

Gramacy, R. B. 2017b. reglogit: Simulation-Based Regularized Logistic Regression. https://CRAN.R-project.org/package=reglogit, R package version 1.2-5.

Gramacy, R. B., S. T. Jensen, and M. Taddy. 2013. “Estimating Player Contribution in Hockey with Regularized Logistic Regression.” Journal of Quantitative Analysis in Sports 9:97–111. doi:10.1515/jqas-2012-0001

Gucciardi, D. and M. Zyphur. 2016. “Exploratory Structural Equation Modelling and Bayesian Estimation.” in An Introduction to Intermediate and Advanced Analyses for Sport and Exercise Scientists. United Kingdom: John Wiley & Sons, pp. 172–194.

Gucciardi, D. F., C.-Q. Zhang, V. Ponnusamy, G. Si, and A. Stenling. 2016. “Cross-Cultural Invariance of the Mental Toughness Inventory Among Australian, Chinese, and Malaysian Athletes: A Bayesian Estimation Approach.” Journal of Sport and Exercise Psychology 38:187–202. http://journals.humankinetics.com/doi/10.1123/jsep.2015-0320. doi:10.1123/jsep.2015-0320

Gudmundsson, J. and M. Horton. 2017. “Spatio-Temporal Analysis of Team Sports.” ACM Computing Surveys (CSUR) 50:22. doi:10.1145/3054132

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. “The WEKA Data Mining Software: An Update.” SIGKDD Explorations 11:10–18. doi:10.1145/1656274.1656278

Hand, D. J. and K. Yu. 2001. “Idiot’s Bayes–Not So Stupid After All?” International Statistical Review 69:385–398. doi:10.1111/j.1751-5823.2001.tb00465.x

Harte, D. 2017. HiddenMarkov: Hidden Markov Models. Wellington: Statistics Research Associates. http://www.statsresearch.co.nz/dsh/sslib/, R package version 1.8-11.

Healey, G. 2017. “Learning, Visualizing, and Assessing a Model for the Intrinsic Value of a Batted Ball.” IEEE Access 5:13811–13822. doi:10.1109/ACCESS.2017.2728663

Ishigami, H. 2016. “Relative Age and Birthplace Effect in Japanese Professional Sports: A Quantitative Evaluation Using a Bayesian Hierarchical Poisson Model.” Journal of Sports Sciences 34:143–154. doi:10.1080/02640414.2015.1039462

Ivarsson, A., M. B. Andersen, A. Stenling, U. Johnson, and M. Lindwall. 2015. “Things We Still Haven’t Learned (So Far).” Journal of Sport and Exercise Psychology 37:449–461. doi:10.1123/jsep.2015-0015

Jensen, S. T., B. B. McShane, and A. J. Wyner. 2009a. “Hierarchical Bayesian Modeling of Hitting Performance in Baseball.” Bayesian Analysis 4(4):631–652. doi:10.1214/09-BA424

Jensen, S. T., K. E. Shirley, and A. J. Wyner. 2009b. “Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball.” The Annals of Applied Statistics 3(2):491–520. doi:10.1214/08-AOAS228

Jiang, W., C.-H. Zhang, et al. 2010. “Empirical Bayes In-Season Prediction of Baseball Batting Averages.” in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown. Beachwood, Ohio, USA: Institute of Mathematical Statistics, pp. 263–273. https://projecteuclid.org/euclid.imsc/1288099025. doi:10.1214/10-IMSCOLL618

Josefsson, T., A. Ivarsson, M. Lindwall, H. Gustafsson, A. Stenling, J. Böröy, E. Mattsson, J. Carnebratt, S. Sevholt, and E. Falkevik. 2017. “Mindfulness Mechanisms in Sports: Mediating Effects of Rumination and Emotion Regulation on Sport-Specific Coping.” Mindfulness 8:1354–1363. http://link.springer.com/10.1007/s12671-017-0711-4. doi:10.1007/s12671-017-0711-4

Karlis, D. and I. Ntzoufras. 2008. “Bayesian Modelling of Football Outcomes: Using the Skellam’s Distribution for the Goal Difference.” IMA Journal of Management Mathematics 20:133–145. doi:10.1093/imaman/dpn026

Koopman, S. J. and R. Lit. 2015. “A Dynamic Bivariate Poisson Model for Analysing and Forecasting Match Results in the English Premier League.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 178:167–186. http://doi.wiley.com/10.1111/rssa.12042. doi:10.1111/rssa.12042

Koulis, T., S. Muthukumarana, and C. D. Briercliffe. 2014. “A Bayesian Stochastic Model for Batting Performance Evaluation in One-Day Cricket.” Journal of Quantitative Analysis in Sports 10:1–13. doi:10.1515/jqas-2013-0057

Kovalchik, S. A. and J. Albert. 2017. “A Multilevel Bayesian Approach for Modeling the Time-to-Serve in Professional Tennis.” Journal of Quantitative Analysis in Sports 13:49–62. http://www.degruyter.com/view/j/jqas.2017.13.issue-2/jqas-2016-0091/jqas-2016-0091.xml. doi:10.1515/jqas-2016-0091

Lam, M. W. 2018. “One-Match-Ahead Forecasting in Two-Team Sports with Stacked Bayesian Regressions.” Journal of Artificial Intelligence and Soft Computing Research 8:159–171. doi:10.1515/jaiscr-2018-0011

Lamas, L., F. Santana, M. Heiner, C. Ugrinowitsch, and G. Fellingham. 2015. “Modeling the Offensive-Defensive Interaction and Resulting Outcomes in Basketball.” PLoS One 10:e0144435. http://dx.plos.org/10.1371/journal.pone.0144435. doi:10.1371/journal.pone.0144435

Liberati, A., D. G. Altman, J. Tetzlaff, C. Mulrow, P. C. Gøtzsche, J. P. Ioannidis, M. Clarke, P. J. Devereaux, J. Kleijnen, and D. Moher. 2009. “The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Health Care Interventions: Explanation and Elaboration.” PLoS Medicine 6:e1000100. doi:10.1371/journal.pmed.1000100

Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter. 2000. “WinBUGS – A Bayesian Modelling Framework: Concepts, Structure, and Extensibility.” Statistics and Computing 10:325–337. doi:10.1023/A:1008929526011

Martin, A. D., K. M. Quinn, and J. H. Park. 2011. “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software 42:22. http://www.jstatsoft.org/v42/i09/. doi:10.18637/jss.v042.i09

MATLAB. 2017. “MATLAB and Statistics Toolbox Release.” The MathWorks, Natick, MA, USA.

McFadden, D. 1973. Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers of Econometrics, New York: Academic Press.

McShane, B. B., A. Braunstein, J. Piette, and S. T. Jensen. 2011. “A Hierarchical Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics.” Journal of Quantitative Analysis in Sports 7:1–26. doi:10.2202/1559-0410.1323

Mendes, F. G., J. V. Nascimento, E. R. Souza, C. Collet, M. Milistetd, J. Côté, and H. M. Carvalho. 2018. “Retrospective Analysis of Accumulated Structured Practice: A Bayesian Multilevel Analysis of Elite Brazilian Volleyball Players.” High Ability Studies 29(2):1–15. doi:10.1080/13598139.2018.1507901

Mengersen, K. L., C. C. Drovandi, C. P. Robert, D. B. Pyne, and C. J. Gore. 2016. “Bayesian Estimation of Small Effects in Exercise and Sports Science.” PLoS One 11:e0147311. http://dx.plos.org/10.1371/journal.pone.0147311. doi:10.1371/journal.pone.0147311

Miller, A., L. Bornn, R. Adams, and K. Goldsberry. 2014. “Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball.” in International Conference on Machine Learning, pp. 235–243.

Minsker, S., S. Srivastava, L. Lin, and D. B. Dunson. 2017. “Robust and Scalable Bayes via a Median of Subset Posterior Measures.” The Journal of Machine Learning Research 18:4488–4527.

Miskin, M. A., G. W. Fellingham, and L. W. Florence. 2010. “Skill Importance in Women’s Volleyball.” Journal of Quantitative Analysis in Sports 6. doi:10.2202/1559-0410.1234

Murray, T. A. 2017. “Ranking Ultimate Teams Using a Bayesian Score-Augmented Win-Loss Model.” Journal of Quantitative Analysis in Sports 13:63–78. http://www.degruyter.com/view/j/jqas.2017.13.issue-2/jqas-2016-0097/jqas-2016-0097.xml. doi:10.1515/jqas-2016-0097

Muthén, L. and B. Muthén. 1998-2012. Mplus User’s Guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

Neal, D., J. Tan, F. Hao, and S. S. Wu. 2010. “Simply Better: Using Regression Models to Estimate Major League Batting Averages.” Journal of Quantitative Analysis in Sports 6:1–14. doi:10.2202/1559-0410.1229

Ofoghi, B., J. Zeleznikow, C. MacMahon, and D. Dwyer. 2013. “Supporting Athlete Selection and Strategic Planning in Track Cycling Omnium: A Statistical and Machine Learning Approach.” Information Sciences 233:200–213. doi:10.1016/j.ins.2012.12.050

Pasek, J., with some assistance from Alex Tahk, some code modified from R-core, additional contributions by Gene Culter, and M. Schwemmle. 2016. Weights: Weighting and Weighted Statistics. https://CRAN.R-project.org/package=weights, R package version 0.85.

Percy, D. F. 2013. “Generic Handicapping for Paralympic Sports.” IMA Journal of Management Mathematics 24:349–361. https://academic.oup.com/imaman/article-lookup/doi/10.1093/imaman/dps013. doi:10.1093/imaman/dps013

Plummer, M. 2016. rjags: Bayesian Graphical Models Using MCMC. https://CRAN.R-project.org/package=rjags, R package version 4-6.

Plummer, M., N. Best, K. Cowles, and K. Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6:7–11. https://journal.r-project.org/archive/.

Pradier, M. F., F. J. Ruiz, and F. Perez-Cruz. 2016. “Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling.” PLoS One 11:e0147402. doi:10.1371/journal.pone.0147402

Python Software Foundation. 2017. Python Language Reference. http://www.python.org.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Razali, N., A. Mustapha, F. A. Yatim, and R. Ab Aziz. 2017. “Predicting Football Matches Results using Bayesian Networks for English Premier League (EPL).” IOP Conference Series: Materials Science and Engineering 226:012099. http://stacks.iop.org/1757-899X/226/i=1/a=012099?key=crossref.e4dede28b99ccb519dbad2dc125920ef. doi:10.1088/1757-899X/226/1/012099

Reich, B. J., J. S. Hodges, B. P. Carlin, and A. M. Reich. 2006. “A Spatial Analysis of Basketball Shot Chart Data.” The American Statistician 60:3–12. doi:10.1198/000313006X90305

Revie, M., K. J. Wilson, R. Holdsworth, and S. Yule. 2017. “On Modeling Player Fitness in Training for Team Sports with Application to Professional Rugby.” International Journal of Sports Science & Coaching 12:183–193. doi:10.1177/1747954117694736

Robinson, D. 2017. Introduction to Empirical Bayes: Examples from Baseball Statistics. Gumroad. https://github.com/dgrtwo/empirical-bayes-book.

Robinson, N., P.-E. Sottas, P. Mangin, and M. Saugy. 2007. “Bayesian Detection of Abnormal Hematological Values to Introduce a No-Start Rule for Heterogeneous Populations of Athletes.” Haematologica 92:1143–1144. http://www.haematologica.org/cgi/doi/10.3324/haematol.11182. doi:10.3324/haematol.11182

Rue, H. and O. Salvesen. 2000. “Prediction and Retrospective Analysis of Soccer Matches in a League.” Journal of the Royal Statistical Society: Series D (The Statistician) 49:399–418. doi:10.1111/1467-9884.00243

Ruiz, F. J. and F. Perez-Cruz. 2015. “A Generative Model for Predicting Outcomes in College Basketball.” Journal of Quantitative Analysis in Sports 11:39–52. doi:10.1515/jqas-2014-0055

Schulze, J. J., J. Lundmark, M. Garle, L. Ekström, P.-E. Sottas, and A. Rane. 2009. “Substantial Advantage of a Combined Bayesian and Genotyping Approach in Testosterone Doping Tests.” Steroids 74:365–368. http://linkinghub.elsevier.com/retrieve/pii/S0039128X08002870. doi:10.1016/j.steroids.2008.11.003

Shahtahmassebi, G. and R. Moyeed. 2016. “An Application of the Generalized Poisson Difference Distribution to the Bayesian Modelling of Football Scores.” Statistica Neerlandica 70:260–273. http://doi.wiley.com/10.1111/stan.12087. doi:10.1111/stan.12087

Shortridge, A., K. Goldsberry, and M. Adams. 2014. “Creating Space to Shoot: Quantifying Spatial Relative Field Goal Efficiency in Basketball.” Journal of Quantitative Analysis in Sports 10:303–313. https://www.degruyter.com/view/j/jqas.2014.10.issue-3/jqas-2013-0094/jqas-2013-0094.xml. doi:10.1515/jqas-2013-0094

Silva, R. M. and T. B. Swartz. 2016. “Analysis of Substitution Times in Soccer.” Journal of Quantitative Analysis in Sports 12:113–122. https://www.degruyter.com/view/j/jqas.2016.12.issue-3/jqas-2015-0114/jqas-2015-0114.xml. doi:10.1515/jqas-2015-0114

Sottas, P.-E., N. Baume, C. Saudan, C. Schweizer, M. Kamber, and M. Saugy. 2006. “Bayesian Detection of Abnormal Values in Longitudinal Biomarkers with an Application to T/E Ratio.” Biostatistics 8:285–296. doi:10.1093/biostatistics/kxl009

Sottas, P.-E., M. Saugy, and C. Saudan. 2010. “Endogenous Steroid Profiling in the Athlete Biological Passport.” Endocrinology and Metabolism Clinics 39:59–73. doi:10.1016/j.ecl.2009.11.003

Stan Development Team. 2017. The Stan Core Library. http://mc-stan.org.

Stan Development Team. 2018. RStan: The R Interface to Stan. http://mc-stan.org/, R package version 2.17.3.

Stenling, A., A. Ivarsson, U. Johnson, and M. Lindwall. 2015. “Bayesian Structural Equation Modeling in Sport and Exercise Psychology.” Journal of Sport and Exercise Psychology 37:410–420. http://journals.humankinetics.com/doi/10.1123/jsep.2014-0330. doi:10.1123/jsep.2014-0330

Stephenson, A. G. and J. A. Tawn. 2013. “Determining the Best Track Performances of All Time Using a Conceptual Population Model for Athletics Records.” Journal of Quantitative Analysis in Sports 9:67–76. doi:10.1515/jqas-2012-0047

Stevenson, O. G. and B. J. Brewer. 2017. “Bayesian Survival Analysis of Batsmen in Test Cricket.” Journal of Quantitative Analysis in Sports 13:25–36. doi:10.1515/jqas-2016-0090

Sturtz, S., U. Ligges, and A. Gelman. 2005. “R2WinBUGS: A Package for Running WinBUGS from R.” Journal of Statistical Software 12:1–16. http://www.jstatsoft.org. doi:10.18637/jss.v012.i03

Suzuki, A. K., L. E. B. Salasar, J. G. Leite, and F. Louzada-Neto. 2010. “A Bayesian Approach for Predicting Match Outcomes: The 2006 (Association) Football World Cup.” Journal of the Operational Research Society 61:1530–1539. https://doi.org/10.1057/jors.2009.127

Swartz, T. B. 2018. “Where Should I Publish my Sports Paper?” The American Statistician 1–6. https://doi.org/10.1080/00031305.2018.1459842

Swartz, T. B., P. S. Gill, and S. Muthukumarana. 2009. “Modelling and Simulation for One-Day Cricket.” Canadian Journal of Statistics 37:143–160. http://doi.wiley.com/10.1002/cjs.10017. doi:10.1002/cjs.10017

Taddy, M. 2013. “Multinomial Inverse Regression for Text Analysis.” Journal of the American Statistical Association 108(503):755–770. doi:10.1080/01621459.2012.734168

Tamminen, K. A., P. Gaudreau, C. E. McEwen, and P. R. Crocker. 2016. “Interpersonal Emotion Regulation Among Adolescent Athletes: A Bayesian Multilevel Model Predicting Sport Enjoyment and Commitment.” Journal of Sport and Exercise Psychology 38:541–555. http://journals.humankinetics.com/doi/10.1123/jsep.2015-0189. doi:10.1123/jsep.2015-0189

Thomas, A. C. 2006. “The Impact of Puck Possession and Location on Ice Hockey Strategy.” Journal of Quantitative Analysis in Sports 2. doi:10.2202/1559-0410.1007

Thomas, A., B. O’Hara, U. Ligges, and S. Sturtz. 2006. “Making BUGS Open.” R News 6:12–17. https://cran.r-project.org/doc/Rnews/.

Thomas, C., G. Fellingham, and P. Vehrs. 2009. “Development of a Notational Analysis System for Selected Soccer Skills of a Women’s College Team.” Measurement in Physical Education and Exercise Science 13:108–121. http://www.tandfonline.com/doi/abs/10.1080/10913670902812770. doi:10.1080/10913670902812770

Thomas, A. C., S. L. Ventura, S. T. Jensen, and S. Ma. 2013. “Competing Process Hazard Function Models for Player Ratings in Ice Hockey.” The Annals of Applied Statistics 7:1497–1524. http://projecteuclid.org/euclid.aoas/1380804804. doi:10.1214/13-AOAS646

Usami, S. 2017. “Bayesian Longitudinal Paired Comparison Model and its Application to Sports Data Using Weighted Likelihood Bootstrap.” Communications in Statistics – Simulation and Computation 46:1974–1990. https://www.tandfonline.com/doi/full/10.1080/03610918.2015.1026989. doi:10.1080/03610918.2015.1026989

Van Renterghem, P., P. Van Eenoo, P.-E. Sottas, M. Saugy, and F. Delbeke. 2011. “A Pilot Study on Subject-Based Comprehensive Steroid Profiling: Novel Biomarkers to Detect Testosterone Misuse in Sports.” Clinical Endocrinology 75:134–140. http://doi.wiley.com/10.1111/j.1365-2265.2011.03992.x. doi:10.1111/j.1365-2265.2011.03992.x

Vetter, R. E., H. Yu, and A. K. Foose. 2017. “Effects of Moderators on Physical Training Programs: A Bayesian Approach.” The Journal of Strength & Conditioning Research 31:1868–1878. doi:10.1519/JSC.0000000000001585

Visser, I. and M. Speekenbrink. 2010. “depmixS4: An R Package for Hidden Markov Models.” Journal of Statistical Software 36:1–21. http://www.jstatsoft.org/v36/i07/. doi:10.18637/jss.v036.i07

Wetzels, R., D. Tutschkow, C. Dolan, S. van der Sluis, G. Dutilh, and E.-J. Wagenmakers. 2016. “A Bayesian Test for the Hot Hand Phenomenon.” Journal of Mathematical Psychology 72:200–209. http://linkinghub.elsevier.com/retrieve/pii/S0022249615000814. doi:10.1016/j.jmp.2015.12.003

Wimmer, V., N. Fenske, P. Pyrka, and L. Fahrmeir. 2011. “Exploring Competition Performance in Decathlon Using Semi-Parametric Latent Variable Models.” Journal of Quantitative Analysis in Sports 7:1–21. doi:10.2202/1559-0410.1307

Yang, T. Y. 2004. “Bayesian Binary Segmentation Procedure for Detecting Streakiness in Sports.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 167:627–637. http://doi.wiley.com/10.1111/j.1467-985X.2004.00484.x. doi:10.1111/j.1467-985X.2004.00484.x

Yousefi, K. and T. B. Swartz. 2013. “Advanced Putting Metrics in Golf.” Journal of Quantitative Analysis in Sports 9:239–248. doi:10.1515/jqas-2013-0010

Published Online: 2019-06-27

Published in Print: 2019-10-25
