
Understanding repeat playing behavior in casual games using a Bayesian data augmentation approach

Published in: Quantitative Marketing and Economics

Abstract

With an estimated market size of nearly $18 billion in 2016, casual games (games played over social networks or on mobile devices) have become increasingly popular. Because most casual games are free to install, understanding repeat playing behavior is important for game developers, as it directly drives advertising revenue. Game developers are keenly interested in benchmarking their games against the market average and in understanding how genre and various game mechanics drive repeat playing behavior. Such cross-sectional analysis, however, is difficult to conduct: individual-level data on competitors’ games are not publicly available, and the casual gaming industry is highly fragmented, with each firm making only a handful of games.

I develop a Bayesian approach, based on a parsimonious Hidden Markov Model at the individual level in conjunction with data augmentation, to study repeat playing behavior using only publicly available data. Applying the proposed approach to a sample of 379 casual games, I find that the average daily attrition rate across games is around 36.5%, with an average “play” rate of 47.9%, resulting in an average ARPU (average revenue per user) across games of around 20.5 cents. Certain genres are linked to higher attrition rates and play rates. In addition, giving out a “daily bonus” and limiting the amount of time that gamers can play each day are associated with a 17.7% and a 16.4% higher ARPU, respectively.


Notes

1. In the context of casual game playing behavior, the transition from “Death” back to “Active” is extremely unlikely. Private communication with data scientists at several major game developers reveals that once a user has not logged in for 30 consecutive days, his/her probability of returning at least once over the next 30 days is lower than 1%. Thus, the probability of moving from “Death” to “Active” is negligible.

2. Consider M = 100 million (as discussed in the introduction, the game CityVille has an MAU of 100 million, which sets a lower bound for its potential number of players M_i). Augmenting a year of play history for each player would require simulating 2 × 365 × 100 million = 7.3 × 10^10 entries. Even if each entry took only 1 byte of memory, the two latent matrices alone would require roughly 73 gigabytes to store and process in every single MCMC iteration, which is computationally impractical (a quick check of this arithmetic appears after these notes).

  3. According to Appdata (the data provider), there are some minor non-specific measurement errors in the DAU and MAU data. For instance, in several cases I notice that the DAU and MAU values on the release date of a game are not the same (when they should be equal by definition), which Appdata ascribes to server-side recording error. Such measurement errors are assumed to occur at random.

4. Robustness checks are conducted by setting σ_d and σ_m to 0.025 and 0.1 instead of the 0.05 used here; the key results are substantively unchanged.

  5. For instance, if individual-level adoption probabilities are negatively correlated with attrition probabilities (i.e., early adopters are less likely to churn), later cohorts would have a higher average attrition probability than earlier cohorts, thus introducing model misspecification error since the model assumes that average attrition probability is constant across cohorts.

  6. I have also conducted another set of simulation studies where M is set to be 1,000,000. The results are substantially unchanged and are available upon request.

  7. See, e.g., https://twitter.com/sequoia/status/436302641992187904

  8. Another possibility is to directly build these covariates into the model and estimate their effect jointly when estimating the model. However, this is computationally intractable because the games can no longer be estimated in parallel.
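As a quick check of the storage arithmetic in note 2, here is a minimal sketch (the variable names are mine; the 1-byte-per-entry assumption is the one stated in the note):

```python
# Back-of-envelope check of the storage requirement in note 2.
M = 100_000_000                 # potential players (CityVille MAU lower bound)
T = 365                         # one year of daily play history
entries = 2 * T * M             # latent state matrix S plus latent play matrix Y
gigabytes = entries * 1 / 1e9   # at 1 byte per entry
print(f"{entries:.1e} entries ≈ {gigabytes:.0f} GB per MCMC iteration")
# -> 7.3e+10 entries ≈ 73 GB per MCMC iteration
```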

References

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Chapman & Hall/CRC.

Author information


Corresponding author

Correspondence to Sam K. Hui.

Appendix

I. MCMC computational procedure

I describe the MCMC procedure used to calibrate the model. In each iteration, I draw from the full conditional distributions of model parameters in the following order:

  • \( \left({S}_{(i)},{Y}_{(i)}\right),\ {\pi}_{it},\ {\theta}_{ij},\ {\phi}_{ij},\ \left({a}_i^{(\pi)},{b}_i^{(\pi)}\right),\ \left({a}_i^{(\theta)},{b}_i^{(\theta)}\right),\ \left({a}_i^{(\phi)},{b}_i^{(\phi)}\right) \). An independent Metropolis-Hastings algorithm is used to sample each row of \( \left({S}_{(i)},{Y}_{(i)}\right) \); given \( \left({S}_{(i)},{Y}_{(i)}\right) \), the parameters \( {\pi}_{it} \), \( {\theta}_{ij} \), and \( {\phi}_{ij} \) are drawn using a Gibbs sampler; and the hyperparameters \( \left({a}_i^{(\pi)},{b}_i^{(\pi)}\right) \), \( \left({a}_i^{(\theta)},{b}_i^{(\theta)}\right) \), and \( \left({a}_i^{(\phi)},{b}_i^{(\phi)}\right) \) are drawn using a random-walk Metropolis-Hastings algorithm (Gelman et al. 2013). Each step is outlined below.

1) Drawing each row of \( \left({S}_{(i)},{Y}_{(i)}\right) \):

  • For each representative gamer j (i.e., the j-th row of \( \left({S}_{(i)},{Y}_{(i)}\right) \)), I simulate a “proposed play history” (comprising both the time series of her daily play and her state transitions) using the HMM specified in Eq. [1]–[5] with the current draws of θ_ij and ϕ_ij. I then compute the likelihood of the proposed play history given Eq. [8] and [9], and accept or reject the new draw based on the Metropolis-Hastings acceptance probability (Gelman et al. 2013), as sketched below.
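To make step 1) concrete, here is a minimal sketch of the independent Metropolis-Hastings update for a single row, assuming a stylized version of the three-state HMM (“Unaware” → “Active” → “Dead”); the function log_lik is a hypothetical placeholder for the observation likelihood in Eq. [8]–[9], and the exact timing conventions are those of the paper’s equations, not this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_history(T, pi, theta, phi):
    """Simulate latent states s_t (0 = Unaware, 1 = Active, 2 = Dead) and
    daily play indicators y_t from a stylized version of the HMM in
    Eq. [1]-[5], given the current adoption (pi_t), attrition (theta),
    and play (phi) probabilities."""
    s, y = np.zeros(T, dtype=int), np.zeros(T, dtype=int)
    state = 0
    for t in range(T):
        if state == 0 and rng.random() < pi[t]:
            state, y[t] = 1, 1          # adopts and plays on her first Active day
        elif state == 1:
            if rng.random() < theta:
                state = 2               # Active -> Dead (absorbing)
            else:
                y[t] = int(rng.random() < phi)  # plays an Active day w.p. phi
        s[t] = state
    return s, y

def mh_update_row(s_cur, y_cur, log_lik, T, pi, theta, phi):
    """Independent MH step: the proposal is drawn from the HMM itself, so the
    prior terms cancel and the acceptance ratio reduces to a likelihood ratio."""
    s_prop, y_prop = propose_history(T, pi, theta, phi)
    log_alpha = log_lik(s_prop, y_prop) - log_lik(s_cur, y_cur)
    if np.log(rng.random()) < log_alpha:
        return s_prop, y_prop           # accept the proposed play history
    return s_cur, y_cur                 # keep the current play history

# Illustrative usage with a flat (dummy) likelihood and T = 30 days:
T, pi = 30, np.full(30, 0.05)
s0, y0 = propose_history(T, pi, theta=0.3, phi=0.5)
s1, y1 = mh_update_row(s0, y0, lambda s, y: 0.0, T, pi, 0.3, 0.5)
```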

2) Drawing π_it:

  • Denote the number of gamers (in the representative sample of size R) who are in the “Unaware” state at the start of the t-th period by \( n_{it}^{(U)} \), and the number of transitions from the “Unaware” state to the “Active” state during the t-th period by \( n_{it}^{(U\to A)} \). Then, π_it can be sampled from a \( \mathrm{Beta}\left(a_i^{(\pi)}+n_{it}^{(U\to A)},\ b_i^{(\pi)}+n_{it}^{(U)}-n_{it}^{(U\to A)}\right) \) distribution. (Steps 2–4 are all conjugate Beta updates; a sketch follows step 4.)

3) Drawing θ_ij:

  • For gamer j, denote her total number of “Active” → “Active” transitions by \( n_{ij}^{(A\to A)} \), and her total number of “Active” → “Dead” transitions by \( n_{ij}^{(A\to D)} \) (by definition, \( n_{ij}^{(A\to D)} \) is either 0 or 1, since “Dead” is an absorbing state). Then, θ_ij can be sampled from a \( \mathrm{Beta}\left(a_i^{(\theta)}+n_{ij}^{(A\to D)},\ b_i^{(\theta)}+n_{ij}^{(A\to A)}\right) \) distribution.

4) Drawing ϕ_ij:

  • Denote the number of days that gamer j stays in the “Active” state (excluding the first day on which she becomes “Active”) by \( n_{ij}^{(A)} \), and the number of days on which she plays the game (again excluding the first day) by \( n_{ij}^{(P)} \). Then, ϕ_ij can be sampled from a \( \mathrm{Beta}\left(a_i^{(\phi)}+n_{ij}^{(P)},\ b_i^{(\phi)}+n_{ij}^{(A)}-n_{ij}^{(P)}\right) \) distribution.
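Steps 2)–4) are standard conjugate Beta updates. A minimal sketch, with made-up counts and hyperparameter values standing in for those tallied from the current augmented histories:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative counts from the current augmented histories (made up here).
n_U, n_U_to_A = 500, 12       # gamers Unaware at the start of period t; adoptions
n_A_to_A, n_A_to_D = 40, 1    # Active->Active days; Active->Dead transition (0 or 1)
n_A, n_P = 40, 19             # Active days (excl. first day); played days

# Illustrative hyperparameter values from the current MCMC draw.
a_pi, b_pi = 1.0, 20.0
a_theta, b_theta = 2.0, 3.0
a_phi, b_phi = 2.0, 2.0

# Step 2: pi_it ~ Beta(a + n_{U->A}, b + n_U - n_{U->A})
pi_it = rng.beta(a_pi + n_U_to_A, b_pi + n_U - n_U_to_A)

# Step 3: theta_ij ~ Beta(a + n_{A->D}, b + n_{A->A})
theta_ij = rng.beta(a_theta + n_A_to_D, b_theta + n_A_to_A)

# Step 4: phi_ij ~ Beta(a + n_P, b + n_A - n_P)
phi_ij = rng.beta(a_phi + n_P, b_phi + n_A - n_P)
```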

5) Drawing \( \left(a_i^{(\pi)},b_i^{(\pi)}\right) \), \( \left(a_i^{(\theta)},b_i^{(\theta)}\right) \), and \( \left(a_i^{(\phi)},b_i^{(\phi)}\right) \):

  • Because standard conjugate computations are not available for the hyperparameters \( \left(a_i^{(\pi)},b_i^{(\pi)}\right) \), \( \left(a_i^{(\theta)},b_i^{(\theta)}\right) \), and \( \left(a_i^{(\phi)},b_i^{(\phi)}\right) \), I use a random-walk Metropolis-Hastings algorithm to sample from their posterior distributions. To sample \( \left(a_i^{(\pi)},b_i^{(\pi)}\right) \), I use a bivariate, independent Gaussian random-walk proposal centered on the previous draw; negative draws are “reflected” off the origin. The variance of the proposal distribution is tuned to achieve an acceptance rate close to 50% (Gelman et al. 2013). The hyperparameters \( \left(a_i^{(\theta)},b_i^{(\theta)}\right) \) and \( \left(a_i^{(\phi)},b_i^{(\phi)}\right) \) are sampled analogously, as sketched below.
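A minimal sketch of the random-walk Metropolis-Hastings update in step 5), shown for one hyperparameter pair; the flat prior on (a, b) and the step size are assumptions of this sketch, and in practice the proposal variance is tuned toward the ~50% acceptance rate mentioned above:

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)

def log_posterior(a, b, draws):
    """Log posterior of the Beta hyperparameters (a, b) given the current
    individual-level draws (e.g., the theta_ij's), under a flat prior on
    (a, b) -- an assumption made for this sketch."""
    return np.sum(beta_dist.logpdf(draws, a, b))

def rw_mh_step(a, b, draws, step_sd=0.1):
    # Bivariate independent Gaussian random walk centered on the previous draw.
    a_prop = a + step_sd * rng.normal()
    b_prop = b + step_sd * rng.normal()
    # "Reflect" negative proposals off the origin; reflection keeps the
    # proposal symmetric, so the acceptance ratio is unchanged.
    a_prop, b_prop = abs(a_prop), abs(b_prop)
    log_alpha = log_posterior(a_prop, b_prop, draws) - log_posterior(a, b, draws)
    if np.log(rng.random()) < log_alpha:
        return a_prop, b_prop   # accept
    return a, b                 # reject: keep the previous draw

# Illustrative usage: update (a, b) given 200 stand-in attrition draws.
theta_draws = rng.beta(2.0, 5.0, size=200)
a, b = 1.5, 4.0
for _ in range(100):
    a, b = rw_mh_step(a, b, theta_draws)
```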

About this article

Cite this article

Hui, S.K. Understanding repeat playing behavior in casual games using a Bayesian data augmentation approach. Quant Mark Econ 15, 29–55 (2017). https://doi.org/10.1007/s11129-017-9180-2
