Abstract
With an estimated market size of nearly $18 billion in 2016, casual games (games played over social networks or mobile devices) have become increasingly popular. Because most casual games are free to install, understanding repeat playing behavior is important for game developers as it directly drives advertising revenue. Game developers are keenly interested in benchmarking their games against the market average, and in understanding how genre and various game mechanics drive repeat playing behavior. Such cross-sectional analysis, however, is difficult to conduct because individual-level data on competitors’ games are not publicly available and because the casual gaming industry is highly fragmented, with each firm making only a handful of games.
I develop a Bayesian approach, based on a parsimonious Hidden Markov Model at the individual level in conjunction with data augmentation, to study repeat playing behavior using only publicly available data. After applying the proposed approach to a sample of 379 casual games, I find that the average daily attrition rate across games is around 36.5%, with an average “play” rate of 47.9%, resulting in an average ARPU (average revenue per user) across games of around 20.5 cents. Certain genres are linked to higher attrition rates and play rates. In addition, giving out a “daily bonus” and limiting the amount of time that gamers can play each day are associated with 17.7% and 16.4% higher ARPU, respectively.
Notes
In the context of casual game playing behavior, the transition from “Dead” back to “Active” is extremely unlikely. Private communication with data scientists at several major game developers reveals that once a user has not logged in for 30 consecutive days, his/her probability of returning at least once over the next 30 days is lower than 1%. Thus, the probability of moving from “Dead” to “Active” is negligible.
Consider M = 100 million (as discussed in the introduction, the game CityVille has an MAU of 100 million, which sets a lower bound for its potential number of players \( M_i \)). Augmenting a year of play history of each player would require simulating 2 × 365 × 100 million = \( 7.3 \times 10^{10} \) entries. Assuming that each entry takes only 1 byte of memory to store, the two latent matrices alone would require 8000 terabytes of memory to store and process.
According to Appdata (the data provider), there are some minor non-specific measurement errors in the DAU and MAU data. For instance, in several cases I notice that the DAU and MAU values on the release date of a game are not the same (when they should be equal by definition), which Appdata ascribes to server-side recording error. Such measurement errors are assumed to occur at random.
Robustness checks on other values of \( \sigma_d \) and \( \sigma_m \) are conducted by setting their values to .025 and .1 instead of the .05 used here; the key results are substantively unchanged.
For instance, if individual-level adoption probabilities are negatively correlated with attrition probabilities (i.e., early adopters are less likely to churn), later cohorts would have a higher average attrition probability than earlier cohorts, thus introducing model misspecification error since the model assumes that average attrition probability is constant across cohorts.
I have also conducted another set of simulation studies where M is set to be 1,000,000. The results are substantially unchanged and are available upon request.
Another possibility is to directly build these covariates into the model and estimate their effect jointly when estimating the model. However, this is computationally intractable because the games can no longer be estimated in parallel.
References
Barnes, D. (2010). DAU/MAU crash course – your measure of game design quality, available at web.archive.org/web/20130530064350/http://fbindie.posterous.com/daumau-crash-course-the-main-measure-of-game.
Bolton, R., Kannan, P. K., & Bramlett, M. (2000). Implications of loyalty program membership and service experiences for customer retention. Journal of the Academy of Marketing Science, 28(1), 95–108.
Casella, G., & George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167–174.
Chen, Y., & Yang, S. (2007). Estimating disaggregate models using aggregate data through augmentation of individual choice. Journal of Marketing Research, 44(Nov), 613–621.
De Vere, K. (2011). 46 cents in revenue per daily active user? available at http://www.insidemobileapps.com/2011/11/16/a-thinking-ape-interview-kenshi-arasaki/.
Fader, P. S., Hardie, B. G. S., & Shang, J. (2010). Customer-base analysis in a discrete-time noncontractual setting. Marketing Science, 29(6), 1086–1108.
Galak, J., Kruger, J., & Loewenstein, G. (2011). Is variety the spice of life? It all depends on the rate of consumption. Judgment and Decision Making, 6(3), 230–238.
Galak, J., Kruger, J., & Loewenstein, G. (2013). Slow down! Insensitivity to rate of consumption leads to avoidable satiation. Journal of Consumer Research, 39(5), 993–1009.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton: Chapman & Hall.
Gupta, S., & Zeithaml, V. (2006). Customer metrics and their impact on financial performance. Marketing Science, 25(6), 718–739.
Jiang, R., Manchanda, P., & Rossi, P. E. (2009). Bayesian analysis of random coefficient logit models using aggregate data. Journal of Econometrics, 149, 136–148.
McCalmont, T. (2015). 15 metrics all game developers should know by heart, available at http://www.gameanalytics.com/blog/metrics-all-game-developers-should-know.html.
Montgomery, A., Li, S., Srinivasan, K., & Liechty, J. (2004). Modeling online browsing and path analysis using clickstream data. Marketing Science, 23(4), 579–595.
Musalem, A., Bradlow, E. T., & Raju, J. S. (2008). Who’s got the coupon? Estimating consumer preferences and coupon usage from aggregate information. Journal of Marketing Research, 45(Dec), 715–730.
Musalem, A., Bradlow, E. T., & Raju, J. S. (2009). Bayesian estimation of random-coefficients choice models using aggregate data. Journal of Applied Econometrics, 24, 490–516.
Nelson, L. D., & Meyvis, T. (2008). Interrupted consumption: disrupting adaptation to hedonic experiences. Journal of Marketing Research, 45(December), 654–664.
Nelson, L. D., Meyvis, T., & Galak, J. (2009). Enhancing the television viewing experience through commercial interruptions. Journal of Consumer Research, 36(August), 160–172.
Netzer, O., Lattin, J. M., & Srinivasan, V. (2008). A hidden Markov model of customer relationship dynamics. Marketing Science, 27(2), 185–204.
Park, S.-H., & Gupta, S. (2009). Simulated maximum likelihood estimator for the random coefficient logit model using aggregate data. Journal of Marketing Research, 46(4), 531–542.
Playnomics (2012). Playnomics Quarterly U.S. Player Engagement Study, Q3, 2012, available at http://www.adweek.com/core/wp-content/uploads/sites/2/2012/10/Playnomics_Q3-report_Final-copy.pdf. Accessed 13 Feb 2017.
Redden, J. P. (2008). Reducing satiation: the role of categorization level. Journal of Consumer Research, 34(Feb), 624–634.
Redden, J. P., & Galak, J. (2013). The subjective sense of feeling satiated. Journal of Experimental Psychology: General, 142(1), 209–217.
PopCap Research (2011). 2011 popcap games social gaming research, available at www.infosolutionsgroup.com/pdfs/2011_PopCap_Social_Gaming_Research_Results.pdf.
Robert, C., & Casella, G. (2004). Monte Carlo statistical methods. New York: Springer.
Sapolsky, R., & Reynolds, M. (2006). Zebras and lions in the workplace: an interview with Dr. Robert Sapolsky. International Journal of Coaching in Organization, 4(2), 7–15.
Stark, H. (2010). Facebook DAU and MAU: what they tell you (and what they don’t), available at http://insightanalysis.wordpress.com/2010/07/21/facebook-dau-and-mau-what-they-tell-you-and-what-they-dont/.
Takahashi, D. (2016). PwC: game industry to grow nearly 5% annually through 2020, available at http://venturebeat.com/2016/06/08/the-u-s-and-global-game-industries-will-grow-a-healthy-amount-by-2020-pwc-forecasts/.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398), 528–540.
Terwitte, C. (2015). Engagement benchmarks deep-dive: a detailed look at games verticals, available at https://www.adjust.com/mobile-benchmarks-q3-2015/games-verticals/.
Winkler, R. (2011). Testing the durability of Zynga’s virtual business, The Wall Street Journal, 9/28/2011, available at https://www.wsj.com/articles/SB10001424052970204422404576597070568208288. Accessed 13 Feb 2017.
Appendix
I. MCMC computational procedure
I describe the MCMC procedure used to calibrate the model. In each iteration, I draw from the full conditional distributions of model parameters in the following order:
\( \left({S}_{(i)},{Y}_{(i)}\right),{\pi}_{it},{\theta}_{ij},{\phi}_{ij},\left({a}_i^{\left(\pi \right)},{b}_i^{\left(\pi \right)}\right),\left({a}_i^{\left(\theta \right)},{b}_i^{\left(\theta \right)}\right),\left({a}_i^{\left(\phi \right)},{b}_i^{\left(\phi \right)}\right) \). An independent Metropolis-Hastings algorithm is used to sample each row of \( \left({S}_{(i)},{Y}_{(i)}\right) \); given \( \left({S}_{(i)},{Y}_{(i)}\right) \), the parameters \( {\pi}_{it} \), \( {\theta}_{ij} \), and \( {\phi}_{ij} \) are drawn using a Gibbs sampler, and the hyperparameters \( \left({a}_i^{\left(\pi \right)},{b}_i^{\left(\pi \right)}\right),\left({a}_i^{\left(\theta \right)},{b}_i^{\left(\theta \right)}\right) \), and \( \left({a}_i^{\left(\phi \right)},{b}_i^{\left(\phi \right)}\right) \) are drawn using a random-walk Metropolis-Hastings algorithm (Gelman et al. 2013). Each step is outlined as follows.
1) Drawing each row of \( \left({S}_{(i)},{Y}_{(i)}\right) \):
For each representative gamer j (i.e., the j-th row in \( \left({S}_{(i)},{Y}_{(i)}\right) \)), I simulate a “proposed play history” (including both the time series of her play history and state transitions) using the HMM specified in Eqs. [1]-[5] with the current draw of \( {\theta}_{ij} \) and \( {\phi}_{ij} \). Then, I compute the likelihood of the proposed play history given Eqs. [8] and [9], and accept or reject the new draw based on the Metropolis-Hastings acceptance probability (Gelman et al. 2013).
2) Drawing \( {\pi}_{it} \):
Denote the number of gamers (in the representative sample of size R) who are in the “Unaware” state at the start of the t-th period by \( {n}_{it}^{(U)} \), and denote the number of transitions from the “Unaware” state to the “Active” state during the t-th period by \( {n}_{it}^{\left( U\to A\right)} \). Then, \( {\pi}_{it} \) can be sampled from a \( Beta\left({a}_i^{\left(\pi \right)}+{n}_{it}^{\left( U\to A\right)},{b}_i^{\left(\pi \right)}+{n}_{it}^{(U)}-{n}_{it}^{\left( U\to A\right)}\right) \) distribution.
3) Drawing \( {\theta}_{ij} \):
For gamer j, denote her total number of “Active”➔“Active” transitions by \( {n}_{ij}^{\left( A\to A\right)} \), and denote her total number of “Active”➔“Dead” transitions by \( {n}_{ij}^{\left( A\to D\right)} \) (by definition, \( {n}_{ij}^{\left( A\to D\right)} \) takes either the value of 0 or 1 since “Dead” is an absorbing state). Then, \( {\theta}_{ij} \) can be sampled from a \( Beta\left({a}_i^{\left(\theta \right)}+{n}_{ij}^{\left( A\to D\right)},{b}_i^{\left(\theta \right)}+{n}_{ij}^{\left( A\to A\right)}\right) \) distribution.
4) Drawing \( {\phi}_{ij} \):
Denote the number of days that gamer j stays in the “Active” state (excluding the first day when she first becomes “Active”) by \( {n}_{ij}^{(A)} \), and denote the number of days that gamer j plays the game (excluding the first day) by \( {n}_{ij}^{(P)} \). Then, \( {\phi}_{ij} \) can be sampled from a \( Beta\left({a}_i^{\left(\phi \right)}+{n}_{ij}^{(P)},{b}_i^{\left(\phi \right)}+{n}_{ij}^{(A)}-{n}_{ij}^{(P)}\right) \) distribution.
5) Drawing \( \left({a}_i^{\left(\pi \right)},{b}_i^{\left(\pi \right)}\right),\left({a}_i^{\left(\theta \right)},{b}_i^{\left(\theta \right)}\right) \), and \( \left({a}_i^{\left(\phi \right)},{b}_i^{\left(\phi \right)}\right) \):
Because standard conjugate computations are not available to sample the hyperparameters \( \left({a}_i^{\left(\pi \right)},{b}_i^{\left(\pi \right)}\right),\left({a}_i^{\left(\theta \right)},{b}_i^{\left(\theta \right)}\right) \), and \( \left({a}_i^{\left(\phi \right)},{b}_i^{\left(\phi \right)}\right) \), I use a random-walk Metropolis-Hastings algorithm to sample from their posterior distributions. To sample \( \left({a}_i^{\left(\pi \right)},{b}_i^{\left(\pi \right)}\right) \), I use a bivariate, independent Gaussian random-walk proposal distribution with the mean centered on the value of the previous draw; negative draws are “reflected” off the origin. The variance of the proposal distribution is adjusted to achieve an acceptance rate close to 50% (Gelman et al. 2013). Similarly, the hyperparameters \( \left({a}_i^{\left(\theta \right)},{b}_i^{\left(\theta \right)}\right) \) and \( \left({a}_i^{\left(\phi \right)},{b}_i^{\left(\phi \right)}\right) \) can be sampled using an analogous procedure.
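As an illustration only, the conjugate Beta draws in steps 3 and 4 and the reflected random-walk update in step 5 might be sketched in Python as follows. This is not the code used in the paper. The function names (`draw_theta`, `draw_phi`, `log_posterior`, `reflected_rw_step`, `run_demo`) and the transition counts are hypothetical. The hyperprior \( p(a,b)\propto {\left(a+b\right)}^{-5/2} \) is an assumption, since the appendix does not state one, and the proposal standard deviation is held fixed rather than tuned.

```python
import math
import random


def draw_theta(rng, a, b, n_a_to_d, n_a_to_a):
    """Step 3: conjugate Beta draw of one gamer's attrition probability."""
    return rng.betavariate(a + n_a_to_d, b + n_a_to_a)


def draw_phi(rng, a, b, n_played, n_active):
    """Step 4: conjugate Beta draw of one gamer's play probability."""
    return rng.betavariate(a + n_played, b + n_active - n_played)


def log_posterior(a, b, probs):
    """Unnormalized log posterior of (a, b) given Beta(a, b) draws `probs`.

    ASSUMPTION: the hyperprior is p(a, b) proportional to (a + b)^(-5/2).
    The appendix does not state the hyperprior; this is a placeholder.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    total = -2.5 * math.log(a + b)
    for p in probs:
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)
    return total


def reflected_rw_step(rng, a, b, probs, step_sd):
    """Step 5: one random-walk Metropolis-Hastings update of (a, b)."""
    # Independent Gaussian steps centered on the previous draw; negative
    # proposals are reflected off the origin, so the proposal stays symmetric.
    a_new = abs(a + rng.gauss(0.0, step_sd))
    b_new = abs(b + rng.gauss(0.0, step_sd))
    if a_new == 0.0 or b_new == 0.0:
        return a, b, False  # degenerate proposal; keep the current draw
    log_ratio = log_posterior(a_new, b_new, probs) - log_posterior(a, b, probs)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return a_new, b_new, True
    return a, b, False


def run_demo(seed=0, n_gamers=200, n_iter=500, step_sd=0.3):
    """Alternate steps 3 and 5 on hypothetical transition counts."""
    rng = random.Random(seed)
    # Hypothetical data per gamer: (A->D count in {0, 1}, A->A count in 0..10).
    counts = [(rng.randint(0, 1), rng.randint(0, 10)) for _ in range(n_gamers)]
    a, b, accepted = 1.0, 1.0, 0
    for _ in range(n_iter):
        thetas = [draw_theta(rng, a, b, n_ad, n_aa) for n_ad, n_aa in counts]
        a, b, ok = reflected_rw_step(rng, a, b, thetas, step_sd)
        accepted += ok
    return a, b, accepted / n_iter
```

Calling `run_demo()` returns the final draw of the hyperparameters together with the empirical acceptance rate. In a fuller sampler, `step_sd` would be adjusted during burn-in until that rate approaches the target stated above.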
Cite this article
Hui, S.K. Understanding repeat playing behavior in casual games using a Bayesian data augmentation approach. Quant Mark Econ 15, 29–55 (2017). https://doi.org/10.1007/s11129-017-9180-2
DOI: https://doi.org/10.1007/s11129-017-9180-2