
A note on the advantage of context in Thompson sampling

  • Practice Article
  • Journal of Revenue and Pricing Management

Abstract

Personalization has become a focal point of modern revenue management. However, minimal data are often available to tailor suggestions to each customer. This has led many products to adopt reinforcement learning-based algorithms that explore sets of offerings to find the suggestions that best improve conversion and revenue. Arguably the most popular of these algorithms are built on the foundation of the multi-arm bandit framework, which has shown great success across a variety of use cases. A general multi-arm bandit algorithm aims to adaptively trade off exploring available but under-observed recommendations against exploiting the current known best offering. While much success has been achieved with these relatively understandable procedures, much of the airline industry is losing out on better personalized offers by ignoring the context of the transaction, as happens in the traditional multi-arm bandit setup. Here, we explore a popular exploration heuristic, Thompson sampling, and note implementation details for the multi-arm and contextual bandit variants. While the contextual bandit requires greater computational and technical complexity to include contextual features in the decision process, we illustrate the value it brings through the improvement in overall expected revenue.
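The full text sits behind the access wall below, but the procedure the abstract describes is straightforward to sketch. The following is a minimal, illustrative Python sketch of both variants, not the authors' implementation: a Beta-Bernoulli model for the multi-arm case and, as a simple stand-in for the contextual case, a Gaussian linear reward model. All class names and parameters here are our own labels. Both classes follow the same sample-then-argmax recipe: draw parameters from each arm's posterior, play the arm whose draw scores highest, then update that arm's posterior with the observed reward.

```python
import numpy as np

rng = np.random.default_rng(0)


class BetaBernoulliTS:
    """Multi-arm Thompson sampling with a Beta posterior per arm."""

    def __init__(self, n_arms):
        self.alpha = np.ones(n_arms)  # prior + observed successes
        self.beta = np.ones(n_arms)   # prior + observed failures

    def select_arm(self):
        # One posterior draw per arm; play the largest draw.
        return int(np.argmax(rng.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        # reward is 1 for a conversion, 0 otherwise.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


class LinearTS:
    """Contextual Thompson sampling with a Gaussian linear reward model."""

    def __init__(self, n_arms, dim, noise_var=1.0):
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])  # precisions
        self.f = np.zeros((n_arms, dim))  # running sums of reward * context
        self.noise_var = noise_var

    def select_arm(self, x):
        # Sample a weight vector per arm from its Gaussian posterior and
        # score the current context x against each draw.
        scores = []
        for a in range(len(self.B)):
            mu = np.linalg.solve(self.B[a], self.f[a])       # posterior mean
            cov = self.noise_var * np.linalg.inv(self.B[a])  # posterior covariance
            w = rng.multivariate_normal(mu, cov)
            scores.append(x @ w)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
```

The extra cost of the contextual variant is visible in the sketch: every decision needs a linear solve, a matrix inverse, and a multivariate normal draw per arm, versus a single Beta draw per arm in the multi-arm case. That is the computational and technical complexity the abstract weighs against the revenue gained by conditioning on the transaction's context.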



Author information


Correspondence to Michael Byrd.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Byrd, M., Darrow, R. A note on the advantage of context in Thompson sampling. J Revenue Pricing Manag 20, 316–321 (2021). https://doi.org/10.1057/s41272-021-00314-1

