
A note on the advantage of context in Thompson sampling

  • Practice Article
  • Journal of Revenue and Pricing Management

Abstract

Personalization has become a focal point of modern revenue management. However, minimal data are often available to tailor suggestions to each customer. This has led many products to adopt reinforcement learning-based algorithms that explore sets of offerings to find the suggestions that best improve conversion and revenue. Arguably the most popular of these algorithms are built on the foundation of the multi-arm bandit framework, which has shown great success across a variety of use cases. A general multi-arm bandit algorithm aims to adaptively trade off exploring available but under-observed recommendations against exploiting the current known best offering. While much success has been achieved with these relatively understandable procedures, much of the airline industry is losing out on better personalized offers by ignoring the context of the transaction, as happens in the traditional multi-arm bandit setup. Here, we explore a popular exploration heuristic, Thompson sampling, and note implementation details for the multi-arm and contextual bandit variants. While the contextual bandit requires greater computational and technical complexity to include contextual features in the decision process, we illustrate the value it brings through the improvement in overall expected revenue.
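The full text sits behind the access wall below, but the procedure the abstract describes is straightforward to sketch. The following is a minimal, illustrative Python sketch of both variants, not the authors' implementation: a Beta-Bernoulli model for the multi-arm case and, as a simple stand-in for the contextual case, a Gaussian linear reward model. All class names and parameters here are our own labels. Both classes follow the same sample-then-argmax recipe: draw parameters from each arm's posterior, play the arm whose draw scores highest, then update that arm's posterior with the observed reward.

```python
import numpy as np

rng = np.random.default_rng(0)


class BetaBernoulliTS:
    """Multi-arm Thompson sampling with a Beta posterior per arm."""

    def __init__(self, n_arms):
        self.alpha = np.ones(n_arms)  # prior + observed successes
        self.beta = np.ones(n_arms)   # prior + observed failures

    def select_arm(self):
        # One posterior draw per arm; play the largest draw.
        return int(np.argmax(rng.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        # reward is 1 for a conversion, 0 otherwise.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


class LinearTS:
    """Contextual Thompson sampling with a Gaussian linear reward model."""

    def __init__(self, n_arms, dim, noise_var=1.0):
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])  # precisions
        self.f = np.zeros((n_arms, dim))  # running sums of reward * context
        self.noise_var = noise_var

    def select_arm(self, x):
        # Sample a weight vector per arm from its Gaussian posterior and
        # score the current context x against each draw.
        scores = []
        for a in range(len(self.B)):
            mu = np.linalg.solve(self.B[a], self.f[a])       # posterior mean
            cov = self.noise_var * np.linalg.inv(self.B[a])  # posterior covariance
            w = rng.multivariate_normal(mu, cov)
            scores.append(x @ w)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
```

The extra cost of the contextual variant is visible in the sketch: every decision needs a linear solve, a matrix inverse, and a multivariate normal draw per arm, versus a single Beta draw per arm in the multi-arm case. That is the computational and technical complexity the abstract weighs against the revenue gained by conditioning on the transaction's context.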



Author information


Correspondence to Michael Byrd.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Byrd, M., Darrow, R. A note on the advantage of context in Thompson sampling. J Revenue Pricing Manag 20, 316–321 (2021). https://doi.org/10.1057/s41272-021-00314-1

