Robust experimentation in the continuous time bandit problem

Pourbabaee, Farzad

doi:10.1007/s00199-020-01328-3


Research Article
Published: 26 November 2020

Volume 73, pages 151–181 (2022)

Economic Theory



Abstract

We study the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup (Bolton and Harris in Econometrica 67(2):349–374, 1999), where the agent holds ambiguous beliefs about the distribution of the return process of one arm and is certain about the other. The DM has multiplier preferences à la Hansen and Sargent (Am. Econ. Rev. 91(2):60–66, 2001), so we frame the decision-making environment as a two-player differential game against nature in continuous time. We characterize the DM's value function and her optimal experimentation strategy, which follows a cut-off rule with respect to her belief process. The belief threshold for exploring the ambiguous arm is found in closed form and is shown to be increasing in the ambiguity aversion index. We then study the effect of providing an unambiguous information source about the ambiguous arm. Interestingly, we show that the exploration threshold rises unambiguously as a result of this new information source, thereby leading to more conservatism. This analysis also sheds light on the efficient time to seek an expert opinion.
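To make the cut-off rule concrete, the following Python sketch simulates a single DM in a Bolton–Harris-style Brownian bandit. It rests on assumptions not taken from the article: the ambiguous (risky) arm's drift is either mu_high or mu_low with volatility sigma, the safe arm is uninformative, and while the risky arm is pulled the posterior p follows the standard filtering equation dp = p(1 − p)(mu_high − mu_low)/sigma² · (dX − (p·mu_high + (1 − p)·mu_low) dt). The threshold p_star is a hypothetical input; the closed-form threshold derived in the article, and nature's adversarial distortion under multiplier preferences, are not modelled. The function name simulate_cutoff_policy and all parameters are illustrative.

```python
import math
import random


def simulate_cutoff_policy(p0, p_star, mu_high, mu_low, sigma, true_high,
                           dt=0.01, horizon=10.0, seed=0):
    """Simulate a belief cut-off rule in a two-armed Brownian bandit.

    Returns a list of (time, belief, explores) tuples. All parameters are
    hypothetical illustrations, not values from the article.
    """
    rng = random.Random(seed)
    # Drift actually generating the risky arm's payoffs.
    mu_true = mu_high if true_high else mu_low
    # Signal-to-noise factor that scales belief updates.
    gain = (mu_high - mu_low) / sigma ** 2
    p, t = p0, 0.0
    path = []
    while t < horizon:
        # Cut-off rule: pull the risky arm only if the belief is at or above p_star.
        explores = p >= p_star
        path.append((t, p, explores))
        if explores:
            # Euler-Maruyama step for the observed cumulative payoff X.
            dX = mu_true * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
            # Innovation: observed payoff minus its expectation under belief p.
            innovation = dX - (p * mu_high + (1.0 - p) * mu_low) * dt
            p += p * (1.0 - p) * gain * innovation
            # Clamp to [0, 1] to guard against discretization overshoot.
            p = min(max(p, 0.0), 1.0)
        # The safe arm is uninformative, so otherwise the belief is unchanged.
        t += dt
    return path
```

For example, simulate_cutoff_policy(0.9, 0.5, 1.0, -1.0, 1.0, true_high=False) returns a path along which the DM explores until her belief first drops below 0.5; because the safe arm carries no information, her belief then freezes and she never returns to the risky arm. Under the article's result, a more ambiguity-averse DM corresponds to a higher p_star, so the same belief path would trigger abandonment weakly earlier.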

References

Anderson, C.M.: Ambiguity aversion in multi-armed bandit problems. Theory Decis. 72(1), 15–33 (2012)
Bolton, P., Harris, C.: Strategic experimentation. Econometrica 67(2), 349–374 (1999)
Bonatti, A., Hörner, J.: Learning to disagree in a game of experimentation. J. Econ. Theory 169, 234–269 (2017)
Caro, F., Gupta, A.D.: Robust control of the multi-armed bandit problem. Ann. Oper. Res., pp 1–20 (2013)
Cheng, X., Riedel, F.: Optimal stopping under ambiguity in continuous time. Math. Financ. Econ. 7(1), 29–68 (2013)
Crandall, M.G., Evans, L.C., Lions, P.-L.: Some properties of viscosity solutions of Hamilton–Jacobi equations. Trans. Am. Math. Soc. 282(2), 487–502 (1984)
Dixit, A.: The Art of Smooth Pasting. Routledge, Abingdon (2013)
Epstein, L.G., Ji, S.: Optimal learning under robustness and time consistency. Oper. Res., Forthcoming (2019)
Epstein, L.G., Schneider, M.: Recursive multiple-priors. J. Econ. Theory 113(1), 1–31 (2003)
Epstein, L.G., Schneider, M.: Learning under ambiguity. Rev. Econ. Stud. 74(4), 1275–1303 (2007)
Gilboa, I., Schmeidler, D.: Maxmin expected utility with non-unique prior. J. Math. Econ. 18(2), 141–153 (1989)
Gittins, J.C.: Bandit processes and dynamic allocation indices. J. R. Stat. Soc. B (Methodol.), pp 148–177 (1979)
Gozzi, F., Święch, A., Zhou, X.Y.: A corrected proof of the stochastic verification theorem within the framework of viscosity solutions. SIAM J. Control Optim. 43(6), 2009–2019 (2005)
Gozzi, F., Święch, A., Zhou, X.Y.: Erratum: a corrected proof of the stochastic verification theorem within the framework of viscosity solutions. SIAM J. Control Optim. 48(6), 4177–4179 (2010)
Hansen, L.P., Sargent, T.J.: Robust control and model uncertainty. Am. Econ. Rev. 91(2), 60–66 (2001)
Hansen, L.P., Sargent, T.J.: Robustness and ambiguity in continuous time. J. Econ. Theory 146(3), 1195–1223 (2011)
Hansen, L.P., Sargent, T.J., Turmuhambetova, G., Williams, N.: Robust control and model misspecification. J. Econ. Theory 128(1), 45–90 (2006)
Heidhues, P., Rady, S., Strack, P.: Strategic experimentation with private payoffs. J. Econ. Theory 159, 531–551 (2015)
Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus, vol. 113. Springer, New York (2012)
Keller, G., Rady, S.: Optimal experimentation in a changing environment. Rev. Econ. Stud. 66(3), 475–507 (1999)
Keller, G., Rady, S., Cripps, M.: Strategic experimentation with exponential bandits. Econometrica 73(1), 39–68 (2005)
Kim, M.J., Lim, A.E.B.: Robust multiarmed bandit problems. Manag. Sci. 62(1), 264–285 (2015)
Li, J.: The k-armed bandit problem with multiple priors. J. Math. Econ. 80, 22–38 (2019)
Lions, P.-L.: Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations part 2: viscosity solutions and uniqueness. Commun. Partial Differ. Equ. 8(11), 1229–1276 (1983)
Liptser, R.S., Shiryaev, A.N.: Statistics of Random Processes: I. General Theory, vol. 5. Springer, New York (2013)
Luo, Y.: Robustly strategic consumption-portfolio rules with informational frictions. Manag. Sci. 63(12), 4158–4174 (2017)
Maccheroni, F., Marinacci, M., Rustichini, A.: Ambiguity aversion, robustness, and the variational representation of preferences. Econometrica 74(6), 1447–1498 (2006a)
Maccheroni, F., Marinacci, M., Rustichini, A.: Dynamic variational preferences. J. Econ. Theory 128(1), 4–44 (2006b)
Manso, G.: Motivating innovation. J. Finance 66(5), 1823–1860 (2011)
Marinacci, M.: Learning from ambiguous urns. Stat. Papers 43(1), 143–151 (2002)
Meyer, R.J., Shi, Y.: Sequential choice under ambiguity: intuitive solutions to the armed-bandit problem. Manag. Sci. 41(5), 817–834 (1995)
Miao, J., Rivera, A.: Robust contracts in continuous time. Econometrica 84(4), 1405–1440 (2016)
Parthasarathy, K.R.: Probability Measures on Metric Spaces, vol. 352. Am. Math. Soc. (2005)
Polyanin, A.D., Zaitsev, V.F.: Handbook of Ordinary Differential Equations: Exact Solutions, Methods, and Problems. Chapman and Hall/CRC, London (2017)
Riedel, F.: Optimal stopping with multiple priors. Econometrica 77(3), 857–908 (2009)
Viefers, P.: Should I stay or should I go? A laboratory analysis of investment opportunities under ambiguity. Working Paper (2012)
Weitzman, M.L.: Optimal search for the best alternative. Econometrica 47(3), 641–654 (1979)
Yaoyao, W., Yang, J., Zou, Z.: Ambiguity sharing and the lack of relative performance evaluation. Econ. Theory 66(1), 141–157 (2018)
Zhou, X.Y., Yong, J., Li, X.: Stochastic verification theorems within the framework of viscosity solutions. SIAM J. Control Optim. 35(1), 243–253 (1997)


Author information

Authors and Affiliations

University of California, 414 Evans Hall, Berkeley, CA, 94720, USA
Farzad Pourbabaee


Corresponding author

Correspondence to Farzad Pourbabaee.

Additional information


I would like to thank Robert M. Anderson, Philipp Strack, Gustavo Manso and Demian Pouzo for their support and guidance over the course of this paper, and I am grateful to Haluk Ergin, Chris Shannon and David Ahn for their valuable comments and suggestions. All remaining errors are mine.


About this article

Cite this article

Pourbabaee, F. Robust experimentation in the continuous time bandit problem. Econ Theory 73, 151–181 (2022). https://doi.org/10.1007/s00199-020-01328-3


Received: 09 January 2020
Accepted: 09 November 2020
Published: 26 November 2020
Issue Date: February 2022
DOI: https://doi.org/10.1007/s00199-020-01328-3
