Robust experimentation in the continuous time bandit problem

  • Research Article
  • Economic Theory

Abstract

We study the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup (Bolton and Harris in Econometrica 67(2):349–374, 1999), where the agent holds ambiguous beliefs regarding the distribution of the return process of one arm and is certain about the other one. The DM has multiplier preferences à la Hansen and Sargent (Am. Econ. Rev. 91(2):60–66, 2001), so we frame the decision-making environment as a two-player differential game against nature in continuous time. We characterize the DM’s value function and her optimal experimentation strategy, which turns out to follow a cut-off rule with respect to her belief process. The belief threshold for exploring the ambiguous arm is found in closed form and is shown to be increasing in the ambiguity aversion index. We then study the effect of providing an unambiguous information source about the ambiguous arm. Interestingly, we show that the exploration threshold rises unambiguously as a result of this new information source, thereby leading to more conservatism. This analysis also sheds light on the efficient time to seek an expert opinion.
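
To make the multiplier-preference idea concrete, the following is a minimal static sketch, not the paper's continuous-time model. It assumes a discrete payoff distribution and a hypothetical penalty parameter `theta` (larger `theta` meaning less concern for misspecification; how it maps to the paper's ambiguity aversion index is an assumption here). It uses the standard identity that the worst-case value min over Q of E_Q[x] + theta * KL(Q || P) equals -theta * log E_P[exp(-x / theta)]. The function names are illustrative only.

```python
import math


def multiplier_value(payoffs, probs, theta):
    """Robust value min_Q { E_Q[x] + theta * KL(Q || P) }.

    Computed with the closed form -theta * log E_P[exp(-x / theta)].
    `theta` is a hypothetical penalty parameter (assumption, not from the paper).
    """
    # Shift by the minimum payoff so the exponentials stay in [0, 1].
    lo = min(payoffs)
    s = sum(p * math.exp(-(x - lo) / theta) for x, p in zip(payoffs, probs))
    return lo - theta * math.log(s)


def worst_case_probs(payoffs, probs, theta):
    """Nature's minimizing distribution: an exponential tilt of P toward low payoffs."""
    lo = min(payoffs)
    weights = [p * math.exp(-(x - lo) / theta) for x, p in zip(payoffs, probs)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    payoffs = [0.0, 1.0]   # illustrative payoffs of a risky arm
    probs = [0.5, 0.5]     # reference (benchmark) model P
    for theta in (0.1, 1.0, 10.0):
        value = multiplier_value(payoffs, probs, theta)
        tilt = worst_case_probs(payoffs, probs, theta)
        print(f"theta={theta}: robust value={value:.4f}, worst-case probs={tilt}")
```

In this toy setting the robust value lies below the expected payoff under the reference model and approaches it as `theta` grows, which is consistent with the abstract's message that greater ambiguity aversion makes the ambiguous arm less attractive.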

References

  • Anderson, C.M.: Ambiguity aversion in multi-armed bandit problems. Theory Decis. 72(1), 15–33 (2012)

  • Bolton, P., Harris, C.: Strategic experimentation. Econometrica 67(2), 349–374 (1999)

  • Bonatti, A., Hörner, J.: Learning to disagree in a game of experimentation. J. Econ. Theory 169, 234–269 (2017)

  • Caro, F., Gupta, A.D.: Robust control of the multi-armed bandit problem. Ann. Oper. Res., pp 1–20 (2013)

  • Cheng, X., Riedel, F.: Optimal stopping under ambiguity in continuous time. Math. Financ. Econ. 7(1), 29–68 (2013)

  • Crandall, M.G., Evans, L.C., Lions, P.-L.: Some properties of viscosity solutions of Hamilton–Jacobi equations. Trans. Am. Math. Soc. 282(2), 487–502 (1984)

  • Dixit, A.: The art of smooth pasting. Routledge, Abingdon (2013)

  • Epstein, L.G., Ji, S.: Optimal learning under robustness and time consistency. Oper. Res., Forthcoming (2019)

  • Epstein, L.G., Schneider, M.: Recursive multiple-priors. J. Econ. Theory 113(1), 1–31 (2003)

  • Epstein, L.G., Schneider, M.: Learning under ambiguity. Rev. Econ. Stud. 74(4), 1275–1303 (2007)

  • Gilboa, I., Schmeidler, D.: Maxmin expected utility with non-unique prior. J. Math. Econ. 18(2), 141–153 (1989)

  • Gittins, J.C.: Bandit processes and dynamic allocation indices. J. R. Stat. Soc. B (Methodol.), pp 148–177 (1979)

  • Gozzi, F., Święch, A., Zhou, X.Y.: A corrected proof of the stochastic verification theorem within the framework of viscosity solutions. SIAM J. Control Optim. 43(6), 2009–2019 (2005)

  • Gozzi, F., Święch, A., Zhou, X.Y.: Erratum: a corrected proof of the stochastic verification theorem within the framework of viscosity solutions. SIAM J. Control Optim. 48(6), 4177–4179 (2010)

  • Hansen, L.P., Sargent, T.J.: Robust control and model uncertainty. Am. Econ. Rev. 91(2), 60–66 (2001)

  • Hansen, L.P., Sargent, T.J.: Robustness and ambiguity in continuous time. J. Econ. Theory 146(3), 1195–1223 (2011)

  • Hansen, L.P., Sargent, T.J., Turmuhambetova, G., Williams, N.: Robust control and model misspecification. J. Econ. Theory 128(1), 45–90 (2006)

  • Heidhues, P., Rady, S., Strack, P.: Strategic experimentation with private payoffs. J. Econ. Theory 159, 531–551 (2015)

  • Karatzas, I., Shreve, S.: Brownian motion and stochastic calculus, vol. 113. Springer, New York (2012)

  • Keller, G., Rady, S.: Optimal experimentation in a changing environment. Rev. Econ. Stud. 66(3), 475–507 (1999)

  • Keller, G., Rady, S., Cripps, M.: Strategic experimentation with exponential bandits. Econometrica 73(1), 39–68 (2005)

  • Kim, M.J., Lim, A.E.B.: Robust multiarmed bandit problems. Manag. Sci. 62(1), 264–285 (2015)

  • Li, J.: The k-armed bandit problem with multiple priors. J. Math. Econ. 80, 22–38 (2019)

  • Lions, P.L.: Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations, part 2: viscosity solutions and uniqueness. Commun. Partial Differ. Equ. 8(11), 1229–1276 (1983)

  • Liptser, R.S., Shiryaev, A.N.: Statistics of random processes: I. General theory, vol. 5. Springer, New York (2013)

  • Luo, Y.: Robustly strategic consumption-portfolio rules with informational frictions. Manag. Sci. 63(12), 4158–4174 (2017)

  • Maccheroni, F., Marinacci, M., Rustichini, A.: Ambiguity aversion, robustness, and the variational representation of preferences. Econometrica 74(6), 1447–1498 (2006a)

  • Maccheroni, F., Marinacci, M., Rustichini, A.: Dynamic variational preferences. J. Econ. Theory 128(1), 4–44 (2006b)

  • Manso, G.: Motivating innovation. J. Finance 66(5), 1823–1860 (2011)

  • Marinacci, M.: Learning from ambiguous urns. Stat. Papers 43(1), 143–151 (2002)

  • Meyer, R.J., Shi, Y.: Sequential choice under ambiguity: intuitive solutions to the armed-bandit problem. Manag. Sci. 41(5), 817–834 (1995)

  • Miao, J., Rivera, A.: Robust contracts in continuous time. Econometrica 84(4), 1405–1440 (2016)

  • Parthasarathy, K.R.: Probability measures on metric spaces. Am. Math. Soc. 352 (2005)

  • Polyanin, A.D., Zaitsev, V.F.: Handbook of ordinary differential equations: exact solutions, methods, and problems. Chapman and Hall/CRC, London (2017)

  • Riedel, F.: Optimal stopping with multiple priors. Econometrica 77(3), 857–908 (2009)

  • Viefers, P.: Should I stay or should I go? A laboratory analysis of investment opportunities under ambiguity. Working Paper (2012)

  • Weitzman, M.L.: Optimal search for the best alternative. Econometrica 47(3), 641–654 (1979)

  • Yaoyao, W., Yang, J., Zou, Z.: Ambiguity sharing and the lack of relative performance evaluation. Econ. Theory 66(1), 141–157 (2018)

  • Zhou, X.Y., Yong, J., Li, X.: Stochastic verification theorems within the framework of viscosity solutions. SIAM J. Control Optim. 35(1), 243–253 (1997)

Author information

Correspondence to Farzad Pourbabaee.

Additional information

I would like to thank Robert M. Anderson, Philipp Strack, Gustavo Manso and Demian Pouzo for the support and guidance over the course of this paper, and I am grateful to Haluk Ergin, Chris Shannon and David Ahn for the valuable comments and suggestions. All remaining errors are mine.

About this article

Cite this article

Pourbabaee, F. Robust experimentation in the continuous time bandit problem. Econ Theory 73, 151–181 (2022). https://doi.org/10.1007/s00199-020-01328-3

  • DOI: https://doi.org/10.1007/s00199-020-01328-3
