Abstract
We study the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup (Bolton and Harris in Econometrica 67(2):349–374, 1999), where the DM holds ambiguous beliefs about the distribution of the return process of one arm and is certain about that of the other. The DM has multiplier preferences à la Hansen and Sargent (Am. Econ. Rev. 91(2):60–66, 2001), so we frame the decision-making environment as a two-player differential game against nature in continuous time. We characterize the DM’s value function and her optimal experimentation strategy, which follows a cut-off rule with respect to her belief process. The belief threshold for exploring the ambiguous arm is found in closed form and is shown to be increasing in the ambiguity aversion index. We then study the effect of providing an unambiguous information source about the ambiguous arm. Interestingly, we show that the exploration threshold rises unambiguously as a result of this new information source, thereby leading to more conservatism. This analysis also sheds light on the efficient time to seek an expert opinion.
References
Anderson, C.M.: Ambiguity aversion in multi-armed bandit problems. Theory Decis. 72(1), 15–33 (2012)
Bolton, P., Harris, C.: Strategic experimentation. Econometrica 67(2), 349–374 (1999)
Bonatti, A., Hörner, J.: Learning to disagree in a game of experimentation. J. Econ. Theory 169, 234–269 (2017)
Caro, F., Gupta, A.D.: Robust control of the multi-armed bandit problem. Ann. Oper. Res., pp 1–20 (2013)
Cheng, X., Riedel, F.: Optimal stopping under ambiguity in continuous time. Math. Financ. Econ. 7(1), 29–68 (2013)
Crandall, M.G., Evans, L.C., Lions, P.-L.: Some properties of viscosity solutions of Hamilton–Jacobi equations. Trans. Am. Math. Soc. 282(2), 487–502 (1984)
Dixit, A.: The art of smooth pasting. Routledge, Abingdon (2013)
Epstein, L.G., Ji, S.: Optimal learning under robustness and time consistency. Oper. Res., Forthcoming (2019)
Epstein, L.G., Schneider, M.: Recursive multiple-priors. J. Econ. Theory 113(1), 1–31 (2003)
Epstein, L.G., Schneider, M.: Learning under ambiguity. Rev. Econ. Stud. 74(4), 1275–1303 (2007)
Gilboa, I., Schmeidler, D.: Maxmin expected utility with non-unique prior. J. Math. Econ. 18(2), 141–153 (1989)
Gittins, J.C.: Bandit processes and dynamic allocation indices. J. R. Stat. Soc. B (Methodol.), pp 148–177 (1979)
Gozzi, F., Święch, A., Zhou, X.Y.: A corrected proof of the stochastic verification theorem within the framework of viscosity solutions. SIAM J. Control Optim. 43(6), 2009–2019 (2005)
Gozzi, F., Święch, A., Zhou, X.Y.: Erratum: a corrected proof of the stochastic verification theorem within the framework of viscosity solutions. SIAM J. Control Optim. 48(6), 4177–4179 (2010)
Hansen, L.P., Sargent, T.J.: Robust control and model uncertainty. Am. Econ. Rev. 91(2), 60–66 (2001)
Hansen, L.P., Sargent, T.J.: Robustness and ambiguity in continuous time. J. Econ. Theory 146(3), 1195–1223 (2011)
Hansen, L.P., Sargent, T.J., Turmuhambetova, G., Williams, N.: Robust control and model misspecification. J. Econ. Theory 128(1), 45–90 (2006)
Heidhues, P., Rady, S., Strack, P.: Strategic experimentation with private payoffs. J. Econ. Theory 159, 531–551 (2015)
Karatzas, I., Shreve, S.: Brownian motion and stochastic calculus, vol. 113. Springer, New York (2012)
Keller, G., Rady, S.: Optimal experimentation in a changing environment. Rev. Econ. Stud. 66(3), 475–507 (1999)
Keller, G., Rady, S., Cripps, M.: Strategic experimentation with exponential bandits. Econometrica 73(1), 39–68 (2005)
Kim, M.J., Lim, A.E.B.: Robust multiarmed bandit problems. Manag. Sci. 62(1), 264–285 (2015)
Li, J.: The k-armed bandit problem with multiple priors. J. Math. Econ. 80, 22–38 (2019)
Lions, P.-L.: Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations part 2: viscosity solutions and uniqueness. Commun. Partial Differ. Equ. 8(11), 1229–1276 (1983)
Liptser, R.S., Shiryaev, A.N.: Statistics of random processes: I. General theory, vol. 5. Springer, New York (2013)
Luo, Y.: Robustly strategic consumption-portfolio rules with informational frictions. Manag. Sci. 63(12), 4158–4174 (2017)
Maccheroni, F., Marinacci, M., Rustichini, A.: Ambiguity aversion, robustness, and the variational representation of preferences. Econometrica 74(6), 1447–1498 (2006a)
Maccheroni, F., Marinacci, M., Rustichini, A.: Dynamic variational preferences. J. Econ. Theory 128(1), 4–44 (2006b)
Manso, G.: Motivating innovation. J. Finance 66(5), 1823–1860 (2011)
Marinacci, M.: Learning from ambiguous urns. Stat. Papers 43(1), 143–151 (2002)
Meyer, R.J., Shi, Y.: Sequential choice under ambiguity: intuitive solutions to the armed-bandit problem. Manag. Sci. 41(5), 817–834 (1995)
Miao, J., Rivera, A.: Robust contracts in continuous time. Econometrica 84(4), 1405–1440 (2016)
Parthasarathy, K.R.: Probability measures on metric spaces. Am. Math. Soc. 352 (2005)
Polyanin, A.D., Zaitsev, V.F.: Handbook of ordinary differential equations: exact solutions, methods, and problems. Chapman and Hall/CRC, London (2017)
Riedel, F.: Optimal stopping with multiple priors. Econometrica 77(3), 857–908 (2009)
Viefers, P.: Should I stay or should I go? A laboratory analysis of investment opportunities under ambiguity. Working Paper (2012)
Weitzman, M.L.: Optimal search for the best alternative. Econometrica 47(3), 641–654 (1979)
Yaoyao, W., Yang, J., Zou, Z.: Ambiguity sharing and the lack of relative performance evaluation. Econ. Theory 66(1), 141–157 (2018)
Zhou, X.Y., Yong, J., Li, X.: Stochastic verification theorems within the framework of viscosity solutions. SIAM J. Control Optim. 35(1), 243–253 (1997)
Additional information
I would like to thank Robert M. Anderson, Philipp Strack, Gustavo Manso and Demian Pouzo for the support and guidance over the course of this paper, and I am grateful to Haluk Ergin, Chris Shannon and David Ahn for the valuable comments and suggestions. All remaining errors are mine.
About this article
Cite this article
Pourbabaee, F. Robust experimentation in the continuous time bandit problem. Econ Theory 73, 151–181 (2022). https://doi.org/10.1007/s00199-020-01328-3
DOI: https://doi.org/10.1007/s00199-020-01328-3
Keywords
- Model uncertainty
- Dynamic experimentation
- Variational preferences
- Information valuation
- Ambiguous diffusion