Robust experimentation in the continuous time bandit problem
Economic Theory (IF 1.423), Pub Date: 2020-11-26, DOI: 10.1007/s00199-020-01328-3
Farzad Pourbabaee

We study the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup (Bolton and Harris in Econometrica 67(2):349–374, 1999), where the DM holds ambiguous beliefs about the distribution of the return process of one arm and is certain about the other. The DM has multiplier preferences à la Hansen and Sargent (Am. Econ. Rev. 91(2):60–66, 2001), so we frame the decision-making environment as a two-player differential game against nature in continuous time. We characterize the DM's value function and her optimal experimentation strategy, which turns out to follow a cut-off rule with respect to her belief process. The belief threshold for exploring the ambiguous arm is found in closed form and is shown to be increasing in the ambiguity aversion index. We then study the effect of providing an unambiguous information source about the ambiguous arm. Interestingly, we show that the exploration threshold rises unambiguously as a result of this new information source, thereby leading to more conservatism. This analysis also sheds light on the efficient time to seek an expert opinion.




Updated: 2021-01-12