Correlated Bandits for Dynamic Pricing via the ARC algorithm, arXiv - CS - Computational Engineering, Finance, and Science



Correlated Bandits for Dynamic Pricing via the ARC algorithm
arXiv - CS - Computational Engineering, Finance, and Science. Pub Date: 2021-02-08, DOI: arxiv-2102.04263
Samuel Cohen, Tanut Treetanthiploet

The Asymptotic Randomised Control (ARC) algorithm provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining reasonable computational complexity. In particular, it allows a decision maker to observe signals in addition to their rewards, to incorporate correlations between the outcomes of different choices, and to have nontrivial dynamics for their estimates. The algorithm is guaranteed to asymptotically optimise the expected discounted payoff, with error depending on the initial uncertainty of the bandit. In this paper, we consider a batched bandit problem where observations arrive from a generalised linear model; we extend the ARC algorithm to this setting. We apply this to a classic dynamic pricing problem based on a Bayesian hierarchical model and demonstrate that the ARC algorithm outperforms alternative approaches.
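To make the setting concrete, here is a minimal sketch of a batched pricing environment in which sales follow a generalised linear model. It is not the ARC algorithm and not the authors' model: the logistic link, the function names, the price grid, and the parameter values are all assumptions chosen for illustration. Because every price shares the same demand parameters, sales observed at one price are informative about the others, which is one simple way outcomes of different choices can be correlated.

```python
import math
import random


def purchase_probability(price, intercept, slope):
    """Logistic demand (a GLM with logit link): P(buy) = sigmoid(intercept + slope * price)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * price)))


def simulate_batch(price, batch_size, intercept, slope, rng):
    """Offer one price to a whole batch of customers; return (sales, revenue)."""
    p = purchase_probability(price, intercept, slope)
    sales = sum(1 for _ in range(batch_size) if rng.random() < p)
    return sales, price * sales


def expected_revenue(price, batch_size, intercept, slope):
    """Expected revenue of one batch when the demand parameters are known."""
    return price * batch_size * purchase_probability(price, intercept, slope)


# Hypothetical price grid and demand parameters (illustrative assumptions only).
PRICES = [1.0, 2.0, 3.0, 4.0]
INTERCEPT, SLOPE = 2.0, -1.0
BATCH_SIZE = 100

rng = random.Random(0)
sales, revenue = simulate_batch(2.0, BATCH_SIZE, INTERCEPT, SLOPE, rng)

# With known parameters the best price is a simple argmax; a bandit algorithm
# has to learn the parameters from batched sales while it is still selling.
best_price = max(
    PRICES, key=lambda x: expected_revenue(x, BATCH_SIZE, INTERCEPT, SLOPE)
)
```

In the Bayesian version of this problem the intercept and slope are unknown, so each batch both earns revenue and refines the decision maker's estimates of them.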


Updated: 2021-02-09
