A note on the advantage of context in Thompson sampling, Journal of Revenue and Pricing Management



Journal of Revenue and Pricing Management. Pub Date: 2021-03-24, DOI: 10.1057/s41272-021-00314-1
Michael Byrd, Ross Darrow

Personalization has become a focal point of modern revenue management. However, there are often too few data to make suggestions appropriately tailored to each customer. This has led many products to use reinforcement learning-based algorithms that explore sets of offerings to find the suggestions that best improve conversion and revenue. Arguably the most popular of these algorithms are built on the multi-arm bandit framework, which has shown great success across a variety of use cases. A general multi-arm bandit algorithm adaptively trades off exploring available but under-observed recommendations against exploiting the currently known best offering. While much success has been achieved with these relatively understandable procedures, much of the airline industry is losing out on better personalized offers by ignoring the context of the transaction, as the traditional multi-arm bandit setup does. Here, we explore a popular exploration heuristic, Thompson sampling, and note implementation details for multi-arm and contextual bandit variants. While the contextual bandit requires greater computational and technical complexity to include contextual features in the decision process, we illustrate the value it brings by the improvement in overall expected
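To make the contrast in the abstract concrete, here is a minimal sketch of Thompson sampling with and without context. It is not the authors' implementation: it assumes binary conversion rewards, Beta(1, 1) priors, and a single discrete context (a customer segment) handled with one posterior table per segment, rather than a feature-based model. The function names, the segments, and the conversion rates are all hypothetical.

```python
import random


def thompson_choose(posteriors, rng):
    """Draw one conversion rate per arm from its Beta posterior; pick the largest draw."""
    samples = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, rounds, use_context, seed=0):
    """Run Thompson sampling and return the total number of conversions.

    true_rates[c][k] is the (hypothetical) conversion probability of
    offer k when shown to a customer in segment c.
    """
    rng = random.Random(seed)
    n_contexts, n_arms = len(true_rates), len(true_rates[0])
    # Contextual variant: one table of Beta(1, 1) priors per segment.
    # Context-free variant: a single table shared by every customer.
    n_tables = n_contexts if use_context else 1
    post = [[[1, 1] for _ in range(n_arms)] for _ in range(n_tables)]
    total = 0
    for _ in range(rounds):
        c = rng.randrange(n_contexts)            # a customer arrives
        table = post[c if use_context else 0]
        k = thompson_choose(table, rng)          # offer chosen by posterior sampling
        reward = 1 if rng.random() < true_rates[c][k] else 0
        table[k][0] += reward                    # successes update alpha
        table[k][1] += 1 - reward                # failures update beta
        total += reward
    return total


# Assumed rates: segment 0 prefers offer 0, segment 1 prefers offer 1.
rates = [[0.30, 0.05], [0.05, 0.30]]
plain = simulate(rates, 5000, use_context=False)
contextual = simulate(rates, 5000, use_context=True)
print("context-free conversions:", plain)
print("contextual conversions:  ", contextual)
```

In this toy setup the two offers look equally good once segments are averaged out, so the context-free sampler cannot do better than the blended rate, while the per-segment sampler learns a different best offer for each segment. With many or continuous features, a per-segment table no longer scales and a parametric reward model would be needed, which is where the extra computational and technical complexity mentioned above comes in.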


Updated: 2021-03-24
