MABWISER: Parallelizable Contextual Multi-armed Bandits, International Journal on Artificial Intelligence Tools



MABWISER: Parallelizable Contextual Multi-armed Bandits
International Journal on Artificial Intelligence Tools (IF 1.1), Pub Date: 2021-06-30, DOI: 10.1142/s0218213021500214
Emily Strong, Bernard Kleynhans, Serdar Kadıoğlu

Contextual multi-armed bandit algorithms are an effective approach for online sequential decision-making problems. However, there are limited tools available to support their adoption in the community. To fill this gap, we present an open-source Python library with context-free, parametric and non-parametric contextual multi-armed bandit algorithms. The MABWiser library is designed to be user-friendly and supports custom bandit algorithms for specific applications. Our design provides built-in parallelization to speed up training and testing for scalability with special attention given to ensuring the reproducibility of results. The API makes hybrid strategies possible that combine non-parametric policies with parametric ones, an area that is not explored in the literature. As a practical application, we demonstrate using the library in both batch and online simulations for context-free, parametric and non-parametric contextual policies with the well-known MovieLens data set. Finally, we quantify the performance benefits of built-in parallelization.
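To make the fit-then-predict workflow of a context-free policy concrete, here is a minimal, standard-library-only sketch of an epsilon-greedy bandit. It is an illustration under stated assumptions, not MABWiser's implementation or API: the class name `EpsilonGreedyBandit`, its methods, and the `seed` parameter are hypothetical names chosen for this example. The seeded random generator only hints at how reproducible exploration might be arranged.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Hypothetical context-free bandit for illustration; not MABWiser code."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        # A dedicated, seeded generator keeps exploration choices reproducible.
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # number of times each arm was played
        self.totals = defaultdict(float)  # sum of rewards observed per arm

    def fit(self, decisions, rewards):
        """Accumulate historical (arm, reward) pairs."""
        for arm, reward in zip(decisions, rewards):
            self.counts[arm] += 1
            self.totals[arm] += reward

    def expectations(self):
        """Return the mean observed reward per arm (0.0 for unplayed arms)."""
        return {
            arm: (self.totals[arm] / self.counts[arm]) if self.counts[arm] else 0.0
            for arm in self.arms
        }

    def predict(self):
        """Explore with probability epsilon, otherwise exploit the best mean."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        means = self.expectations()
        return max(self.arms, key=lambda arm: means[arm])


# Usage: with epsilon=0.0 the policy is purely greedy, so the outcome is deterministic.
bandit = EpsilonGreedyBandit(arms=["A", "B"], epsilon=0.0, seed=42)
bandit.fit(decisions=["A", "A", "B", "B"], rewards=[1, 0, 1, 1])
best_arm = bandit.predict()
```

Separating `fit` from `predict` lets the same object be trained on a historical batch and then queried repeatedly. A contextual variant would additionally pass a feature vector to both calls.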


Updated: 2021-06-30
