Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decision
arXiv - CS - Machine Learning. Pub Date: 2020-01-16. DOI: arxiv-2001.05699
Li Ye, Yishi Lin, Hong Xie, John C.S. Lui

A fundamental question for companies with large amounts of logged data is: how can such logged data be used, together with incoming streaming data, to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users' experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone to make decisions. However, these decisions are not adaptive to new incoming data, so a wrong decision will continue to hurt users' experiences. To overcome these limitations, we propose a framework that unifies offline causal inference algorithms (e.g., weighting, matching) and online learning algorithms (e.g., UCB, LinUCB). We propose novel algorithms and derive bounds on the decision accuracy via the notion of "regret". We derive the first regret upper bound for forest-based online bandit algorithms. Experiments on two real datasets show that our algorithms outperform algorithms that use only logged data or only online feedback, as well as algorithms that do not use the data properly.

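The abstract does not spell out the unified algorithm, so the following is only a minimal sketch of the general recipe it describes: use an offline causal-inference estimator (here, inverse-propensity weighting) on logged data to warm-start an online bandit (here, UCB1). The `env` callback, the function names, and the warm-start initialization scheme are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def ipw_estimates(logged_actions, logged_rewards, propensities, n_arms):
    """Offline step (assumed): inverse-propensity-weighted mean reward
    per arm from logged data -- one simple 'weighting' estimator."""
    values = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    for a, r, p in zip(logged_actions, logged_rewards, propensities):
        values[a] += r / p      # reweight by the logging policy's propensity
        counts[a] += 1
    return values / len(logged_actions), counts

def warm_started_ucb(env, n_arms, horizon, offline_means, offline_counts):
    """Online step (assumed): UCB1 whose per-arm statistics start from
    the offline estimates instead of from zero."""
    means = offline_means.astype(float).copy()
    counts = offline_counts.astype(float).copy()
    total = counts.sum()
    for _ in range(horizon):
        total += 1
        bonus = np.sqrt(2.0 * np.log(total) / np.maximum(counts, 1e-9))
        arm = int(np.argmax(means + bonus))  # optimism under uncertainty
        reward = env(arm)                    # hypothetical streaming feedback
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return means, counts
```

Arms with little logged support keep a large confidence bonus and are still explored online, so a wrong offline estimate gets corrected by incoming data rather than hurting users indefinitely.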
Updated: 2020-11-10