Constrained Dual-Level Bandit for Personalized Impression Regulation in Online Ranking Systems, ACM Transactions on Knowledge Discovery from Data

ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2021-07-21, DOI: 10.1145/3461340
Zhao Li¹, Junshuai Song², Zehong Hu¹, Zhen Wang¹, Jun Gao²

Impression regulation plays an important role in various online ranking systems. For example, e-commerce ranking systems must meet local commercial demands on pre-labeled target items, such as cultivating fresh items and counteracting fraudulent items, while maximizing global revenue. However, local impression regulation may cause “butterfly effects” on the global scale: in e-commerce, for instance, fluctuations in price preference under initial conditions (overpriced or underpriced items) may lead to significantly different outcomes, degrading the shopping experience and causing economic losses for the platform. To prevent such effects, some researchers define their regulation objectives with global constraints and use a page-level contextual bandit, which requires all items on a page to share the same regulation action and therefore cannot regulate impressions of individual items. To address this problem, in this article we propose a personalized impression regulation method that makes regulation decisions directly for each user-item pair. Specifically, we model the regulation problem as a Constrained Dual-level Bandit (CDB) problem, in which the local regulation actions and reward signals are at the item level, while the global effect constraint on platform impressions can only be calculated at the page level. To handle these asynchronous signals, we first expand the page-level constraint to the item level and then formulate the policy update as a second-order cone optimization problem. CDB approaches the optimal policy by iteratively solving this optimization problem. Experiments are performed on both offline and online datasets, and the theoretical and empirical results demonstrate that CDB outperforms state-of-the-art algorithms.
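The abstract does not spell out the CDB update itself. As a rough illustration of the signal structure it describes (regulation actions and rewards per user-item pair, constraint feedback only per page), here is a minimal Python sketch. It deliberately swaps in a standard Lagrangian primal-dual heuristic in place of the paper's second-order cone program. The class `PrimalDualItemBandit`, the even split of the page-level cost across items, the toy environment in `simulate`, and all parameter values are hypothetical assumptions, not taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PrimalDualItemBandit:
    """Hypothetical sketch: item-level actions, page-level constraint feedback.

    This is NOT the CDB policy update from the paper (which solves a
    second-order cone program); it swaps in a standard Lagrangian
    primal-dual heuristic to illustrate the dual-level signal structure.
    """

    actions: tuple        # candidate regulation actions (hypothetical score boosts)
    budget: float         # allowed average cost per item (hypothetical)
    eta: float = 0.05     # dual step size (assumed)
    epsilon: float = 0.1  # exploration rate (assumed)
    seed: int = 0
    lam: float = 0.0      # Lagrange multiplier for the page-level constraint
    reward_sum: dict = field(default_factory=dict)
    cost_sum: dict = field(default_factory=dict)
    count: dict = field(default_factory=dict)

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        for a in self.actions:
            self.reward_sum[a] = 0.0
            self.cost_sum[a] = 0.0
            self.count[a] = 0

    def _lagrangian_value(self, a):
        """Estimated reward minus lam times estimated item-level cost."""
        n = self.count[a]
        if n == 0:
            return float("inf")  # untried actions are selected first
        return (self.reward_sum[a] - self.lam * self.cost_sum[a]) / n

    def choose(self):
        """Pick a regulation action for one user-item pair (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=self._lagrangian_value)

    def update_page(self, chosen, item_rewards, page_cost):
        """Update from item-level rewards and a single page-level cost.

        Assumption: the page-level cost is expanded to the item level by
        splitting it evenly across the items shown on the page.
        """
        share = page_cost / len(chosen)
        for a, r in zip(chosen, item_rewards):
            self.reward_sum[a] += r
            self.cost_sum[a] += share
            self.count[a] += 1
        # Projected dual ascent: lam grows when the page's per-item cost
        # exceeds the budget and shrinks (never below 0) otherwise.
        self.lam = max(0.0, self.lam + self.eta * (share - self.budget))


def simulate(pages=2000, items_per_page=5, seed=1):
    """Toy environment (hypothetical): larger boosts earn more reward but
    shift more impressions toward target items, raising the page cost."""
    env = random.Random(seed)
    bandit = PrimalDualItemBandit(actions=(0.0, 0.5, 1.0), budget=0.3)
    total_cost = 0.0
    for _ in range(pages):
        chosen = [bandit.choose() for _ in range(items_per_page)]
        rewards = [1.0 if env.random() < 0.2 + 0.3 * a else 0.0 for a in chosen]
        page_cost = sum(chosen)  # the bandit only receives this page-level total
        bandit.update_page(chosen, rewards, page_cost)
        total_cost += page_cost
    return bandit, total_cost / (pages * items_per_page)


if __name__ == "__main__":
    bandit, avg_cost = simulate()
    print(f"final lam={bandit.lam:.3f}, average cost per item={avg_cost:.3f}")
```

Calling `simulate()` returns the trained bandit and the average per-item cost. The dual variable `lam` rises whenever a page's per-item cost exceeds `budget`, which steers later item-level choices toward lower-cost actions even though cost is only observed per page. The even split is the simplest conceivable page-to-item expansion; the expansion and SOCP-based update used by CDB may differ substantially.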

Updated: 2021-07-21