Bounding System-Induced Biases in Recommender Systems with A Randomized Dataset
arXiv - CS - Information Retrieval, Pub Date: 2023-03-21, DOI: arxiv-2303.11574
Dugang Liu, Pengxiang Cheng, Zinan Lin, Xiaolian Zhang, Zhenhua Dong, Rui Zhang, Xiuqiang He, Weike Pan, Zhong Ming

Debiased recommendation with a randomized dataset has shown very promising results in mitigating system-induced biases. However, compared with the better-studied route that does not use a randomized dataset, it still lacks theoretical insight and an ideal optimization objective function. To bridge this gap, we study the debiasing problem from a new perspective and propose to directly minimize an upper bound of an ideal objective function, which facilitates a potentially better solution to system-induced biases. First, we formulate a new ideal optimization objective function with a randomized dataset. Second, according to the prior constraints that an adopted loss function may satisfy, we derive two different upper bounds of the objective function: a generalization error bound with the triangle inequality and a generalization error bound with separability. Third, we show that most existing related methods can be regarded as insufficient optimization of these two upper bounds. Fourth, we propose a novel method called debiasing approximate upper bound with a randomized dataset (DUB), which optimizes these upper bounds more sufficiently. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of DUB.

Updated: 2023-03-22