Semi-supervised Collaborative Filtering by Text-enhanced Domain Adaptation
arXiv - CS - Information Retrieval. Pub Date: 2020-06-28, DOI: arxiv-2007.07085
Wenhui Yu and Xiao Lin and Junfeng Ge and Wenwu Ou and Zheng Qin

Data sparsity is an inherent challenge in recommender systems, where most of the data is collected from users' implicit feedback. This causes two difficulties in designing effective algorithms: first, the majority of users have only a few interactions with the system, so there is not enough data for learning; second, implicit feedback contains no negative samples, and it is common practice to generate them by negative sampling. However, negative sampling mislabels many potential positive samples as negative, and data sparsity exacerbates this mislabeling problem. To address these difficulties, we regard recommendation on sparse implicit feedback as a semi-supervised learning task and explore domain adaptation to solve it. We transfer the knowledge learned from dense data to sparse data, focusing on the most challenging case: there is no user or item overlap. In this extreme case, directly aligning the embeddings of the two datasets is sub-optimal, since the two latent spaces encode very different information. We therefore adopt domain-invariant textual features as anchor points to align the latent spaces. To align the embeddings, we extract textual features for each user and item and feed them, together with the user and item embeddings, into a domain classifier. The embeddings are trained to confuse the classifier, while the textual features are fixed as anchor points. Through domain adaptation, the distribution pattern in the source domain is transferred to the target domain. Because the target part can be supervised by domain adaptation, we abandon negative sampling on the target dataset to avoid label noise. We use three pairs of real-world datasets to validate the effectiveness of our transfer strategy. Results show that our models significantly outperform existing models.
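
The adversarial alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain logistic-regression domain classifier over the concatenation of an embedding and its textual feature, and the names `adversarial_step`, `embs`, `texts`, and `domains` are hypothetical. The classifier takes a gradient-descent step on the domain-classification loss, the embeddings take a gradient-ascent step on the same loss (so they tend to confuse the classifier), and the textual features are never updated.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adversarial_step(embs, texts, domains, w, b, lr=0.1):
    """One simultaneous update of a logistic domain classifier and the embeddings.

    embs    : list of embedding vectors (trainable)
    texts   : list of textual feature vectors (fixed anchors, never updated)
    domains : 1 for source-domain rows, 0 for target-domain rows
    w, b    : classifier weights over concat(embedding, text) and bias
    Returns (new_embs, new_w, new_b, mean_loss).
    """
    n = len(embs)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    new_embs = []
    loss = 0.0
    for e, t, y in zip(embs, texts, domains):
        x = e + t  # list concatenation: [embedding, textual feature]
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p)) / n
        err = p - y  # derivative of the cross-entropy w.r.t. the logit
        for i, xi in enumerate(x):
            grad_w[i] += err * xi / n
        grad_b += err / n
        # Reversed gradient for the embedding: ascend this row's loss.
        # Only the first len(e) weights touch the embedding part of x.
        new_embs.append([ei + lr * err * w[i] for i, ei in enumerate(e)])
    # The classifier descends the mean loss; texts are left untouched.
    new_w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_embs, new_w, new_b, loss


if __name__ == "__main__":
    # Toy data (made up): two source rows and two target rows.
    embs = [[0.5, -0.2], [0.4, 0.1], [-0.3, 0.6], [-0.5, 0.2]]
    texts = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.8, 0.0]]
    domains = [1, 1, 0, 0]
    w, b = [0.1, -0.1, 0.0, 0.0], 0.0
    for _ in range(5):
        embs, w, b, loss = adversarial_step(embs, texts, domains, w, b)
```

In a full model this adversarial term would be combined with the recommendation loss, and the classifier would typically be a neural network trained with a gradient-reversal layer or alternating updates; the sketch isolates only the "train embeddings to confuse the classifier, keep textual features fixed" step.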

Updated: 2020-07-15