Learning Semantic Representations from Directed Social Links to Tag Microblog Users at Scale, ACM Transactions on Information Systems

Learning Semantic Representations from Directed Social Links to Tag Microblog Users at Scale
ACM Transactions on Information Systems (IF 5.6), Pub Date: 2020-03-07, DOI: 10.1145/3377550
Wayne Xin Zhao₁, Yupeng Hou₁, Junhua Chen₁, Jonathan J. H. Zhu₂, Eddy Jing Yin₃, Hanting Su₄, Ji-Rong Wen₄

This article presents a network embedding approach to automatically generate tags for microblog users. Instead of using text data, we aim to annotate microblog users with meaningful tags by leveraging rich social link data. To utilize directed social links, we use two kinds of node representations that model user interest in terms of followers and followees, respectively. To alleviate the sparsity problem, we propose a novel method based on two transformation functions for capturing implicit interest similarity. Unlike previous work on capturing high-order proximity, our model directly characterizes the effect of the context user on the proximity of node pairs. Another novelty of our model is that user importance scores computed by the classic PageRank algorithm are used to set the link weights. With these weights, our model can better disentangle the interest-similarity evidence carried by each link. We jointly consider the above factors when designing the final objective function. We construct a very large evaluation set consisting of 2.6M users, 0.5M tags, and 0.8B following links. To our knowledge, it is the largest dataset for microblog user tagging reported in the literature. Extensive experiments on this dataset demonstrate the effectiveness of the proposed approach. We implement the approach with several optimization techniques that make our model easy to scale to very large social networks. Ubiquitous social links provide an important data resource for understanding user interests. Our work provides an effective and efficient solution for annotating user interests solely from link data, which has important practical value in industry. To illustrate the use of our models, we implement a demonstration system for visualizing, navigating, and searching microblog users.
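
The abstract says PageRank importance scores are used to set link weights but does not give the formula. The following minimal sketch, in plain Python, assumes a follow edge `(u, v)` means u follows v, computes PageRank by power iteration, and then applies a hypothetical down-weighting rule, w(u→v) = (n·PR(v))^(−β), on the guess that following a highly central account says less about shared interest. The names `pagerank` and `link_weights` and the parameter `beta` are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PageRank on a directed follow graph.

    `edges` holds (follower, followee) pairs, so rank flows from each
    follower to the accounts it follows.
    """
    nodes = sorted({user for edge in edges for user in edge})
    n = len(nodes)
    out_links = defaultdict(list)
    for u, v in edges:
        out_links[u].append(v)

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        # Users who follow nobody (dangling nodes) spread their rank uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        for u in nodes:
            targets = out_links[u]
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        for u in nodes:
            new_rank[u] += damping * dangling / n
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def link_weights(edges, rank, beta=0.5):
    """HYPOTHETICAL rule: down-weight links to high-PageRank followees.

    n * rank[v] is v's PageRank relative to the uniform score 1/n, so links
    to above-average accounts get weight < 1 and the rest get weight > 1.
    """
    n = len(rank)
    return {(u, v): (n * rank[v]) ** (-beta) for u, v in edges}


if __name__ == "__main__":
    follows = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a"), ("a", "b")]
    pr = pagerank(follows)
    for edge, w in sorted(link_weights(follows, pr).items()):
        print(edge, round(w, 3))
```

In the toy graph, "c" has the most followers and therefore the highest PageRank, so every link pointing at "c" receives a smaller weight than the link a→b. At the 0.8B-link scale reported above, a sparse-matrix or distributed PageRank would be needed instead of Python dictionaries.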
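
The idea of two node representations per user can be illustrated with a LINE-style objective. This is a swapped-in technique, not the paper's model. Each user gets a `src` vector used when they follow someone and a `dst` vector used when they are followed. A link u→v is scored as σ(src[u]·dst[v]), and weighted SGD with uniform negative sampling fits the observed links. The transformation functions and the context-user term described in the abstract are omitted. The class `DirectedEmbedding`, its hyperparameters, and the example weights are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class DirectedEmbedding:
    """Two vectors per user (LINE-style sketch, not the paper's model).

    src[u]: used when u is the follower; dst[v]: used when v is the followee.
    """

    def __init__(self, users, dim=16, seed=0):
        self.rng = random.Random(seed)
        self.users = list(users)
        self.dim = dim
        self.src = {u: [self.rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
                    for u in self.users}
        # Followee-side vectors start at zero, as in word2vec's output vectors.
        self.dst = {u: [0.0] * dim for u in self.users}

    def score(self, u, v):
        """Estimated probability that u follows v."""
        return sigmoid(sum(a * b for a, b in zip(self.src[u], self.dst[v])))

    def _step(self, u, v, label, weight, lr):
        # One SGD step on weight * log-likelihood of the pair (u, v) having `label`.
        g = weight * lr * (label - self.score(u, v))
        su, dv = self.src[u], self.dst[v]
        for i in range(self.dim):
            old = su[i]
            su[i] += g * dv[i]
            dv[i] += g * old

    def train(self, weighted_edges, epochs=50, negatives=2, lr=0.05):
        """weighted_edges maps (follower, followee) -> link weight."""
        edges = list(weighted_edges.items())
        for _ in range(epochs):
            self.rng.shuffle(edges)
            for (u, v), w in edges:
                self._step(u, v, 1.0, w, lr)
                # Uniform negative sampling (an assumption); skip true links.
                for _ in range(negatives):
                    v_neg = self.rng.choice(self.users)
                    if (u, v_neg) not in weighted_edges:
                        self._step(u, v_neg, 0.0, w, lr)
```

Keeping follower-side and followee-side vectors separate lets the score of u→v differ from v→u, which a single shared vector per user cannot express because a dot product is symmetric. The link weights passed to `train` could come from any weighting scheme, including a PageRank-based one.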

Updated: 2020-03-07