Author2Vec: A Framework for Generating User Embedding
arXiv - CS - Computation and Language Pub Date : 2020-03-17 , DOI: arxiv-2003.11627
Xiaodong Wu, Weizhe Lin, Zhilin Wang, and Elena Rastorgueva

Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose Author2Vec, a novel end-to-end neural-network-based user embedding system. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce user embeddings that encode useful user-intrinsic properties. The system was pre-trained on post data from 10k Reddit users and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We show that Author2Vec encodes useful user attributes and that the resulting user embeddings perform well on downstream classification tasks without further fine-tuning.
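The pipeline the abstract describes — pool per-post sentence vectors into a user representation, then pre-train an embedding layer on authorship classification — can be sketched as follows. This is a minimal illustration, not the authors' implementation: a random latent "style" vector stands in for real BERT sentence representations, and the dimensions, learning rate, and two-layer softmax head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_USERS, POSTS_PER_USER, DIM, EMB = 10, 20, 32, 8

# Stand-in for BERT sentence vectors: each user has a latent "style" vector,
# and their posts are noisy copies of it (an assumption for illustration;
# the paper encodes real Reddit posts with BERT).
styles = rng.normal(size=(N_USERS, DIM))
posts = styles[:, None, :] + 0.5 * rng.normal(size=(N_USERS, POSTS_PER_USER, DIM))

# Mean-pool each user's post vectors into one raw feature vector.
X = posts.mean(axis=1)          # shape (N_USERS, DIM)
y = np.arange(N_USERS)          # authorship labels: one class per user

# Embedding layer + softmax head, trained on authorship classification.
W_enc = rng.normal(scale=0.1, size=(DIM, EMB))
W_cls = rng.normal(scale=0.1, size=(EMB, N_USERS))

def forward(X):
    Z = np.tanh(X @ W_enc)                          # user embeddings
    logits = Z @ W_cls
    logits = logits - logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)               # softmax over authors
    return Z, P

lr = 0.5
for _ in range(300):
    Z, P = forward(X)
    G = P.copy()
    G[np.arange(N_USERS), y] -= 1.0                 # dCE/dlogits
    dW_cls = Z.T @ G / N_USERS
    dZ = (G @ W_cls.T) * (1.0 - Z ** 2)             # backprop through tanh
    dW_enc = X.T @ dZ / N_USERS
    W_cls -= lr * dW_cls
    W_enc -= lr * dW_enc

# After pre-training, Z (not the classifier output) is the reusable
# user embedding for downstream tasks such as depression detection.
embeddings, probs = forward(X)
accuracy = (probs.argmax(axis=1) == y).mean()
```

Once trained, the classifier head is discarded and `embeddings` is fed to downstream classifiers without further fine-tuning, mirroring the evaluation protocol the abstract describes.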

Updated: 2020-03-27