Author2Vec: A Framework for Generating User Embedding
arXiv - CS - Computation and Language Pub Date : 2020-03-17 , DOI: arxiv-2003.11627
Xiaodong Wu, Weizhe Lin, Zhilin Wang, and Elena Rastorgueva

Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose Author2Vec, a novel end-to-end neural-network-based user embedding system. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce user embeddings that encode useful user-intrinsic properties. The system was pre-trained on post data from 10k Reddit users and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We show that Author2Vec encodes useful user attributes and that the resulting user embeddings perform well on downstream classification tasks without further fine-tuning.
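The pipeline the abstract describes — pool per-post sentence vectors into a user representation, then pre-train an embedding layer on authorship classification — can be sketched as follows. This is a minimal illustration, not the authors' implementation: a random latent "style" vector stands in for real BERT sentence representations, and the dimensions, learning rate, and two-layer softmax head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_USERS, POSTS_PER_USER, DIM, EMB = 10, 20, 32, 8

# Stand-in for BERT sentence vectors: each user has a latent "style" vector,
# and their posts are noisy copies of it (an assumption for illustration;
# the paper encodes real Reddit posts with BERT).
styles = rng.normal(size=(N_USERS, DIM))
posts = styles[:, None, :] + 0.5 * rng.normal(size=(N_USERS, POSTS_PER_USER, DIM))

# Mean-pool each user's post vectors into one raw feature vector.
X = posts.mean(axis=1)          # shape (N_USERS, DIM)
y = np.arange(N_USERS)          # authorship labels: one class per user

# Embedding layer + softmax head, trained on authorship classification.
W_enc = rng.normal(scale=0.1, size=(DIM, EMB))
W_cls = rng.normal(scale=0.1, size=(EMB, N_USERS))

def forward(X):
    Z = np.tanh(X @ W_enc)                          # user embeddings
    logits = Z @ W_cls
    logits = logits - logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)               # softmax over authors
    return Z, P

lr = 0.5
for _ in range(300):
    Z, P = forward(X)
    G = P.copy()
    G[np.arange(N_USERS), y] -= 1.0                 # dCE/dlogits
    dW_cls = Z.T @ G / N_USERS
    dZ = (G @ W_cls.T) * (1.0 - Z ** 2)             # backprop through tanh
    dW_enc = X.T @ dZ / N_USERS
    W_cls -= lr * dW_cls
    W_enc -= lr * dW_enc

# After pre-training, Z (not the classifier output) is the reusable
# user embedding for downstream tasks such as depression detection.
embeddings, probs = forward(X)
accuracy = (probs.argmax(axis=1) == y).mean()
```

Once trained, the classifier head is discarded and `embeddings` is fed to downstream classifiers without further fine-tuning, mirroring the evaluation protocol the abstract describes.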

Updated: 2020-03-27