An integrated framework of learning and evidential reasoning for user profiling using short texts
Information Fusion (IF 18.6), Pub Date: 2020-12-24, DOI: 10.1016/j.inffus.2020.12.004
Duc-Vinh Vo, Jessada Karnjana, Van-Nam Huynh

Inferring user profiles from texts that users create on social networks has a variety of applications in recommender systems, such as job offering, item recommendation, and targeted advertising. The problem becomes more challenging when working with short texts such as tweets on Twitter or posts on Facebook. This work proposes an integrated framework for the user profiling problem, based on the Dempster–Shafer theory of evidence, word embedding, and k-means clustering, that not only works well with short texts but also deals with the uncertainty inherent in user texts. The proposed framework is composed of three phases: (1) learning abstract concepts at multiple levels of abstraction from user corpora; (2) evidential inference and combination for user modeling; and (3) user profile extraction. In the first phase, a word embedding technique converts preprocessed texts into vectors that capture the semantics of words in the user corpus, and k-means clustering is then used to learn abstract concepts at multiple levels of abstraction, each of which reflects appropriate semantics of user profiles. In the second phase, each document in the user corpus is treated as an evidential source that carries partial information for inferring user profiles: we first infer a mass function for each user document by maximum a posteriori estimation, and then apply Dempster's rule of combination to fuse all documents' mass functions into an overall one for the user corpus. Finally, in the third phase, we apply the so-called pignistic probability principle to extract the top-n keywords from the user's overall mass function to define the user profile. Because it can combine pieces of information from many documents, the proposed framework is flexible enough to scale when input data come not only from multiple modes but also from different sources in web environments. Besides, the resulting profiles are interpretable, visualizable, and compatible with practical applications. The effectiveness of the proposed framework is validated by experimental studies conducted on datasets crawled from Twitter and Facebook.
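
The second and third phases can be illustrated with a minimal Python sketch of Dempster's rule of combination followed by the pignistic transformation. This is not the authors' implementation. The frame of discernment (`FRAME`), the per-document mass values, and the function names `combine`, `pignistic`, and `top_n` are all hypothetical, and the masses are written by hand here instead of being inferred by maximum a posteriori estimation as the abstract describes.

```python
from functools import reduce
from itertools import product


def combine(m1, m2):
    """Dempster's rule: fuse two mass functions keyed by frozenset focal elements."""
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to the empty set measures the conflict between sources.
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {focal: mass / norm for focal, mass in fused.items()}


def pignistic(m):
    """Pignistic transform: split each focal element's mass equally among its members."""
    betp = {}
    for focal, mass in m.items():
        share = mass / len(focal)
        for element in focal:
            betp[element] = betp.get(element, 0.0) + share
    return betp


def top_n(m, n):
    """Return the n elements with the highest pignistic probability."""
    betp = pignistic(m)
    return sorted(betp, key=betp.get, reverse=True)[:n]


# Hypothetical frame of discernment: candidate profile keywords (an assumption).
FRAME = frozenset({"sports", "music", "tech"})

# Hand-written mass functions for two user documents (illustrative values only).
# Mass on FRAME itself expresses ignorance: the document says nothing more specific.
doc_masses = [
    {frozenset({"sports"}): 0.6, FRAME: 0.4},
    {frozenset({"sports", "music"}): 0.7, FRAME: 0.3},
]

# Phase 2: fuse all document-level mass functions into one for the user corpus.
user_mass = reduce(combine, doc_masses)

# Phase 3: extract the top-n keywords by pignistic probability.
profile = top_n(user_mass, 2)
print(profile)
```

In this toy example the two documents do not conflict, so normalization leaves the fused masses unchanged. When sources disagree, the conflicting mass is discarded and the remaining masses are rescaled, which is why `combine` raises an error when the conflict is total.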




Updated: 2020-12-30