A reliable cross-site user generated content modeling method based on topic model
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-09-21, DOI: 10.1016/j.knosys.2020.106435
Baoxi Liu, Peng Zhang, Tun Lu, Ning Gu

Social network sites (SNSs) have become significant platforms for content sharing in daily life. With the emergence of different kinds of SNSs and users’ diverse content sharing needs, users’ content sharing practices generally take place across multiple SNSs. Constructing models that characterize users’ content sharing practices in a composite context constituted by multiple SNSs (cross-site user generated content modeling) has become an emerging research topic in web data mining and human behavior research. However, previous methods such as the Dirichlet Multinomial Mixture model (DMM), the Biterm Topic Model (BTM), Twitter-LDA and MultiLDA either have limited representation ability or rest on unreliable assumptions, so they cannot accurately characterize user generated content (UGC) from the perspective of multiple SNSs. In this paper, we first conduct an empirical study of the characteristics of users’ content sharing practices in the cross-site context, based on which we propose a more reliable cross-site UGC model named CrossSite-LDA (C-LDA). We then compare C-LDA with four state-of-the-art models on two data sets sampled from Weibo–Douban and Facebook–Twitter. Results show that C-LDA performs better than existing models on perplexity, word coherence, topic KL divergence, UCI and UMass metrics, which suggests superior accuracy in modeling users’ content characteristics in the cross-site context.




Updated: 2020-09-23