Aggregated topic models for increasing social media topic coherence
Applied Intelligence (IF 3.4), Pub Date: 2019-07-10, DOI: 10.1007/s10489-019-01438-z
Stuart J. Blair, Yaxin Bi, Maurice D. Mulvenna

Abstract

This research presents a novel method for constructing an aggregated topic model whose topics are more coherent than those of the individual models it is built from. Generating a topic model requires a number of parameters to be specified, and depending on the chosen parameters the resulting topics can be very general or very specific. In this study we investigate aggregating multiple topic models generated with different parameters, focusing on whether combining general and specific topics increases topic coherence. We use cosine similarity and Jensen-Shannon divergence to compute the similarity between topics, and combine topics into an aggregated model when their similarity scores exceed a predefined threshold. The aggregated model is evaluated against standard topic models generated by latent Dirichlet allocation and non-negative matrix factorisation. Specifically, we use topic coherence to compare the aggregated model against the individual models from which it is built and against models generated by non-negative matrix factorisation. The results show that the aggregated model outperforms those topic models at a statistically significant level in terms of topic coherence over an external corpus. We also apply the aggregated topic model to social media data to validate the method in a realistic scenario, and find that it again outperforms individual topic models.
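
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each topic is a probability distribution over a shared vocabulary, that a divergence is turned into a similarity as `1 - JSD` (base-2 logarithm), and that similar topics are merged by averaging and renormalising. The names `aggregate`, `js_similarity`, and the threshold value are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-word vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, q):
    """Base-2 KL divergence; terms where p_i == 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence, bounded in [0, 1] with base-2 logs."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_similarity(p, q):
    """Assumption: convert the divergence into a similarity score."""
    return 1.0 - js_divergence(p, q)


def aggregate(base_topics, other_models, similarity, threshold):
    """Merge each base topic with every topic from the other models
    whose similarity to it exceeds the threshold.

    Assumption: merging is a plain average followed by renormalisation.
    """
    aggregated = []
    for topic in base_topics:
        group = [topic]
        for model in other_models:
            for candidate in model:
                if similarity(topic, candidate) > threshold:
                    group.append(candidate)
        merged = [sum(column) / len(group) for column in zip(*group)]
        total = sum(merged)
        aggregated.append([weight / total for weight in merged])
    return aggregated


# Toy data: two base topics and one other model over a 4-word vocabulary.
base = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
others = [[[0.6, 0.3, 0.1, 0.0], [0.25, 0.25, 0.25, 0.25]]]
result = aggregate(base, others, cosine_similarity, threshold=0.9)
```

In this toy example only the first base topic has a sufficiently similar counterpart, so it is averaged with that counterpart, while the second base topic passes through unchanged. Passing `js_similarity` instead of `cosine_similarity` swaps the measure without changing the merging logic.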




Updated: 2020-01-04