Targeted aspects oriented topic modeling for short texts, Applied Intelligence

Targeted aspects oriented topic modeling for short texts
Applied Intelligence (IF 5.3), Pub Date: 2020-03-07, DOI: 10.1007/s10489-020-01672-w
Jin He, Lei Li, Yan Wang, Xindong Wu

Topic modeling has demonstrated its value in short text topic discovery. A common approach, adopted by many topic models, is to perform a full analysis that finds all possible topics. However, these models overlook the importance of deeper topics, so the topics they discover can be confusing. In practice, people tend to look for more focused topics on particular aspects (or events) rather than a set of coarse topics. In this paper, we therefore propose a novel method, Targeted Aspects Oriented Topic Modeling (TATM), to discover more focused topics on specific aspects in short texts. Specifically, each short text is assigned to exactly one targeted aspect, derived from an enhanced Dirichlet Multinomial Mixture process (E-DMM). This process groups as many similar words as possible, which achieves topic homogeneity. In addition, TATM performs target-level modeling to discover the topics of each targeted aspect from as many angles as possible, which achieves topic completeness. TATM can thus strike a balance between these two conflicting properties without any additional information or pre-trained knowledge. Extensive experiments on five real-world datasets demonstrate that the proposed model effectively discovers more focused and complete topics and outperforms state-of-the-art baselines.
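
To make the "one short text, one aspect" assumption concrete, the sketch below implements a collapsed Gibbs sampler for a plain Dirichlet Multinomial Mixture. It is not the paper's E-DMM or TATM, whose enhancements the abstract does not specify. The function name `dmm_gibbs`, the hyperparameters `alpha` and `beta`, and their default values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, num_clusters, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Assign each tokenized short text to exactly one cluster (plain DMM).

    Illustrative sketch only: this is the standard DMM assumption, not the
    enhanced process (E-DMM) described in the abstract.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]

    m = [0] * num_clusters                          # documents per cluster
    n = [0] * num_clusters                          # tokens per cluster
    nw = [Counter() for _ in range(num_clusters)]   # word counts per cluster
    z = []                                          # cluster label per document

    # Random initialization: every document starts in one cluster.
    for doc, counts in zip(docs, doc_counts):
        k = rng.randrange(num_clusters)
        z.append(k)
        m[k] += 1
        n[k] += len(doc)
        nw[k].update(counts)

    for _ in range(iters):
        for d, (doc, counts) in enumerate(zip(docs, doc_counts)):
            # Remove document d from its current cluster.
            k_old = z[d]
            m[k_old] -= 1
            n[k_old] -= len(doc)
            nw[k_old].subtract(counts)

            # Log of the unnormalized conditional p(z_d = k | everything else).
            log_p = []
            for k in range(num_clusters):
                lp = math.log(m[k] + alpha)
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(n[k] + vocab_size * beta + i)
                log_p.append(lp)

            # Sample a new cluster; subtract the max for numerical stability.
            mx = max(log_p)
            weights = [math.exp(lp - mx) for lp in log_p]
            k_new = rng.choices(range(num_clusters), weights=weights)[0]

            # Add document d to the sampled cluster.
            z[d] = k_new
            m[k_new] += 1
            n[k_new] += len(doc)
            nw[k_new].update(counts)

    return z
```

Calling `dmm_gibbs([["goal", "match"], ["match", "team"], ["apple", "fruit"]], num_clusters=2)` returns one cluster index per input text. Because a whole document moves between clusters at each step, all of its words share a single label, which is the property the abstract relies on when it assigns each short text to a single targeted aspect.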

Updated: 2020-03-07
