Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts
The Computer Journal ( IF 1.5 ) Pub Date : 2020-07-08 , DOI: 10.1093/comjnl/bxaa079
Kai Zhang 1 , Yuan Zhou 2 , Zheng Chen 1 , Yufei Liu 3 , Zhuo Tang 4 , Li Yin 1 , Jihong Chen 1

The prevalence of short texts on the Web has made mining the latent topic structures of short texts a critical and fundamental task for many applications. However, due to the lack of word co-occurrence information induced by the content sparsity of short texts, it is challenging for traditional topic models like latent Dirichlet allocation (LDA) to extract coherent topic structures on short texts. Incorporating external semantic knowledge into the topic modeling process is an effective strategy to improve the coherence of inferred topics. In this paper, we develop a novel topic model—called biterm correlation knowledge-based topic model (BCK-TM)—to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically based on recent progress in word embedding, which can represent semantic information of words in a continuous vector space. To incorporate external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
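
As a rough illustration of the idea of mining biterm correlation knowledge from word embeddings, the sketch below enumerates biterms (unordered word pairs that co-occur in one short text) and keeps those whose word vectors are similar. The abstract does not specify the mining rule, so the cosine-similarity threshold, the function names, and the toy vectors here are all assumptions made for illustration, not the BCK-TM procedure itself.

```python
from itertools import combinations
from math import sqrt

# Toy 3-dimensional embeddings, invented for this example only.
# A real setup would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.0],
    "iphone": [0.8, 0.2, 0.1],
    "banana": [0.1, 0.9, 0.0],
}


def extract_biterms(tokens):
    """Return every unordered pair of distinct words in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def correlated_biterms(tokens, embeddings, threshold=0.9):
    """Map each biterm whose embedding similarity meets the threshold to its score.

    The threshold rule is a hypothetical stand-in for the paper's mining step.
    Words missing from the embedding table are skipped.
    """
    scores = {}
    for w1, w2 in extract_biterms(tokens):
        if w1 in embeddings and w2 in embeddings:
            sim = cosine(embeddings[w1], embeddings[w2])
            if sim >= threshold:
                scores[(w1, w2)] = sim
    return scores


if __name__ == "__main__":
    print(correlated_biterms(["apple", "iphone", "banana"], TOY_EMBEDDINGS))
```

In a model of the kind the abstract describes, pairs selected this way would then act as prior knowledge that regularizes biterm topic assignments during sampling; that step is not sketched here because the abstract gives no details of the mechanism.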

Updated: 2020-07-09