Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm
Journal of Intelligent Information Systems ( IF 3.4 ) Pub Date : 2020-05-25 , DOI: 10.1007/s10844-020-00597-7
Di Wu , Ruixin Yang , Chao Shen

The Latent Dirichlet Allocation (LDA) topic model is a popular research topic in the field of text mining. This paper proposes SKP-LDA, an LDA short text clustering algorithm based on sentiment word co-occurrence and knowledge pair feature extraction. First, a bag-of-words definition based on sentiment word co-occurrence is proposed. The co-occurrence of sentiment words takes different short texts into account, so that the short texts of a microblog are endowed with emotional polarity. Next, knowledge pairs of topic special words and topic relation words are extracted and inserted into the LDA model for clustering, so that semantic information can be found more accurately. The n hidden topics and the Top30 special word set of each topic are then extracted from the knowledge pair set. Finally, the Top30 topic special word sets obtained from the LDA primary clustering are clustered again by K-means secondary clustering, with the cluster centers optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA has better semantic analysis ability and a better emotional topic clustering effect. It can be applied to microblogs to effectively improve the accuracy of online public opinion analysis.
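
The abstract does not spell out how sentiment word co-occurrence is computed, so the following is only a minimal sketch of the general idea, not the authors' SKP-LDA procedure. It assumes a toy sentiment lexicon (`SENTIMENT_LEXICON`, hypothetical), whitespace tokenization, and a simple sign-of-sum polarity rule; the function names `sentiment_cooccurrence` and `polarity` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon (assumption): word -> polarity (+1 positive, -1 negative).
SENTIMENT_LEXICON = {
    "good": 1, "great": 1, "happy": 1,
    "bad": -1, "awful": -1, "sad": -1,
}


def sentiment_cooccurrence(texts):
    """Count unordered pairs of distinct sentiment words that share a short text."""
    pair_counts = Counter()
    for text in texts:
        # Keep each sentiment word once per text, sorted so pairs are canonical.
        words = sorted({w for w in text.lower().split() if w in SENTIMENT_LEXICON})
        pair_counts.update(combinations(words, 2))
    return pair_counts


def polarity(text):
    """Label a short text by the sign of its summed lexicon scores (assumed rule)."""
    score = sum(SENTIMENT_LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


texts = [
    "great phone good battery",
    "good screen great camera",
    "awful service bad food",
    "the parcel arrived today",
]
pairs = sentiment_cooccurrence(texts)
labels = [polarity(t) for t in texts]
```

Here `pairs` maps each sentiment word pair to the number of short texts in which both words appear, and `labels` holds one polarity label per text. In a fuller pipeline, such counts could feed the word bag passed to a topic model, but that wiring is left out because the abstract does not describe it.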

Updated: 2020-05-25