Big Data Aspect-Based Opinion Mining Using the SLDA and HME-LDA Models, Wireless Communications and Mobile Computing


Wireless Communications and Mobile Computing (IF 2.146), Pub Date: 2020-11-19, DOI: 10.1155/2020/8869385
Ling Yuan ₁, JiaLi Bin ₁, YinZhen Wei ₂, Fei Huang ₃, XiaoFei Hu ₃, Min Tan ₃


To make better use of massive volumes of online review data in support of decision-making by customers and merchants in the big data era, this paper proposes two unsupervised, optimized LDA (Latent Dirichlet Allocation) models for aspect-based opinion mining: SLDA (SentiWordNet WordNet-Latent Dirichlet Allocation) and HME-LDA (Hierarchical Clustering MaxEnt-Latent Dirichlet Allocation). For each of the two models, we design a scheme that uses seed words as topic words and constructs an inverted index, which makes the experimental results easier to read. Building on the LDA topic model, we introduce new indicator variables to refine the classification of topics, and we attempt to separate opinion target words from sentiment opinion words using two different schemes. To improve classification, the similarity between words and seed words is calculated in two ways to offset the fixed parameters of standard LDA. Finally, using the SemEval2016ABSA data set and the Yelp data set, we design comparative experiments with training sets of different sizes and with different seed words; the results show that SLDA and HME-LDA achieve better accuracy, recall, and harmonic value on unannotated training sets.
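
The abstract mentions building an inverted index around seed words but does not describe the construction. The Python sketch below is only an illustration of the general idea, not the authors' scheme: it assumes whitespace tokenization, and the sample reviews, the seed words, and the helper names `build_inverted_index` and `reviews_for_seeds` are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(reviews):
    """Map each lowercase token to the sorted IDs of the reviews containing it."""
    index = defaultdict(set)
    for review_id, text in enumerate(reviews):
        for raw_token in text.lower().split():
            token = raw_token.strip(".,!?;:")  # drop surrounding punctuation
            if token:
                index[token].add(review_id)
    return {token: sorted(ids) for token, ids in index.items()}


def reviews_for_seeds(index, seed_words):
    """Return, for each seed word, the IDs of the reviews that mention it."""
    return {seed: index.get(seed, []) for seed in seed_words}


# Hypothetical sample data, not taken from the SemEval or Yelp data sets.
reviews = [
    "The food was great, but the service was slow.",
    "Great service and friendly staff!",
    "Food arrived cold.",
]
index = build_inverted_index(reviews)
hits = reviews_for_seeds(index, ["food", "service", "price"])
print(hits)
```

With an index like this, the reviews behind each seed-word topic can be looked up directly, which is one plausible way such a structure could make topic output easier to inspect.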


Updated: 2020-11-19
