Domain sentiment dictionary construction and optimization based on multi-source information fusion
Intelligent Data Analysis (IF 0.9), Pub Date: 2020-03-27, DOI: 10.3233/ida-184426
Zuo Chen (1, 2), Xin Li (1), Min Wang (1, 3), Shenggang Yang (3)

Sentiment analysis of text data such as reviews can help users and merchants make better decisions. Popular supervised learning methods are hard to apply to sentiment classification because manually labeling data is time-consuming and laborious. Unsupervised sentiment classification methods are mostly based on sentiment lexicons. Existing general-purpose sentiment lexicons are not capable of domain sentiment classification on their own, so a domain sentiment lexicon still has to be constructed. Advanced methods for constructing domain sentiment lexicons still have many problems, e.g., heavy reliance on labeled data and poor accuracy. We propose a labeled-data extension idea to reduce the dependence of supervised learning methods on labeled data. To address the problems of domain sentiment lexicon construction, we propose a novel learning framework based on multi-source information fusion (MSIF). We extract four kinds of emotional information: lexicon emotional information, emotional word co-occurrence information, emotional word polarity information, and polarity relationship information of emotional word pairs. For extracting the co-occurrence information, we propose a novel method based on the data extension idea that enhances its accuracy and coverage. To speed up solving the fusion model, we apply an optimization method based on the ADMM (alternating direction method of multipliers) algorithm. Experimental results on five Amazon product review datasets show that the sentiment dictionary constructed by the proposed method can significantly improve review sentiment classification performance compared with popular baselines and state-of-the-art methods.
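The two premises above can be illustrated with a small sketch. The first is that unsupervised classification rests on a sentiment lexicon. The second is that word co-occurrence carries polarity signal. This is not the MSIF framework: the abstract gives neither its fusion equations nor its ADMM formulation. Instead, the sketch swaps in a simple seed-based co-occurrence heuristic. The seed words, the toy reviews, and the function names `build_lexicon` and `classify` are all hypothetical. Domain words such as "sturdy" and "flimsy" receive a polarity only through the reviews they appear in.

```python
from collections import Counter

# Hypothetical seed words with known polarity (an assumption, not from the paper).
POS_SEEDS = {"good", "great"}
NEG_SEEDS = {"bad", "poor"}

# Hypothetical toy corpus of unlabeled product reviews.
REVIEWS = [
    "great battery and sturdy case",
    "good screen sturdy build",
    "bad battery flimsy case",
    "poor screen flimsy buttons",
]


def build_lexicon(reviews, pos_seeds, neg_seeds):
    """Score each non-seed word in [-1, 1] by how often it shares a review
    with positive versus negative seed words (document-level co-occurrence)."""
    pos_co, neg_co = Counter(), Counter()
    for review in reviews:
        words = set(review.lower().split())
        has_pos = int(bool(words & pos_seeds))
        has_neg = int(bool(words & neg_seeds))
        for w in words - pos_seeds - neg_seeds:
            pos_co[w] += has_pos
            neg_co[w] += has_neg
    # Seed words keep their fixed polarity.
    lexicon = {w: 1.0 for w in pos_seeds}
    lexicon.update({w: -1.0 for w in neg_seeds})
    for w in set(pos_co) | set(neg_co):
        total = pos_co[w] + neg_co[w]
        if total:
            lexicon[w] = (pos_co[w] - neg_co[w]) / total
    return lexicon


def classify(review, lexicon):
    """Unsupervised lexicon-based classification: sum word polarities."""
    score = sum(lexicon.get(w, 0.0) for w in review.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    lex = build_lexicon(REVIEWS, POS_SEEDS, NEG_SEEDS)
    print(lex["sturdy"], lex["flimsy"], lex["battery"])
    print(classify("sturdy but the battery is weak", lex))
```

Raw document-level counts keep the example short. Established approaches such as SO-PMI instead use pointwise mutual information, which accounts for word frequency. According to the abstract, the paper additionally fuses lexicon, polarity, and word-pair relationship information.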

Updated: 2020-03-27