Automatic construction of domain-specific sentiment lexicon for unsupervised domain adaptation and sentiment classification, Knowledge-Based Systems

Knowledge-Based Systems (IF 7.2), Pub Date: 2020-11-07, DOI: 10.1016/j.knosys.2020.106423
Omid Mohamad Beigi, Mohammad H. Moattar

Sentiment analysis has long suffered from inaccuracies, whether it relies on machine learning methods that mostly benefit from text features or on sentiment lexicon-based methods that are prone to domain-dependent problems. Furthermore, since labeling is a time-consuming and expensive task, supervised machine learning methods suffer from insufficient labeled samples. To tackle these issues, this paper proposes a novel approach that combines a neural network with a sentiment lexicon. This combination can simultaneously adapt word polarities to the target domain and leverage the polarity of the whole document, alleviating the need for large labeled corpora in an unsupervised manner. To this end, a sentiment lexicon is constructed from the labeled source-domain data in the preprocessing phase. In the next phase, the weights of the first hidden layer of a Multilayer Perceptron (MLP) are set to the corresponding polarity of each word from the retrieved sentiment lexicon, and the network is trained. Finally, a Domain-Independent Lexicon (DIL) is introduced, which contains words (mostly adjectives) with static positive or negative scores that are independent of any specific domain. After the target domain is fed to the pre-trained model, the overall accuracy of the framework is enhanced by estimating the sentiment polarity of each sentence as the sum of the scores of its constituent domain-independent words. Experiments on the Amazon multi-domain sentiment dataset show that our approach significantly outperforms several previous approaches to unsupervised domain adaptation.
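The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state the polarity formula, the network shape, or the contents of the DIL, so the smoothed log-odds score, the single-vector weight layout, the toy DIL entries, and every function name below are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def build_lexicon(docs, labels, smoothing=1.0):
    """Build a source-domain lexicon from labeled documents (label 1 = positive).

    Assumed polarity: smoothed log-odds of a word occurring in positive
    versus negative documents. The paper may use a different score.
    """
    pos, neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        words = set(text.lower().split())  # whitespace tokenization only
        (pos if label == 1 else neg).update(words)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    lexicon = {}
    for w in set(pos) | set(neg):
        p = (pos[w] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg[w] + smoothing) / (n_neg + 2 * smoothing)
        lexicon[w] = math.log(p) - math.log(q)
    return lexicon

def initial_weights(lexicon, vocab):
    """Lay out lexicon polarities in vocabulary order as an initial weight
    vector for one first-layer unit (hypothetical layout); unknown words get 0."""
    return [lexicon.get(w, 0.0) for w in vocab]

# Toy domain-independent lexicon: hypothetical entries with static scores.
DIL = {"good": 1.0, "great": 1.0, "excellent": 1.0,
       "bad": -1.0, "poor": -1.0, "terrible": -1.0}

def dil_score(sentence, dil=DIL):
    """Estimate sentence polarity as the sum of the DIL scores of its words."""
    return sum(dil.get(w, 0.0) for w in sentence.lower().split())

if __name__ == "__main__":
    docs = ["great battery life", "terrible battery",
            "great screen", "poor screen"]
    labels = [1, 0, 1, 0]
    lex = build_lexicon(docs, labels)
    print(initial_weights(lex, ["great", "battery", "terrible"]))
    print(dil_score("an excellent and good phone"))
```

In this sketch a word such as "battery", which appears equally often in positive and negative source documents, receives a polarity of zero, while the DIL score depends only on the fixed entries and so carries over unchanged to any target domain.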

Updated: 2020-11-09
