Cross-domain sentiment aware word embeddings for review sentiment analysis
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-08-11, DOI: 10.1007/s13042-020-01175-7
Jun Liu, Shuang Zheng, Guangxia Xu, Mingwei Lin

Learning low-dimensional vector representations of words from a large corpus is one of the basic tasks in natural language processing (NLP). Existing general-purpose word embedding models learn word vectors mainly from the syntactic and semantic information in the context and ignore the sentiment information that words carry. Some approaches do model the sentiment information in reviews, but they do not consider that certain words express different sentiment in different domains. When the sentiment of a word changes across domains, applying general word vectors directly to review sentiment analysis inevitably degrades sentiment classification performance. To solve this problem, this paper extends the CBoW (continuous bag-of-words) word vector model and proposes a cross-domain sentiment aware word embedding learning model that captures the sentiment information and the domain relevance of a word at the same time. Several experiments on Amazon user review data from different domains evaluate the model. The results show that, when only the sentiment information of the context is modeled, the proposed model improves accuracy by nearly 2% over general word vectors. When both domain information and sentiment information are included, the accuracy and Macro-F1 of the sentiment classification tasks improve significantly compared with existing sentiment word embeddings.
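The abstract does not give the model's objective. The sketch below only illustrates the general idea of adding a sentiment term to CBoW; it is not the authors' formulation. It averages the context embeddings as CBoW does, then mixes a word-prediction loss with a review-sentiment loss. The names `joint_loss`, `sent_w`, and the mixing weight `alpha` are all hypothetical, and the domain-relevance component of the paper is not modeled here.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    # Element-wise average of equally sized vectors (the CBoW hidden layer).
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def joint_loss(context_ids, target_id, sentiment_label,
               emb, out_w, sent_w, alpha=0.5):
    """Loss for one (context, target word, review sentiment) example.

    alpha is an assumed mixing weight: 0 gives plain CBoW with a full
    softmax, 1 trains on the sentiment label only.
    """
    h = mean_vector([emb[i] for i in context_ids])
    # Standard CBoW term: predict the target word from the averaged context.
    word_probs = softmax([dot(h, w) for w in out_w])
    word_loss = -math.log(word_probs[target_id])
    # Assumed sentiment term: logistic prediction of the review polarity.
    p_pos = 1.0 / (1.0 + math.exp(-dot(h, sent_w)))
    sent_loss = -math.log(p_pos if sentiment_label == 1 else 1.0 - p_pos)
    return (1.0 - alpha) * word_loss + alpha * sent_loss

# Toy parameters: 6-word vocabulary, 4-dimensional embeddings.
rng = random.Random(0)
vocab_size, dim = 6, 4
emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
out_w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
sent_w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

# Context words 0, 1, 3, 4 around target word 2, taken from a positive review.
loss = joint_loss([0, 1, 3, 4], 2, 1, emb, out_w, sent_w)
print(loss)
```

Training would minimize this loss over all context windows with gradient descent, so the embeddings in `emb` would be shaped by both the surrounding words and the review's polarity.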




Updated: 2020-08-11