Deriving the sentiment polarity of term senses using dual-step context-aware in-gloss matching
Information Processing & Management (IF 8.6), Pub Date: 2020-06-05, DOI: 10.1016/j.ipm.2020.102273
Mohammad Darwich, Shahrul Azman Mohd Noah, Nazlia Omar

Vital to the task of Sentiment Analysis (SA), or automatically mining sentiment expression from text, is a sentiment lexicon. This fundamental lexical resource comprises the smallest sentiment-carrying units of text, namely words, annotated for their sentiment properties, and it supports SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is costly in annotator time and effort. This has led to a large body of research on automated sentiment lexicon generation. The dictionary-based approach leverages digital dictionaries, while the corpus-based approach exploits co-occurrence statistics embedded in text corpora. Although the former approach has been investigated extensively, most works focus on terms. The few state-of-the-art models that target the finer-grained term sense level still exhibit several prominent limitations. For example, the semantic relations expansion algorithm these models propose retrieves only senses that lie close to the seed senses in the semantic network, so it cannot retrieve remote sentiment-carrying senses beyond the ‘radius’ set by the number of expansion iterations. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation models using: (1) null seed sets, together with a morphological approach inspired by the Marking Theory in linguistics that populates them automatically; (2) a dual-step context-aware gloss expansion algorithm that ‘mines’ human-defined gloss information from a digital dictionary, ensuring that senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully unsupervised sentiment categorization algorithm based on Network Theory.
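To make the ‘radius’ limitation and the gloss-based remedy concrete, here is a minimal Python sketch on an invented toy sense inventory. It is not the authors' algorithm: the sense ids, glosses, relation links, the function names `relation_expansion` and `in_gloss_matches`, and the plain word-overlap test are all hypothetical, and the paper's dual-step, context-aware matching is not reproduced. Expansion over semantic relations stops after `max_iters` hops, while each pass of in-gloss matching picks up any sense whose gloss mentions the lemma of an already-retrieved sense, including senses that have no relation links at all.

```python
from collections import deque

# Hypothetical toy sense inventory (invented glosses, WordNet-style ids).
GLOSSES = {
    "good.a.01": "having desirable or positive qualities",
    "nice.a.01": "pleasant or pleasing or agreeable in nature",
    "pleasant.a.01": "affording pleasure; being in harmony with your taste",
    "delightful.a.01": "highly pleasant or entertaining",
    "excellence.n.01": "the quality of excelling; possessing good qualities in high degree",
    "superb.a.01": "of surpassing excellence",
    "table.n.01": "a piece of furniture with a flat top",
}

# Hypothetical semantic-relation links (e.g. similar-to), listed in both directions.
RELATIONS = {
    "good.a.01": ["nice.a.01"],
    "nice.a.01": ["good.a.01", "pleasant.a.01"],
    "pleasant.a.01": ["nice.a.01", "delightful.a.01"],
    "delightful.a.01": ["pleasant.a.01"],
}


def relation_expansion(seeds, max_iters):
    """Breadth-first expansion that stops max_iters hops away from the seeds."""
    reached = set(seeds)
    frontier = deque((sense, 0) for sense in seeds)
    while frontier:
        sense, depth = frontier.popleft()
        if depth == max_iters:
            continue  # the 'radius' is reached: do not follow further links
        for neighbour in RELATIONS.get(sense, []):
            if neighbour not in reached:
                reached.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return reached


def lemma(sense_id):
    """'good.a.01' -> 'good' (assumes WordNet-style identifiers)."""
    return sense_id.split(".")[0]


def in_gloss_matches(known):
    """One pass of plain in-gloss matching: senses whose gloss mentions a known lemma."""
    known_lemmas = {lemma(sense) for sense in known}
    found = set()
    for sense, gloss in GLOSSES.items():
        words = set(gloss.replace(";", " ").split())
        if sense not in known and words & known_lemmas:
            found.add(sense)
    return found


reached = relation_expansion({"good.a.01"}, max_iters=2)
first = in_gloss_matches(reached)
second = in_gloss_matches(reached | first)
print(sorted(reached))  # ['good.a.01', 'nice.a.01', 'pleasant.a.01']
print(sorted(first))    # ['delightful.a.01', 'excellence.n.01']
print(sorted(second))   # ['superb.a.01']
```

In this toy graph, raising `max_iters` to 3 would reach `delightful.a.01`, but `excellence.n.01` and `superb.a.01` have no relation links, so no radius reaches them; only gloss matching does. A bare word-overlap test also ignores context (a gloss reading "not good" would still match "good"); the abstract does not describe how the paper's context-aware matching handles such cases.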
The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluated against the same gold-standard benchmarks, the proposed model assigns polarity to senses with accuracy on par with state-of-the-art models. On the theoretical side, the model points future work toward effectively exploiting the readily available, human-defined gloss information in a digital dictionary when assigning polarity to term senses. Extrinsic evaluation in a real-world sentiment classification task on multiple publicly available datasets from varying domains demonstrates its practical value for sentiment analysis, as well as for related fields such as information science, opinion retrieval and computational linguistics.
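The abstract specifies the model's final step only as a fully unsupervised categorization based on Network Theory. One simple graph technique of that general kind is label propagation, sketched below in Python purely as an illustration: the graph, the seed scores (+1 positive, -1 negative), the 50 iterations, the 0.05 neutrality threshold, and the names `propagate_polarity` and `label` are all assumptions, and the paper's actual algorithm may differ substantially. Each non-seed node repeatedly takes the mean score of its neighbours while seed scores stay fixed, so polarity spreads outward from the seeds.

```python
def propagate_polarity(edges, seeds, iters=50):
    """Label propagation over an undirected graph; seed scores stay clamped."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iters):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = seeds[node]  # keep seed polarity fixed
            else:
                # Synchronous update: mean of the neighbours' previous scores.
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores


def label(score, threshold=0.05):
    """Map a propagated score to a coarse polarity class."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical sense graph: a chain linking a positive seed to a negative seed.
edges = [
    ("good", "nice"), ("nice", "pleasant"), ("pleasant", "okay"),
    ("okay", "bad"), ("bad", "awful"), ("awful", "terrible"),
]
scores = propagate_polarity(edges, seeds={"good": 1.0, "bad": -1.0})
# nice and okay settle at +0.5 and -0.5; pleasant, equidistant from both seeds, stays neutral.
for node in ["good", "nice", "pleasant", "okay", "bad", "awful", "terrible"]:
    print(node, round(scores[node], 3), label(scores[node]))
```

Because the update is a plain neighbour average, each score depends only on the graph structure around the seeds; weighting edges by relation type, or a random-walk formulation such as personalized PageRank, are common alternatives, and the abstract does not say which (if any) the paper uses.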

Updated: 2020-06-05