Comparing the Utility of Different Classification Schemes for Emotive Language Analysis
Journal of Classification (IF 2), Pub Date: 2019-05-10, DOI: 10.1007/s00357-019-9307-0
Lowri Williams, Michael Arribas-Ayllon, Andreas Artemiou, Irena Spasić

In this paper we investigated the utility of different classification schemes for emotive language analysis, with the aim of providing experimental justification for the choice of scheme for classifying emotions in free text. We compared six schemes: (1) Ekman's six basic emotions, (2) Plutchik's wheel of emotion, (3) Watson and Tellegen's Circumplex theory of affect, (4) the Emotion Annotation Representation Language (EARL), (5) WordNet-Affect, and (6) free text. To measure their utility, we investigated their ease of use by human annotators as well as the performance of supervised machine learning. We assembled a corpus of 500 emotionally charged text documents. The corpus was annotated manually using an online crowdsourcing platform, with five independent annotators per document. Assuming that classification schemes with a better balance between completeness and complexity are easier to interpret and use, we expected such schemes to be associated with higher inter-annotator agreement. We used Krippendorff's alpha coefficient to measure inter-annotator agreement, according to which the six classification schemes were ranked as follows: (1) six basic emotions (α = 0.483), (2) wheel of emotion (α = 0.410), (3) Circumplex (α = 0.312), (4) EARL (α = 0.286), (5) free text (α = 0.205), and (6) WordNet-Affect (α = 0.202). However, correspondence analysis of annotations across the schemes highlighted that basic emotions are oversimplified representations of complex phenomena and as such are likely to lead to invalid interpretations, which are not necessarily reflected by high inter-annotator agreement. To complement the results of the quantitative analysis, we used semi-structured interviews to gain qualitative insight into how annotators interacted with and interpreted the chosen schemes. The size of the classification scheme was highlighted as a significant factor affecting annotation. In particular, the scheme of six basic emotions was perceived as having insufficient coverage of the emotion space, often forcing annotators to resort to inferior alternatives, e.g. using happiness as a surrogate for love. At the opposite end of the spectrum, large schemes such as WordNet-Affect were linked to choice fatigue, with annotators expending significant cognitive effort in choosing the best annotation.

In the second part of the study, we used the annotated corpus to create six training datasets, one for each scheme. The training data were used in cross-validation experiments to evaluate classification performance in relation to the different schemes. According to the F-measure, the classification schemes were ranked as follows: (1) six basic emotions (F = 0.410), (2) Circumplex (F = 0.341), (3) wheel of emotion (F = 0.293), (4) EARL (F = 0.254), (5) free text (F = 0.159), and (6) WordNet-Affect (F = 0.158). Not surprisingly, the smallest scheme was ranked the highest on both criteria. Therefore, out of the six schemes studied here, six basic emotions is the best suited for emotive language analysis. However, both quantitative and qualitative analysis highlighted its major shortcoming: oversimplification of positive emotions, which are all conflated into happiness. Further investigation is needed into ways of better balancing positive and negative emotions.
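
For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of Krippendorff's alpha for nominal (unordered) labels. It is not the authors' implementation: the function name `krippendorff_alpha_nominal`, the input layout (one list of labels per document) and the toy labels are assumptions made for illustration only. It uses the standard coincidence-matrix formulation, α = 1 − D_o / D_e, where D_o is the observed and D_e the expected disagreement.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one entry per document; each entry is the list
    of labels assigned to that document by its annotators (hypothetical
    layout, chosen for this sketch). Documents with fewer than two labels
    cannot be paired and are skipped.
    """
    # Coincidence matrix: every ordered pair of labels within a document
    # contributes 1 / (m - 1), where m is the number of labels it received.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c per label and the grand total n.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable labels")

    # Observed and expected disagreement, both scaled by n so that the
    # common factor cancels in the ratio D_o / D_e.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")

    return 1.0 - observed / expected


if __name__ == "__main__":
    # Toy annotations (not taken from the paper's corpus).
    toy = [["joy", "joy"], ["joy", "anger"], ["anger", "anger"]]
    print(round(krippendorff_alpha_nominal(toy), 3))  # 0.444
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the α values reported above should be read.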

Updated: 2019-05-10