A qualitative analysis of sarcasm, irony and related #hashtags on Twitter
Big Data & Society (IF 6.5) Pub Date: 2020-07-01, DOI: 10.1177/2053951720972735
Martin Sykora, Suzanne Elayan, Thomas W Jackson

As the use of automated social media analysis tools surges, concerns over the accuracy of analytics have increased. Some tentative evidence suggests that sarcasm alone could account for as much as a 50% drop in accuracy when automatically detecting sentiment. This paper assesses and outlines the prevalence of sarcastic and ironic language within social media posts. Several past studies proposed models for automatic sarcasm and irony detection for sentiment analysis; however, these approaches result in models trained on training data of highly questionable quality, with little qualitative appreciation of the underlying data. To understand the issues and scale of the problem, we are the first to conduct and present results of a focused manual semantic annotation analysis of two datasets of Twitter messages (in total 4334 tweets), associated with: (i) hashtags commonly employed in automated sarcasm and irony detection approaches, and (ii) tweets relating to 25 distinct events, including scandals, product releases, cultural events, accidents, terror incidents, etc. We also highlight the contextualised use of multi-word hashtags in the communication of humour, sarcasm and irony, pointing out that many sentiment analysis tools simply fail to recognise such hashtag-based expressions. Our findings also offer indicative evidence regarding the quality of training data used for automated machine learning models in sarcasm, irony and sentiment detection. Worryingly, only 15% of tweets labelled as sarcastic were truly sarcastic. We highlight the need for future research studies to rethink their approach to data preparation and to interpret sentiment analysis more carefully.

Updated: 2020-07-01