The emojification of sentiment on social media: Collection and analysis of a longitudinal Twitter sentiment dataset
arXiv - CS - Digital Libraries, Pub Date: 2021-08-31, DOI: arxiv-2108.13898
Wenjie Yin, Rabab Alkhalifa, Arkaitz Zubiaga

Social media, as a means of computer-mediated communication, has been used extensively to study the sentiment that users express around events or topics. There is, however, a gap in the longitudinal study of how sentiment on social media has evolved over the years. To fill this gap, we develop TM-Senti, a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets, covering a period of over seven years. We describe and assess our methodology for building a large-scale sentiment analysis dataset labelled using emoticons and emojis, and we present an analysis of the resulting dataset. Our analysis highlights interesting temporal changes, including the increasing use of emojis over emoticons. We publicly release the dataset for further research in tasks including sentiment analysis and text classification of tweets. The dataset can be fully rehydrated, including tweet metadata and without missing tweets, thanks to the archive of tweets publicly available on the Internet Archive, on which the dataset is based.
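The distant supervision mentioned in the abstract can be illustrated with a minimal sketch. The marker sets, the rule for skipping ambiguous tweets, and the function name `distant_label` are assumptions made for illustration only; they are not the lexicon or procedure used to build TM-Senti. The idea is that an emoticon or emoji in a tweet serves as a noisy sentiment label, and the marker is then stripped from the text so that a classifier cannot simply read the label off the input.

```python
# Hypothetical marker sets, chosen for illustration; NOT the TM-Senti lexicon.
POSITIVE = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE = {":(", ":-(", "😞", "😢"}


def distant_label(text):
    """Return (cleaned_text, label), or None if the tweet cannot be labelled.

    A tweet is labelled only when it contains markers of exactly one
    polarity; tweets with no markers or with both polarities are skipped
    (an assumed rule for this sketch).
    """
    has_pos = any(marker in text for marker in POSITIVE)
    has_neg = any(marker in text for marker in NEGATIVE)
    if has_pos == has_neg:  # neither polarity present, or both
        return None
    label = "positive" if has_pos else "negative"

    # Strip the markers so the label is not trivially recoverable from the text.
    cleaned = text
    for marker in sorted(POSITIVE | NEGATIVE, key=len, reverse=True):
        cleaned = cleaned.replace(marker, " ")
    cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
    return cleaned, label


if __name__ == "__main__":
    for tweet in ["great day :)", "so sad 😢", "mixed :) :(", "no markers"]:
        print(repr(tweet), "->", distant_label(tweet))
```

Labels obtained this way are noisy by construction, which is one reason such datasets are described as distantly supervised rather than manually annotated.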

Updated: 2021-09-01