COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes, arXiv - CS - Information Retrieval

COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes
arXiv - CS - Information Retrieval, Pub Date: 2020-07-14, DOI: arxiv-2007.06954
Raj Kumar Gupta, Ajay Vishwanath, Yinping Yang

This paper presents a large annotated dataset of public expressions related to the COVID-19 pandemic. Through Twitter's standard search application programming interface, we retrieved over 63 million coronavirus-related public posts from more than 13 million unique users between 28 January and 1 July 2020. Using natural language processing techniques and machine-learning-based algorithms, we annotated each public tweet with seventeen latent semantic attributes: 1) ten binary attributes indicating the tweet's relevance or irrelevance to ten detected topics; 2) five quantitative attributes indicating the intensity of valence or sentiment (from extremely negative to extremely positive) and the intensity of fear, anger, sadness and joy (each from extremely low to extremely high); and 3) two qualitative attributes indicating the sentiment category and the dominant emotion category, respectively. We report basic descriptive statistics for the topic, sentiment and emotion attributes and their temporal distributions, and discuss the dataset's possible uses in communication, psychology, public health, economics and epidemiology research.
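
The seventeen attributes described above could be pictured as a single record per tweet. The following is a minimal sketch only: the class name, field names, `tweet_id` key, and example values are all hypothetical, and the abstract does not state the numeric ranges or category labels actually used in the dataset.

```python
from dataclasses import dataclass, fields
from typing import Tuple

NUM_TOPICS = 10  # the abstract reports ten detected topics


@dataclass(frozen=True)
class AnnotatedTweet:
    """Hypothetical record layout; field names are illustrative, not the dataset's."""

    tweet_id: str  # assumed identifier, not one of the seventeen attributes

    # 1) ten binary attributes: relevance to each detected topic
    topic_flags: Tuple[bool, ...]

    # 2) five quantitative intensity attributes (ranges not given in the abstract)
    valence_intensity: float
    fear_intensity: float
    anger_intensity: float
    sadness_intensity: float
    joy_intensity: float

    # 3) two qualitative attributes (label vocabularies assumed here)
    sentiment_category: str
    emotion_category: str

    def __post_init__(self) -> None:
        if len(self.topic_flags) != NUM_TOPICS:
            raise ValueError(f"expected {NUM_TOPICS} topic flags")


def attribute_count(record: AnnotatedTweet) -> int:
    """Count semantic attributes: each topic flag plus every non-id scalar field."""
    scalar_names = [
        f.name for f in fields(record) if f.name not in ("tweet_id", "topic_flags")
    ]
    return len(record.topic_flags) + len(scalar_names)


# Placeholder values for illustration only; they are not taken from the dataset.
example = AnnotatedTweet(
    tweet_id="0",
    topic_flags=(True,) + (False,) * 9,
    valence_intensity=0.3,
    fear_intensity=0.6,
    anger_intensity=0.2,
    sadness_intensity=0.4,
    joy_intensity=0.1,
    sentiment_category="negative",
    emotion_category="fear",
)
print(attribute_count(example))
```

Keeping the topic flags in one tuple and the intensities as separate fields is just one possible layout; a flat table with seventeen columns would carry the same information.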

Updated: 2020-09-08
