EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets,arXiv - CS - Social and Information Networks


arXiv - CS - Social and Information Networks Pub Date: 2020-09-06, DOI: arxiv-2009.06375
Nickil Maveli

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies (such as disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter, and recognizing the informativeness of a tweet can therefore help filter noise from large volumes of data. In this paper, we present our submission for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet, trained in a semi-supervised experimental setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using fastText embeddings.
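
The abstract does not say how the ensemble combines its members' predictions. As a rough illustration only, the sketch below assumes soft voting (averaging each model's predicted probability that a tweet is informative) and then computes F1 for the positive class. The function names, the 0.5 threshold, and all probabilities and labels are made up for the example; they are not the paper's method or results.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's P(informative) per tweet and threshold the mean.

    Assumption: soft voting is one common way to build an ensemble;
    the abstract does not confirm that the paper uses it.
    """
    n_models = len(prob_sets)
    n_items = len(prob_sets[0])
    preds = []
    for i in range(n_items):
        mean_p = sum(model[i] for model in prob_sets) / n_models
        preds.append(1 if mean_p >= threshold else 0)
    return preds


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = informative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-model probabilities for four tweets (illustrative values only).
roberta = [0.9, 0.2, 0.6, 0.4]
xlnet = [0.8, 0.4, 0.3, 0.7]
bertweet = [0.7, 0.1, 0.7, 0.3]
gold = [1, 0, 1, 1]  # hypothetical gold labels

preds = soft_vote([roberta, xlnet, bertweet])
print(preds, f1_score(gold, preds))
```

In this toy data the third tweet is classified correctly even though one model disagrees, while the fourth is missed because two of the three models assign it a low probability.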


Updated: 2020-10-09
