NEU at WNUT-2020 Task 2: Data Augmentation To Tell BERT That Death Is Not Necessarily Informative
arXiv - CS - Information Retrieval. Pub Date: 2020-09-18, DOI: arxiv-2009.08590
Kumud Chauhan

Millions of people around the world are sharing COVID-19-related information on social media platforms. Since not all of the information shared on social media is useful, a machine learning system that identifies informative posts can help users find relevant information. In this paper, we present a BERT classifier system for W-NUT2020 Shared Task 2: Identification of Informative COVID-19 English Tweets. Further, we show that BERT exploits some easy signals to identify informative tweets, and that adding simple patterns to uninformative tweets drastically degrades BERT's performance. In particular, simply adding "10 deaths" to tweets in the dev set reduces BERT's F1-score from 92.63 to 7.28. We also propose a simple data augmentation technique that helps improve the robustness and generalization ability of the BERT classifier.
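The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of the idea suggested by the title, under stated assumptions. It shows (1) the probe, where a pattern such as "10 deaths" is appended to uninformative tweets while their gold label is kept, and (2) one plausible augmentation, where pattern-injected copies of uninformative training tweets are added with the UNINFORMATIVE label so that the pattern alone stops being evidence for the INFORMATIVE class. Assumptions: the label strings follow the WNUT-2020 Task 2 INFORMATIVE/UNINFORMATIVE format, the metric is F1 on the INFORMATIVE class, and every function name, example tweet, and the keyword "model" are hypothetical stand-ins (the toy model is not BERT and the numbers are not the paper's results).

```python
import random
from typing import Callable, List, Tuple

# Assumed label strings (WNUT-2020 Task 2 style).
INFORMATIVE = "INFORMATIVE"
UNINFORMATIVE = "UNINFORMATIVE"

Example = Tuple[str, str]  # (tweet text, gold label)


def inject(text: str, pattern: str) -> str:
    """Append a surface pattern such as '10 deaths' to a tweet (hypothetical helper)."""
    return f"{text} {pattern}"


def perturb_uninformative(examples: List[Example], pattern: str) -> List[Example]:
    """Probe: add the pattern to every UNINFORMATIVE tweet, keeping its gold label."""
    return [(inject(t, pattern), y) if y == UNINFORMATIVE else (t, y) for t, y in examples]


def augment(examples: List[Example], patterns: List[str], seed: int = 0) -> List[Example]:
    """Hypothetical augmentation: keep the originals and add pattern-injected copies
    of UNINFORMATIVE tweets that remain labeled UNINFORMATIVE."""
    rng = random.Random(seed)
    extra = [(inject(t, rng.choice(patterns)), y) for t, y in examples if y == UNINFORMATIVE]
    return examples + extra


def f1_informative(gold: List[str], pred: List[str]) -> float:
    """F1 score of the INFORMATIVE class (assumed task metric)."""
    tp = sum(g == INFORMATIVE and p == INFORMATIVE for g, p in zip(gold, pred))
    fp = sum(g == UNINFORMATIVE and p == INFORMATIVE for g, p in zip(gold, pred))
    fn = sum(g == INFORMATIVE and p == UNINFORMATIVE for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(classify: Callable[[str], str], examples: List[Example]) -> float:
    """Run a classifier over (text, label) pairs and return INFORMATIVE-class F1."""
    gold = [y for _, y in examples]
    pred = [classify(t) for t, _ in examples]
    return f1_informative(gold, pred)


if __name__ == "__main__":
    # Toy shortcut-reliant classifier (NOT BERT): it fires whenever "deaths" appears.
    def shortcut_model(text: str) -> str:
        return INFORMATIVE if "deaths" in text else UNINFORMATIVE

    # Made-up example tweets for illustration only.
    dev = [
        ("State reports 12 deaths and 300 new cases today", INFORMATIVE),
        ("Officials confirm 4 deaths at the care home", INFORMATIVE),
        ("Stay safe everyone, wash your hands", UNINFORMATIVE),
        ("So tired of being stuck inside", UNINFORMATIVE),
    ]
    print(evaluate(shortcut_model, dev))  # 1.0
    print(evaluate(shortcut_model, perturb_uninformative(dev, "10 deaths")))  # about 0.667
    print(len(augment(dev, ["10 deaths", "50 deaths"])))  # 6
```

The toy model scores perfectly on clean data but loses precision once the pattern appears in uninformative tweets, which mirrors in miniature the kind of shortcut the abstract describes. Training on the augmented set is meant to remove the correlation between the pattern and the INFORMATIVE label.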

Updated: 2020-09-21