Want to Identify, Extract and Normalize Adverse Drug Reactions in Tweets? Use RoBERTa


arXiv - CS - Computation and Language. Pub Date: 2020-06-29, DOI: arxiv-2006.16146
Katikapalli Subramanyam Kalyan, S. Sangeetha

This paper presents our approach to task 2 and task 3 of the Social Media Mining for Health (SMM4H) 2020 shared tasks. Task 2 requires distinguishing adverse drug reaction (ADR) tweets from non-ADR tweets, which we treat as binary classification. Task 3 involves extracting ADR mentions and then mapping them to MedDRA codes. We treat ADR mention extraction as sequence labeling and ADR mention normalization as multi-class classification. Our system is based on the pre-trained language model RoBERTa and achieves a) an F1-score of 58% in task 2, which is 12% above the average score, and b) in task 3, a relaxed F1-score of 70.1% in ADR extraction, which is 13.7% above the average score, and a relaxed F1-score of 35% in ADR extraction + normalization, which is 5.8% above the average score. Overall, our models achieve promising results in both tasks, with significant improvements over the average scores.
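To make the sequence-labeling framing and the relaxed F1-score concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes a BIO tagging scheme with a single hypothetical `ADR` label, and it assumes that "relaxed" means a predicted span counts as correct when it overlaps a gold span at all; the official SMM4H scorer may differ in detail. The helper names `bio_to_spans`, `overlaps`, and `relaxed_f1` are illustrative only.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect ADR mention spans from a BIO tag sequence (assumed scheme)."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-ADR":
            if start is not None:      # close the mention that was open
                spans.append((start, i))
            start = i
        elif tag == "I-ADR":
            if start is None:          # tolerate an I- tag with no B- before it
                start = i
        else:                          # "O": outside any mention
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:              # mention running to the end of the tweet
        spans.append((start, len(tags)))
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def relaxed_f1(gold: List[Span], pred: List[Span]) -> float:
    """F1 where any overlap with a gold span counts as a match (assumption)."""
    if not gold or not pred:
        return 0.0
    precision = sum(any(overlaps(p, g) for g in gold) for p in pred) / len(pred)
    recall = sum(any(overlaps(g, p) for p in pred) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example tweet, not taken from the SMM4H data.
tokens = ["this", "drug", "gave", "me", "a", "terrible", "headache", "and", "nausea"]
gold_tags = ["O", "O", "O", "O", "O", "B-ADR", "I-ADR", "O", "B-ADR"]
pred_tags = ["O", "O", "O", "O", "O", "O", "B-ADR", "O", "O"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
score = relaxed_f1(gold_spans, pred_spans)
```

In this example the prediction recovers only part of "terrible headache" and misses "nausea". Under the overlap assumption the partial match still counts, so precision is 1.0 and recall is 0.5. A strict exact-match metric would score the same prediction as zero, which is why relaxed and strict scores are usually reported separately.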


Updated: 2020-06-30
