Boosting Transformers for Job Expression Extraction and Classification in a Low-Resource Setting
arXiv - CS - Computation and Language. Pub Date: 2021-09-17, arXiv: 2109.08597
Lukas Lange, Heike Adel, Jannik Strötgen

In this paper, we explore possible improvements of transformer models in a low-resource setting. In particular, we present our approaches to tackle the first two of three subtasks of the MEDDOPROF competition, i.e., the extraction and classification of job expressions in Spanish clinical texts. As we are neither language nor domain experts, we experiment with the multilingual XLM-R transformer model and tackle these low-resource information extraction tasks as sequence-labeling problems. We explore domain- and language-adaptive pretraining, transfer learning and strategic data splits to boost the transformer model. Our results show strong improvements of up to 5.3 F1 points with these methods, compared to a fine-tuned XLM-R model. Our best models achieve 83.2 and 79.3 F1 for the first two tasks, respectively.
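As a rough illustration of the sequence-labeling framing mentioned in the abstract, the sketch below fine-tunes XLM-R for token classification with the Hugging Face transformers library. It is not the authors' code: the BIO label set, the example sentence, and the single training step are illustrative assumptions only.

```python
# Minimal sketch (not the authors' implementation): job-expression extraction
# treated as sequence labeling with XLM-R token classification.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumed BIO tag set for job expressions; the real MEDDOPROF label inventory may differ.
labels = ["O", "B-PROFESION", "I-PROFESION"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(labels)
)

# Invented Spanish clinical-style example with word-level gold labels.
words = ["Paciente", "trabaja", "como", "enfermera", "en", "un", "hospital", "."]
word_labels = ["O", "O", "O", "B-PROFESION", "O", "O", "O", "O"]

# Tokenize pre-split words and align labels to subwords:
# only the first subword of each word is scored, the rest are masked with -100.
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
label_ids = []
previous_word = None
for word_id in encoding.word_ids(batch_index=0):
    if word_id is None:
        label_ids.append(-100)  # special tokens are ignored by the loss
    elif word_id != previous_word:
        label_ids.append(labels.index(word_labels[word_id]))
    else:
        label_ids.append(-100)  # mask non-initial subwords
    previous_word = word_id

# One illustrative forward/backward pass; a full fine-tuning loop with an
# optimizer and the MEDDOPROF training data would follow the same pattern.
outputs = model(**encoding, labels=torch.tensor([label_ids]))
outputs.loss.backward()
print(outputs.loss.item())
```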

Updated: 2021-09-20