The futility of STILTs for the classification of lexical borrowings in Spanish
arXiv - CS - Computation and Language. Pub Date: 2021-09-17, DOI: arxiv-2109.08607
Javier de la Rosa

The first edition of the IberLEF 2021 shared task on automatic detection of borrowings (ADoBo) focused on detecting lexical borrowings that appeared in the Spanish press and that have recently been imported into the Spanish language. In this work, we tested supplementary training on intermediate labeled-data tasks (STILTs), drawing on part-of-speech (POS) tagging, named entity recognition (NER), code-switching, and language identification, for the token-level classification of borrowings with existing pre-trained transformer-based language models. Our extensive experimental results suggest that STILTs do not provide any improvement over direct fine-tuning of multilingual models. However, multilingual models trained on small subsets of languages perform reasonably better than multilingual BERT, but not as well as multilingual RoBERTa, on the given dataset.
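
To make "classification of borrowings at the token level" concrete, here is a minimal sketch of how per-token predictions could be grouped into borrowing spans. It assumes a BIO tagging scheme with an illustrative `ENG` label; the tag names, the example sentence, and the helper `extract_spans` are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    The BIO scheme and label names are assumptions for illustration.
    """
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if there is one.
        nonlocal current_label, current_tokens
        if current_tokens and current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_label:
            # An I- tag with the same label continues the open span.
            current_tokens.append(token)
        elif tag.startswith("I-"):
            # A stray I- tag (no matching open span) starts a new span.
            flush()
            current_label, current_tokens = tag[2:], [token]
        else:
            # "O" (outside) closes any open span.
            flush()
    flush()
    return spans


# Hypothetical sentence with two English borrowings.
tokens = ["El", "podcast", "habla", "de", "fake", "news", "."]
tags = ["O", "B-ENG", "O", "O", "B-ENG", "I-ENG", "O"]
spans = extract_spans(tokens, tags)
print(spans)
```

In a setup like the one the abstract describes, the `tags` list would come from a fine-tuned transformer's per-token predictions rather than being written by hand.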

Updated: 2021-09-20