Reproducing "ner and pos when nothing is capitalized"
arXiv - CS - Computation and Language. Pub Date: 2021-09-17, DOI: arxiv-2109.08396
Andreas Kuster, Jakub Filipek, Viswa Virinchi Muppirala

Capitalization is an important feature in many NLP tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging. We attempt to reproduce the results of a paper that shows how to mitigate the significant performance drop that occurs when casing is mismatched between training and test data. In particular, we show that lowercasing 50% of the dataset provides the best performance, matching the claims of the original paper. We also observe slightly lower performance in almost all experiments we tried to reproduce, suggesting that hidden factors may be affecting our results. Lastly, we make all of our work available in a public GitHub repository.
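The 50%-lowercasing mitigation amounts to a simple preprocessing step on the training data. Below is a minimal Python sketch of one way to implement it; the function name, the seed, and the CoNLL-style list-of-token-lists representation are illustrative assumptions, not code taken from the paper or its repository.

```python
import random

def lowercase_fraction(sentences, fraction=0.5, seed=13):
    """Randomly lowercase a fraction of tokenized sentences.

    sentences: list of token lists (CoNLL-style). NER/POS labels stay
    aligned because only surface forms change, not token counts.
    """
    rng = random.Random(seed)
    out = []
    for tokens in sentences:
        if rng.random() < fraction:  # lowercase this whole sentence
            out.append([t.lower() for t in tokens])
        else:
            out.append(list(tokens))  # keep original casing
    return out

# Example: roughly half of the training sentences end up lowercased,
# so the model sees both cased and uncased text during training.
train = [["Obama", "visited", "Paris", "."],
         ["The", "UN", "met", "in", "Geneva", "."]]
print(lowercase_fraction(train, fraction=0.5))
```

Sampling at the sentence level (rather than per token) keeps each training example internally consistent, which matches the cased-versus-uncased mismatch the experiments evaluate.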

Updated: 2021-09-20