Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging
arXiv - CS - Computation and Language. Pub Date: 2020-09-20, DOI: arXiv:2009.09474
Ehsan Doostmohammadi, Minoo Nassajian, Adel Rahimi

Ezafe is a grammatical particle in some Iranian languages that links two words together. Despite the important information it conveys, it is almost never marked in Persian script, which causes misreadings of complex sentences and errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results on the task of ezafe recognition. Transformer-based methods, BERT and XLM-RoBERTa, achieve the best results, with the latter surpassing the previous state of the art by 2.68% F1-score. We further use ezafe information to improve Persian part-of-speech tagging, show that such information is not useful to transformer-based methods, and explain why that might be the case.
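Ezafe recognition is naturally framed as binary sequence labeling: each token is tagged 1 if it carries an (unwritten) ezafe, else 0, and systems are compared by F1-score, the metric the abstract reports. A minimal sketch of that evaluation, with hypothetical toy labels (the paper's actual data and models are not reproduced here):

```python
# Ezafe recognition as binary token labeling: 1 = token carries ezafe, 0 = it does not.
# F1 on the positive (ezafe) class is the metric used to compare systems.

def ezafe_f1(gold, pred):
    """F1-score of the ezafe class over parallel gold/predicted label sequences."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example (hypothetical): "ketâb-e man" (my book) -- ezafe on "ketâb", none on "man".
gold = [1, 0]
pred = [1, 0]
print(ezafe_f1(gold, pred))  # 1.0
```

In the paper's setup, the predicted labels would come from a fine-tuned transformer (BERT or XLM-RoBERTa) classifying each token; the metric itself is model-agnostic.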

Updated: 2020-10-06