Exploiting languages proximity for part-of-speech tagging of three French regional languages, Language Resources and Evaluation



Language Resources and Evaluation (IF 2.7), Pub Date: 2019-04-16, DOI: 10.1007/s10579-019-09463-7
Pierre Magistry, Anne-Laure Ligozat, Sophie Rosset

This paper presents experiments in part-of-speech tagging of low-resource languages. It addresses the case where neither labeled data in the target language nor a parallel corpus is available. We rely only on the proximity of the target language to a better-resourced language, and we conduct experiments on three French regional languages. We exploit this proximity with two main strategies: delexicalization and transposition. The general idea is to learn a model on the (better-resourced) source language and then apply it to the (regional) target language. Delexicalization deals with the difference in vocabulary by creating abstract representations of the data. Transposition consists of modifying the target corpus so that the source models can be used on it. We compare several methods and propose different strategies to combine them and improve the state of the art in part-of-speech tagging in this difficult scenario.
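The two strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which abstract representations or corpus modifications the paper uses, so the feature set in `delexicalize` (word shape, length bucket, capitalization, digits, punctuation) and the lookup-based `transpose` are assumptions chosen only to make the idea concrete, and both function names are hypothetical.

```python
import re


def delexicalize(token: str) -> dict:
    """Map a token to surface features that do not depend on its vocabulary.

    Assumption: this feature set is illustrative, not taken from the paper.
    """
    # Replace letters by X/x first, then digits by d, then collapse repeats,
    # so "Paris" becomes "Xx" and "2019" becomes "d".
    shape = re.sub(r"[^\W\d_]", lambda m: "X" if m.group().isupper() else "x", token)
    shape = re.sub(r"\d", "d", shape)
    shape = re.sub(r"(.)\1+", r"\1", shape)
    return {
        "shape": shape,
        "length_bucket": min(len(token), 5),
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "is_punct": bool(token) and all(not c.isalnum() for c in token),
    }


def transpose(tokens: list[str], mapping: dict[str, str]) -> list[str]:
    """Rewrite target-language tokens into source-language forms where known.

    Assumption: a plain lookup table stands in for whatever corpus
    modification the paper applies; unknown tokens are left unchanged.
    """
    return [mapping.get(tok.lower(), tok) for tok in tokens]


if __name__ == "__main__":
    print(delexicalize("Paris"))
    # Placeholder tokens, not real words from any regional language.
    print(transpose(["tgt_a", "tgt_b"], {"tgt_a": "src_a"}))
```

In a setup like this, a tagger trained on delexicalized source-language features could be applied directly to target-language tokens, while a transposed target corpus could be fed to a tagger trained on ordinary source-language text.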


Updated: 2019-04-16
