Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

arXiv - CS - Computation and Language. Pub Date: 2020-11-22. DOI: arxiv-2011.11074
Simon Gabay (UNIGE), Thibault Clérice (ENC), Jean-Baptiste Camps (ENC), Jean-Baptiste Tanguy (SU), Matthias Gille-Levenson (ENS Lyon)

With the development of large corpora covering various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variation. In the present paper, we describe both methodologically (by proposing annotation principles) and technically (by creating the required training data and the relevant models) the production of a linguistic tagger for (early) modern French (16th-18th c.), taking into account, as far as possible, existing standards for contemporary and, especially, medieval French.

Updated: 2020-11-25
