Standardizing linguistic data: method and tools for annotating (pre-orthographic) French
arXiv - CS - Computation and Language. Pub Date: 2020-11-22, DOI: arxiv-2011.11074. Simon Gabay (UNIGE), Thibault Clérice (ENC), Jean-Baptiste Camps (ENC), Jean-Baptiste Tanguy (SU), Matthias Gille-Levenson (ENS Lyon)
With the development of big corpora of various periods, it becomes crucial to
standardise linguistic annotation (e.g. lemmas, POS tags, morphological
annotation) to increase the interoperability of the data produced, despite
diachronic variations. In the present paper, we describe both methodologically
(by proposing annotation principles) and technically (by creating the required
training data and the relevant models) the production of a linguistic tagger
for (early) modern French (16th-18th c.), taking as much as possible into account
already existing standards for contemporary and, especially, medieval French.
Updated: 2020-11-25