Corpus-oriented lexicographic database for Beserman Udmurt
Acta Linguistica Academica (IF 0.690), Pub Date: 2017-09-01, DOI: 10.1556/2062.2017.64.3.5
Timofey Arkhangelskiy, Natalia Serdobolskaya, Maria Usacheva

The Beserman Udmurt documentation project is a long-term undertaking aimed primarily at collecting lexicographic and corpus data in the field. In the course of the project, we developed a pipeline for collecting, annotating and publishing our data. In this paper, we describe this pipeline and present the web interface we developed to give public access to the Beserman materials. We use the TLex lexicographic software for work on the dictionary and Fieldworks FLEX for annotating the corpus. Once the data have been annotated, they are exported to XML and stored in the web interface, where the two types of data become interconnected and searchable. We propose solutions to challenges that arise in projects of this kind and reflect on the constraints imposed on lexicographic databases developed in long-term projects aimed at describing under-resourced languages. We suggest that the proposed pipeline and the web interface we developed could be employed by similar projects dealing wi...
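As a rough illustration of how XML exports of a dictionary and a corpus might be interconnected, the sketch below indexes corpus tokens by lemma and attaches the occurrences to the matching dictionary entries. It is not the authors' implementation: the element and attribute names (`entry`, `lemma`, `gloss`, `sentence`, `word`, `form`) are assumptions made for this example and do not reflect the actual TLex or FLEX export schemas, and the word forms are invented placeholders rather than Beserman data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical dictionary export; real TLex XML is structured differently.
DICTIONARY_XML = """
<dictionary>
  <entry id="e1"><lemma>alpha</lemma><gloss>gloss-a</gloss></entry>
  <entry id="e2"><lemma>beta</lemma><gloss>gloss-b</gloss></entry>
  <entry id="e3"><lemma>gamma</lemma><gloss>gloss-c</gloss></entry>
</dictionary>
"""

# Hypothetical annotated-corpus export; real FLEX XML is structured differently.
CORPUS_XML = """
<corpus>
  <sentence id="s1">
    <word form="alphan" lemma="alpha"/>
    <word form="beta" lemma="beta"/>
  </sentence>
  <sentence id="s2">
    <word form="alphas" lemma="alpha"/>
  </sentence>
</corpus>
"""


def index_corpus_by_lemma(corpus_xml):
    """Map each lemma to the (sentence id, word form) pairs where it occurs."""
    index = defaultdict(list)
    root = ET.fromstring(corpus_xml.strip())
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            index[word.get("lemma")].append((sentence.get("id"), word.get("form")))
    return index


def link_entries(dictionary_xml, corpus_index):
    """Attach corpus occurrences to every dictionary entry, keyed by lemma."""
    root = ET.fromstring(dictionary_xml.strip())
    linked = {}
    for entry in root.iter("entry"):
        lemma = entry.findtext("lemma")
        linked[lemma] = {
            "gloss": entry.findtext("gloss"),
            # Entries with no corpus attestation get an empty list.
            "occurrences": corpus_index.get(lemma, []),
        }
    return linked


if __name__ == "__main__":
    linked = link_entries(DICTIONARY_XML, index_corpus_by_lemma(CORPUS_XML))
    for lemma, info in linked.items():
        print(lemma, info["gloss"], info["occurrences"])
```

Joining on the lemma keeps the two resources independent: either export can be regenerated and re-linked without touching the other, which is one plausible way to support a search that moves between dictionary entries and corpus examples.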

Updated: 2017-09-01