Russian-Language Thesauri: Automatic Construction and Application for Natural Language Processing Tasks
Automatic Control and Computer Sciences, Pub Date: 2020-03-04, DOI: 10.3103/s0146411619070149
N. S. Lagutina, K. V. Lagutina, A. S. Adrianov, I. V. Paramonov

Abstract

This paper surveys existing digital Russian-language thesauri and the methods for their automatic construction and application. The authors analyze the main characteristics of thesauri published in open access for scientific research and evaluate both their development trends and their effectiveness in solving natural language processing tasks. They study statistical and linguistic methods of thesaurus construction that automate development and reduce the labor costs of expert linguists. In particular, they consider algorithms for extracting keywords and semantic thesaurus relations of all types and assess the quality of the thesauri generated with these tools. To illustrate the features of various methods of constructing thesaurus relations, the authors developed a combined method that fully automatically generates a specialized thesaurus from a text corpus of a selected domain and several existing linguistic resources. The proposed method was used in experiments on two Russian-language text corpora representing two different domains: articles on migration and tweets. The resulting thesauri were analyzed with an integrated assessment that the authors developed in a previous study; it characterizes various aspects of the analyzed thesaurus and appraises the quality of the methods used to generate it. The analysis revealed the main advantages and disadvantages of various approaches to thesaurus construction and to the extraction of semantic relations of different types, and identified potential focus areas for future research.


Updated: 2020-03-04