A Weakly supervised word sense disambiguation for Polish using rich lexical resources
Poznan Studies in Contemporary Linguistics (IF 0.5), Pub Date: 2019-06-26, DOI: 10.1515/psicl-2019-0013
Arkadiusz Janz, Maciej Piasecki

Abstract: Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. For many years the problem has been approached with a wide range of methods; however, it remains challenging, especially in the unsupervised setting. Knowledge-based methods, which leverage lexical knowledge resources such as wordnets, are among the well-known and successful approaches to WSD. Because knowledge-based approaches mostly do not use any labelled training data, their performance relies strongly on the structure and quality of the knowledge sources used. However, a pure knowledge base such as a wordnet cannot reflect all the semantic knowledge necessary to disambiguate word senses in text correctly. In this paper we explore various expansions of plWordNet as knowledge bases for WSD. Semantic links extracted from a large valency lexicon (Walenty), glosses and usage examples, Wikipedia articles and the SUMO ontology are combined with plWordNet and tested in a PageRank-based WSD algorithm. In addition, we analyse the influence of lexical semantic vector models extracted with distributional semantics methods. Several new Polish test data sets for WSD are also introduced. All the resources, methods and tools are available under open licences.
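
The abstract names a PageRank-based WSD algorithm run over a wordnet-style knowledge base but gives no details here. As a rough illustration of the general idea only, the sketch below runs personalized PageRank on a tiny hand-made sense graph and picks the highest-ranked candidate sense. Everything in it is an assumption made for illustration: the sense labels, the toy edges, the function names `personalized_pagerank` and `disambiguate`, the damping factor and the choice to seed the restart vector with the context senses. It is not the authors' implementation and does not use plWordNet data.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank whose restart mass sits on `seeds`.

    Assumes every seed occurs in at least one edge (illustrative simplification).
    """
    neighbours = defaultdict(set)
    for a, b in edges:  # treat lexical relations as undirected links
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = sorted(neighbours)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) * restart[n] + damping * incoming
        rank = new_rank
    return rank


def disambiguate(candidates, context_senses, edges):
    """Return the candidate sense with the highest personalized PageRank."""
    rank = personalized_pagerank(edges, set(context_senses))
    return max(candidates, key=lambda sense: rank.get(sense, 0.0))


# Hypothetical toy graph: Polish "zamek" can mean a lock or a castle.
EDGES = [
    ("zamek#lock", "klucz#key"),
    ("zamek#lock", "drzwi#door"),
    ("klucz#key", "drzwi#door"),
    ("zamek#castle", "brama#gate"),
    ("brama#gate", "drzwi#door"),
    ("zamek#castle", "mur#wall"),
]

# Context mentions a key and a door, so their senses seed the random walk.
best = disambiguate(
    candidates=["zamek#lock", "zamek#castle"],
    context_senses=["klucz#key", "drzwi#door"],
    edges=EDGES,
)
print(best)
```

In this toy graph the lock sense is linked directly to both context senses, so it collects more rank than the castle sense. Adding edges from other resources, as the abstract describes for Walenty, glosses, Wikipedia and SUMO, would correspond to extending `EDGES`.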

Updated: 2019-06-26