Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary
Language Resources and Evaluation (IF 1.7), Pub Date: 2021-02-21, DOI: 10.1007/s10579-020-09511-7
Marcin Woliński, Elżbieta Hajnicz

This paper reports on developments in three interrelated linguistic resources for Polish. The first is Świgra 2, a rule-based constituency parser for Polish. The second is Składnica, a treebank built using Świgra 2. The third is the valency dictionary Walenty, which became available when work on the first two was already advanced. Because Walenty is much more comprehensive than the ad hoc dictionary previously used with Świgra, a decision was made to switch the parser and the treebank to the new dictionary. The switch required several modifications to the Świgra 2 parser, including the implementation of unlike coordination and the introduction of semantically motivated phrases and non-standard case values. A semi-automated procedure for upgrading previously disambiguated trees in Składnica was required as well. Modifications introduced in the treebank during the upgrade included systematic changes of notation and the resolution of new ambiguities arising from the more detailed distinctions made in the dictionary. The procedure for confronting Składnica with the trees generated by the new version of Świgra 2 using Walenty allowed us to check all of these resources for consistency. This resulted in several corrections to both the treebank and the valency dictionary.




Updated: 2021-02-21