Abstract
The predicate-argument structure transparently encoded in dependency-based syntactic representations supports tasks such as machine translation, question answering, and information extraction. The quality of dependency parsing is therefore a crucial issue in natural language processing. In this paper, we discuss the fundamental ideas of dependency theory and provide an overview of selected dependency-based resources for Polish. Furthermore, we present some state-of-the-art dependency parsing systems whose models can be estimated on correctly annotated data. In the experimental part, we provide an in-depth evaluation of these systems on Polish data. Our results show that graph-based parsers, even those without any neural component, are better suited for Polish than transition-based parsing systems.
8 Acknowledgements
We would like to thank the anonymous reviewers for their valuable comments. The research presented in this paper was funded by SONATA 8 grant no. 2014/15/D/HSS/03486 from the National Science Centre, Poland, and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at the Poznań Supercomputing and Networking Center.
© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland