Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing, Computational Linguistics



Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Computational Linguistics (IF 9.3), Pub Date: 2019-09-01, DOI: 10.1162/coli_a_00357
Edoardo Maria Ponti₁, Helen O’Horan₁, Yevgeni Berzak₂, Ivan Vulić₁, Roi Reichart₃, Thierry Poibeau₄, Ekaterina Shutova₅, Anna Korhonen₁


Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.


Updated: 2019-09-01
