A systematic review of unsupervised approaches to grammar induction

Natural Language Engineering (IF 2.3), Pub Date: 2020-10-27, DOI: 10.1017/s1351324920000327
Vigneshwaran Muralidaran, Irena Spasić, Dawn Knight

This study systematically reviews existing approaches to unsupervised grammar induction in terms of their theoretical underpinnings, practical implementations and evaluation. Our motivation is to identify the influence of functional-cognitive schools of grammar on language processing models in computational linguistics, in an effort to close the gap between these theoretical schools and the computational processing models of grammar induction. Specifically, the review aims to answer the following research questions: Which types of grammar theories have been the subjects of grammar induction? Which methods have been employed to support grammar induction? Which features have been used by these methods for learning? How were these methods evaluated? Finally, in terms of performance, how do these methods compare to one another? Forty-three studies were identified for systematic review, of which 33 described original implementations of grammar induction, three provided surveys, and seven focused on theories and experiments related to the acquisition and processing of grammar in humans. The data extracted from the 33 implementations were stratified into seven aspects of analysis: theory of grammar; output representation; how grammatical productivity is processed; how grammatical productivity is represented; features used for learning; evaluation strategy; and implementation methodology. In most of the implementations considered, grammar was treated as a generative-formal system, autonomous and independent of meaning. Parser decoding was done in a non-incremental, head-driven fashion, on the assumption that all words are available to the parsing model, and the output representation of the learnt grammar was hierarchical, typically a dependency or a constituency tree.

However, the theoretical and experimental studies considered suggest that a usage-based, incremental, sequential system of grammar is more appropriate than the formal, non-incremental, hierarchical view of grammar. This gap between the theoretical and experimental studies on the one hand and the computational implementations on the other should be addressed to enable further progress in computational grammar induction research.


Updated: 2020-10-27
