Exploring a genotype based formalisation for tree adjoining grammar derivations
International Journal of Information Technology, Pub Date: 2021-04-27, DOI: 10.1007/s41870-021-00660-8
Vijay Krishna Menon, Soman K P

Tree Adjoining Grammars (TAGs) are very useful psycholinguistic formalisms for syntax and dependency analysis of phrase structures. Since natural languages are finitely ambiguous, TAGs, being mildly context sensitive, are well suited to modelling them. However, these grammars are very hard to parse, as they have a worst-case complexity of \(O(n^6)\). In practice, most conventional TAG parsing algorithms run in \(O(n^3)\) and give the most probable parse, but trying to extract multiple ambiguous parses for a given phrase degrades their performance toward the worst-case runtime, especially for longer sentences. Ambiguity in TAGs is not as well understood as in CFGs because of their complex derivation process; one way to understand it is to find a finite set of ambiguous derivations for a given phrase structure. In this article we extend the definition of the containing formalism, introduced as GATAGs, on which a genetic algorithm can be deployed to find ambiguous derivation structures for longer sentences. We formalise the genotypes, phenotypes and fitness functions for the corresponding genetic algorithm, exploring various avenues for computing them efficiently. Our main objective is to explore the possibility of random derivations evolving into good derivations through natural selection.
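
The abstract does not spell out how GATAGs encode a derivation, so the following is only a generic sketch of the genotype, fitness and selection loop it alludes to, not the authors' method. Everything in it is an assumption made for illustration: the fixed-length integer genotype, the function names `evolve` and `toy_fitness`, the truncation selection with one-point crossover, and the toy fitness (agreement with a made-up `TARGET` vector) that stands in for whatever derivation-quality measure the paper defines.

```python
import random


def evolve(fitness, genome_len, n_choices, pop_size=30, generations=60,
           mutation_rate=0.1, seed=0):
    """Toy generational GA over fixed-length integer genotypes.

    Hypothetical illustration only; not the GATAG formalism itself.
    """
    rng = random.Random(seed)
    # Start from random genotypes: each gene is an integer in [0, n_choices).
    pop = [[rng.randrange(n_choices) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally resample a gene at random.
            child = [rng.randrange(n_choices) if rng.random() < mutation_rate
                     else gene for gene in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Assumed stand-in for a "good derivation"; a real fitness function would
# score the derivation (phenotype) that the genotype decodes to.
TARGET = [2, 0, 1, 3, 1, 0, 2, 3]


def toy_fitness(genotype):
    """Count the positions where the genotype agrees with TARGET."""
    return sum(g == t for g, t in zip(genotype, TARGET))


best = evolve(toy_fitness, genome_len=len(TARGET), n_choices=4)
print(best, toy_fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases, which is the simplest way to watch random genotypes drift toward good ones under selection.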


Updated: 2021-04-27