Learning (k,l)-context-sensitive probabilistic grammars with nonparametric Bayesian approach
Machine Learning (IF 7.5), Pub Date: 2021-07-16, DOI: 10.1007/s10994-021-06034-2
Chihiro Shibata

Inferring formal grammars with a nonparametric Bayesian approach is one of the most powerful approaches for achieving high accuracy from unsupervised data. In this paper, mildly context-sensitive probabilities, called (k,l)-context-sensitive probabilities, are defined on context-free grammars (CFGs). Inferring CFGs in which the probabilities of rules are identified from contexts can be seen as a dual approach to distributional learning, in which contexts characterize substrings. The data sparsity of the context-sensitive probabilities is handled by the smoothing effect of hierarchical nonparametric Bayesian models such as Pitman–Yor processes (PYPs). We define the hierarchy of PYPs naturally by augmenting the infinite PCFG. Blocked Gibbs sampling is known to be effective for inferring PCFGs. We show that, by modifying the inside probabilities, blocked Gibbs sampling can be applied to (k,l)-context-sensitive probabilistic grammars. At the same time, we show that the time complexity of computing the (k,l)-context-sensitive probabilities of a CFG is \(O(|V|^{l+3}|w|^3)\) for each sentence w, where V is the set of nonterminals. Since it is computationally too expensive to run a sufficient number of iterations, especially when |V| is not small, alternative sampling algorithms are required. We therefore propose a new sampling method called composite sampling, in which the sampling procedure is separated into sub-procedures for nonterminals and for derivation trees. Finally, we demonstrate that the inferred (k, 0)-context-sensitive probabilistic grammars achieve lower perplexities than other probabilistic language models such as PCFGs, n-grams, and HMMs.
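To make the smoothing step concrete, here is a minimal sketch of a Pitman–Yor process predictive probability in its Chinese-restaurant representation; when base_prob is itself the predictive probability of a parent restaurant (e.g., one attached to a coarser context), the construction becomes a hierarchical PYP of the kind used to smooth sparse contexts. The function name, data layout, and toy numbers are illustrative assumptions, not the paper's implementation.

```python
def pyp_predictive(word, counts, tables, discount, strength, base_prob):
    """Predictive probability of `word` under a Pitman-Yor process,
    using the Chinese-restaurant representation.

    counts    : dict word -> number of customers eating that dish
    tables    : dict word -> number of tables serving that dish
    discount  : PYP discount parameter d, with 0 <= d < 1
    strength  : PYP strength (concentration) parameter theta, theta > -d
    base_prob : callable word -> probability under the base distribution;
                passing a parent restaurant's predictive probability here
                yields a hierarchical PYP that backs off across contexts.
    """
    c_w = counts.get(word, 0)        # customers eating dish `word`
    t_w = tables.get(word, 0)        # tables serving dish `word`
    c_all = sum(counts.values())     # all customers in this restaurant
    t_all = sum(tables.values())     # all tables in this restaurant

    # Mass from customers already seated at existing tables, plus mass
    # reserved for opening a new table, which backs off to base_prob.
    seated = max(c_w - discount * t_w, 0.0)
    new_table = strength + discount * t_all
    return (seated + new_table * base_prob(word)) / (c_all + strength)


# Toy usage: one restaurant backing off to a uniform base distribution
# over a hypothetical 1000-word vocabulary.
uniform = lambda w: 1.0 / 1000
counts = {"grammar": 3, "model": 1}
tables = {"grammar": 1, "model": 1}
print(pyp_predictive("grammar", counts, tables, 0.5, 1.0, uniform))  # frequent word
print(pyp_predictive("unseen",  counts, tables, 0.5, 1.0, uniform))  # smoothed to base
```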
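For the complexity claim, the baseline is the standard inside (CYK-style) dynamic program for a PCFG in Chomsky normal form, which runs in O(|V|^3 |w|^3); the O(|V|^{l+3}|w|^3) bound quoted above corresponds to an extra |V|^l factor from conditioning rule probabilities on context. The sketch below shows only this plain-PCFG baseline with a hypothetical rule encoding, not the paper's modified inside probabilities.

```python
from collections import defaultdict

def inside_probability(words, lexical_rules, binary_rules, start="S"):
    """Inside (CYK-style) dynamic program for a PCFG in Chomsky normal form.

    lexical_rules : dict (A, word) -> P(A -> word)
    binary_rules  : dict (A, B, C) -> P(A -> B C)

    Cost: O(|V|^3 |w|^3) in the worst case (all spans, all split points,
    all binary rules); context-conditioned rule probabilities would add
    further factors of |V| per context nonterminal.
    """
    n = len(words)
    # inside[i][j][A] = probability that A derives words[i:j]
    inside = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1, covered by lexical rules A -> word.
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                inside[i][i + 1][A] += p

    # Recursive case: longer spans, all split points, all binary rules.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (A, B, C), p in binary_rules.items():
                    pb = inside[i][k].get(B, 0.0)
                    pc = inside[k][j].get(C, 0.0)
                    if pb and pc:
                        inside[i][j][A] += p * pb * pc

    return inside[0][n].get(start, 0.0)


# Toy grammar and sentence: P(S => "she sees her") = 1.0 * 0.5 * 1.0 * 0.5 = 0.25
lexical = {("NP", "she"): 0.5, ("NP", "her"): 0.5, ("V", "sees"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(inside_probability(["she", "sees", "her"], lexical, binary))  # 0.25
```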



