Bayesian Learning of Latent Representations of Language Structures
Computational Linguistics (IF 9.3) Pub Date: 2019-06-01, DOI: 10.1162/coli_a_00346
Yugo Murawaki

We borrow the concept of representation learning from deep learning research, and we argue that the quest for Greenbergian implicational universals can be reformulated as the learning of good latent representations of languages, or sequences of surface typological features. By projecting languages into latent representations and performing inference in the latent space, we can handle complex dependencies among features in an implicit manner. The most challenging problem in turning the idea into a concrete computational model is the alarmingly large number of missing values in existing typological databases. To address this problem, we keep the number of model parameters relatively small to avoid overfitting, adopt the Bayesian learning framework for its robustness, and exploit phylogenetically and/or spatially related languages as additional clues. Experiments show that the proposed model recovers missing values more accurately than others and that some latent variables exhibit phylogenetic and spatial signals comparable to those of surface features.
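To make the setup concrete, here is a minimal Python sketch of the core idea: each language gets a latent variable, inference runs in the latent space, and missing surface features are imputed from the posterior. This is a deliberately simplified stand-in, not the paper's model; it uses a single Dirichlet-categorical mixture assignment per language in place of the paper's latent representations, omits the phylogenetic and spatial priors, and all sizes and hyperparameters (K, ALPHA, BETA) are illustrative.

```python
# Minimal sketch: Bayesian imputation of missing typological features
# via a latent cluster per language, sampled by collapsed Gibbs sampling.
# NOT the paper's model: a plain Dirichlet-categorical mixture with
# illustrative hyperparameters, and no phylogenetic/spatial priors.
import numpy as np

rng = np.random.default_rng(0)

L, F, V = 200, 10, 4    # languages, surface features, values per feature
K = 5                   # number of latent clusters (illustrative)
ALPHA, BETA = 1.0, 0.5  # symmetric Dirichlet hyperparameters
MISSING = -1

# Toy data: categorical feature matrix with ~40% of entries missing,
# mimicking the sparsity of existing typological databases.
x = rng.integers(0, V, size=(L, F))
x[rng.random((L, F)) < 0.4] = MISSING

z = rng.integers(0, K, size=L)      # latent cluster of each language
n_k = np.bincount(z, minlength=K)   # languages per cluster
# counts[k, f, v]: observations of value v for feature f in cluster k
counts = np.zeros((K, F, V))
for l in range(L):
    for f in range(F):
        if x[l, f] != MISSING:
            counts[z[l], f, x[l, f]] += 1

def gibbs_sweep():
    """One collapsed Gibbs sweep: resample each language's latent cluster."""
    for l in range(L):
        # Remove language l from the sufficient statistics.
        k_old = z[l]
        n_k[k_old] -= 1
        obs = [(f, x[l, f]) for f in range(F) if x[l, f] != MISSING]
        for f, v in obs:
            counts[k_old, f, v] -= 1
        # Log-posterior over clusters: prior term plus the likelihood
        # of the observed features only (missing cells are ignored).
        logp = np.log(n_k + ALPHA)
        for f, v in obs:
            logp += np.log(counts[:, f, v] + BETA)
            logp -= np.log(counts[:, f].sum(axis=1) + V * BETA)
        p = np.exp(logp - logp.max())
        k_new = rng.choice(K, p=p / p.sum())
        # Add language l back under its new cluster.
        z[l] = k_new
        n_k[k_new] += 1
        for f, v in obs:
            counts[k_new, f, v] += 1

for sweep in range(50):
    gibbs_sweep()

def impute(l, f):
    """Impute a missing cell from the posterior predictive of l's cluster."""
    post = counts[z[l], f] + BETA
    return int(np.argmax(post))

missing_cells = np.argwhere(x == MISSING)
print("example imputation:", impute(*missing_cells[0]))
```

Because the Dirichlet-categorical pair is conjugate, the cluster indicators can be resampled exactly with all other parameters integrated out, which keeps the parameter count small in the spirit of the abstract. Incorporating the paper's phylogenetic or spatial clues would amount to replacing the simple prior term with one that prefers agreement with a language's neighbors in the family tree or on the map.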

Updated: 2019-06-01