Using syntax for improving phrase-based SMT in low-resource languages
Digital Scholarship in the Humanities (IF 0.7), Pub Date: 2019-07-10, DOI: 10.1093/llc/fqz033
Hakimeh Fadaei, Heshaam Faili

Abstract
Data-driven approaches to machine translation, such as statistical and neural machine translation, suffer from data sparsity when dealing with low-resource languages. In these cases, other sources of information, including linguistic information, can alleviate the problem. In this article, we focus on the problem of word ordering in translation from a high-resource to a low-resource language and try to improve translation quality by using syntactic information from the high-resource side. We propose syntactic features based on Tree Adjoining Grammar (TAG) to be employed in a phrase-based SMT model in order to improve word ordering. A set of synchronous TAG rules is extracted and used to estimate the probability of the phrase orders suggested by the phrase-based model. The main idea is to handle word ordering by exploiting TAG's extended domain of locality, which abstracts long-distance dependencies into a local view, namely a TAG elementary tree. Experiments on English–Persian and English–German translation showed that combining the proposed TAG-based reordering features with lexical and hierarchical reordering models yields significant improvements over the baseline and in comparison with a neural reordering model and a pre-reordering model.
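The abstract does not spell out how the extracted rules are turned into order probabilities. As a rough illustration only, the sketch below assumes the simplest possible scheme: each extracted rule is reduced to a source-side elementary tree label, a pair of node slots, and an observed target-side orientation, and probabilities are add-one-smoothed relative frequencies. The rule labels, the two-way `monotone`/`swap` orientation set, and both function names are hypothetical and are not taken from the article.

```python
import math
from collections import Counter, defaultdict

# Assumed orientation inventory; the article may use a different one.
ORIENTATIONS = ("monotone", "swap")

# Hypothetical toy rules: (source elementary tree, node slots, target orientation).
RULES = [
    ("alpha_transitive", ("V", "NP_obj"), "swap"),
    ("alpha_transitive", ("V", "NP_obj"), "swap"),
    ("alpha_transitive", ("V", "NP_obj"), "monotone"),
    ("beta_adjective", ("ADJ", "N"), "swap"),
]


def estimate_orientation_probs(rules, smoothing=1.0):
    """Smoothed relative-frequency estimate of P(orientation | tree, slots)."""
    counts = defaultdict(Counter)
    for tree, slots, orientation in rules:
        counts[(tree, slots)][orientation] += 1

    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values()) + smoothing * len(ORIENTATIONS)
        probs[key] = {
            o: (counter[o] + smoothing) / total for o in ORIENTATIONS
        }
    return probs


def reordering_feature(probs, key, orientation):
    """Log-probability feature; unseen (tree, slots) keys fall back to uniform."""
    uniform = 1.0 / len(ORIENTATIONS)
    p = probs.get(key, {}).get(orientation, uniform)
    return math.log(p)


PROBS = estimate_orientation_probs(RULES)
```

In a phrase-based decoder, a log-probability of this kind would typically be added as one more feature in the log-linear model alongside the lexical and hierarchical reordering scores; how the article actually integrates its features is not stated in the abstract.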


Updated: 2019-07-10