Urdu-English Machine Transliteration using Neural Networks
arXiv - CS - Computation and Language. Pub Date: 2020-01-12, DOI: arxiv-2001.05296
Usman Mohy ud Din

Machine translation has gained much attention in recent years. It is a sub-field of computational linguistics that focuses on translating text from one language to another. Among translation techniques, neural networks currently lead the domain, offering a single large neural network with an attention mechanism, sequence-to-sequence modelling, and long short-term modelling. Despite significant progress in machine translation, the translation of out-of-vocabulary (OOV) words, which include technical terms, named entities, and foreign words, is still a challenge for current state-of-the-art translation systems, and the situation becomes even worse when translating between low-resource languages or languages with different structures. Because of the morphological richness of a language, a word may have different meanings in different contexts. In such scenarios, translating the word alone is not enough to provide a correct, high-quality translation. Transliteration is a way to consider the context of a word or sentence during translation. For a low-resource language like Urdu, it is very difficult to find a parallel corpus for transliteration that is large enough to train a system. In this work, we present a transliteration technique based on Expectation Maximization (EM) that is unsupervised and language-independent. The system learns the patterns and out-of-vocabulary (OOV) words from a parallel corpus, so there is no need to train it explicitly on a transliteration corpus. The approach is tested on three statistical machine translation (SMT) models (phrase-based, hierarchical phrase-based, and factor-based) and two neural machine translation models (LSTM and Transformer).

Updated: 2020-01-16