An Enhanced RBMT: When RBMT Outperforms Modern Data-Driven Translators
IETE Technical Review ( IF 2.5 ) Pub Date : 2022-02-06 , DOI: 10.1080/02564602.2022.2026828
Md. Adnanul Islam 1 , Md. Saidul Hoque Anik 2 , A. B. M. Alim Al Islam 3

Although prominent translators such as Google, Yahoo Babel Fish, and Bing perform well on the most widely used languages, they tend to make fundamental mistakes with low-resource languages such as Bengali, Romanian, and Arabic. Such translators (e.g., Google Translate) build their multilingual translation systems on data-driven approaches such as neural machine translation (NMT) and statistical machine translation (SMT). However, the performance of these data-driven approaches depends entirely on the availability of significantly large parallel corpora for the language pairs being translated. As a consequence, numerous popular languages, such as Bengali, remain barely explored, not only in machine translation but also in other fields of natural language processing. The aim of this study is therefore to explore effective translation from Bengali to English by accomplishing several Bengali language processing tasks. Specifically, we adopt a basic rule-based machine translator (RBMT) for translating from Bengali to English. Next, we enhance its performance by correctly interpreting Bengali names as subjects (and nouns) in a sentence. In addition, we propose a Bengali verb identification and optimization technique based on root-word detection (stemming) of Bengali verbs. Finally, we demonstrate the efficacy of the proposed techniques through a comparative analysis with popular data-driven translators, using a novel customized dataset focused on Bengali-to-English translation.
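
To make the idea of verb identification by root-word detection concrete, the following is a minimal sketch of suffix-stripping stemming for a handful of Bengali verb forms. It is not the authors' algorithm: the suffix table, the tense labels, the two-entry root lexicon, and the function names `stem_verb` and `gloss` are all assumptions chosen for illustration, and a real system would need a far larger rule set.

```python
# Toy suffix table (an assumption for illustration, not the paper's rule set).
# Each inflectional ending maps to a coarse tense label.
SUFFIX_TENSE = {
    "লাম": "past",
    "ছি": "present continuous",
    "ছে": "present continuous",
    "বে": "future",
    "ি": "simple present",
    "ে": "simple present",
}

# Hypothetical root lexicon: Bengali verb root -> English base verb.
ROOT_GLOSS = {"কর": "do", "বল": "say"}


def stem_verb(word):
    """Return (root, tense) if word is a known root plus a listed suffix, else None."""
    # Try longer suffixes first so a longer ending wins over a shorter one.
    for suffix in sorted(SUFFIX_TENSE, key=len, reverse=True):
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            # Accept the split only when the remainder is a known verb root.
            if root in ROOT_GLOSS:
                return root, SUFFIX_TENSE[suffix]
    return None


def gloss(word):
    """Return a rough English gloss such as 'do (future)', or None if not a known verb."""
    result = stem_verb(word)
    if result is None:
        return None
    root, tense = result
    return f"{ROOT_GLOSS[root]} ({tense})"
```

Checking the stripped remainder against a root lexicon is what lets the same step serve as verb identification: a word that ends in a listed suffix but whose remainder is not a known root is simply not treated as a verb.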




Updated: 2022-02-06