Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar
Multimedia Systems (IF 3.9), Pub Date: 2020-10-11, DOI: 10.1007/s00530-020-00692-3
Sitender, Seema Bawa

Machine translation is the process of transforming text from one language to another with the help of computer technology. Earlier, in 2018, the authors developed a machine translation system, named SANSUNL, that translates Sanskrit text into Universal Networking Language (UNL) expressions. The work presented in this paper extends the SANSUNL system by enhancing POS tagging, Sanskrit language processing, and parsing. A Sanskrit stemmer with 23 prefixes and 774 suffixes, together with grammar rules, is used to stem the Sanskrit sentence in the proposed system. Bidirectional long short-term memory (Bi-LSTM) and stacked LSTM deep neural network models are used for part-of-speech tagging of the input Sanskrit text. A tagged Sanskrit dataset of around 400k entries is used for training and testing the neural network models. The proposed Sanskrit context-free grammar is used with a CYK parser to parse the input sentence. The Sanskrit-Universal Word dictionary has been enlarged from 15,000 to 25,000 entries. Approximately 1,500 UNL generation rules are used to resolve the 46 UNL relations. Four datasets, namely UC-A1, UC-A2, the Spanish server gold standard dataset, and 500 Sanskrit sentences taken from the general domain, are used to validate the system. The proposed system is evaluated on BLEU and fluency score metrics and reports an efficiency of 95.375%.
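
The abstract mentions parsing with a context-free grammar and a CYK parser. As a rough illustration of that technique only, the sketch below runs CYK recognition over a tiny grammar in Chomsky normal form. The grammar, the transliterated tokens, and the function name `cyk_recognize` are hypothetical and chosen for this example; they are not the authors' Sanskrit grammar or implementation.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's grammar).
# Binary rules map a pair of right-hand-side symbols to the possible parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},   # sentence = subject + verb phrase
    ("NP", "V"): {"VP"},   # verb phrase = object + verb (object-verb order)
}

# Lexical rules map a token to its possible categories (illustrative tokens).
LEXICAL_RULES = {
    "ramah": {"NP"},
    "phalam": {"NP"},
    "khadati": {"V"},
}


def cyk_recognize(tokens, start="S"):
    """Return True if the toy grammar derives the token sequence."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that derives tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][i] = set(LEXICAL_RULES.get(token, ()))
    # Fill the table for spans of increasing length.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            # Try every split point between the left and right sub-spans.
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]
```

Under these assumptions, `cyk_recognize(["ramah", "phalam", "khadati"])` accepts the sequence, while a reordered input such as `["khadati", "ramah"]` is rejected because no rule combines `V` followed by `NP`. A full parser would also store back-pointers in each table cell so that the parse tree can be recovered.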


