Adaptive Language Processing Based on Deep Learning in Cloud Computing Platform
Complexity (IF 1.7), Pub Date: 2020-06-19, DOI: 10.1155/2020/5828130
Wenbin Xu, Chengbo Yin

With the continuous advancement of technology, the amount of information and knowledge disseminated on the Internet every day has grown severalfold. At the same time, a large amount of bilingual data has been produced in the real world. These data are a valuable asset for statistical machine translation research. First, to screen bilingual corpora for sentence-pair quality, two filtering strategies are proposed: one based on the length ratio of a sentence pair and one based on word alignment information. The innovation of these two methods is that they need no additional linguistic resources, such as a bilingual dictionary or a syntactic analyzer, as auxiliary input. No manual intervention is required; poor-quality sentence pairs are identified automatically, and the methods can be applied to any language pair. Second, a domain adaptation method based on a massive corpus is proposed. The method uses a massive-corpus mechanism to carry out automatic model transfer across multiple domains. In this method, each domain learns its intradomain model independently, while all domains share the same general model. Through the massive-corpus method, these models can be combined and adjusted to make model learning more accurate. Finally, the corpus filtering and domain adaptation methods for statistical machine translation are verified on a cloud platform. Experiments show that both methods work well and can effectively improve the quality of statistical machine translation.
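
The abstract does not give the exact criterion used by the length-ratio filter, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes whitespace tokenization and a z-score cutoff on the target-to-source token-count ratio; the names `length_ratio`, `filter_by_length_ratio`, and `max_z` are hypothetical.

```python
from statistics import mean, stdev


def length_ratio(src: str, tgt: str) -> float:
    """Target-to-source token-count ratio (whitespace tokens: an assumption)."""
    src_len = len(src.split())
    tgt_len = len(tgt.split())
    # An empty source side has no meaningful ratio; mark it as infinite.
    return tgt_len / src_len if src_len else float("inf")


def filter_by_length_ratio(pairs, max_z=2.0):
    """Keep pairs whose ratio is within max_z standard deviations of the mean.

    The z-score cutoff is an illustrative choice, not taken from the paper.
    """
    ratios = [length_ratio(s, t) for s, t in pairs]
    finite = [r for r in ratios if r != float("inf")]
    if len(finite) < 2:
        # Too few pairs to estimate a spread; return the input unchanged.
        return list(pairs)
    mu, sigma = mean(finite), stdev(finite)
    # Pairs with an infinite ratio always fail the comparison and are dropped.
    return [p for p, r in zip(pairs, ratios) if abs(r - mu) <= max_z * sigma]


if __name__ == "__main__":
    corpus = [("a b c", "x y z")] * 10 + [("a", "x " * 20)]
    kept = filter_by_length_ratio(corpus)
    print(len(corpus), "pairs in,", len(kept), "pairs kept")
```

A filter of this kind needs no dictionary or parser, which matches the language-independence claimed in the abstract; a practical system would likely use a proper tokenizer for languages that are not whitespace-delimited.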

Updated: 2020-06-19