Facile Solutions to the Problems Associated with Chemical Information and Mathematical Symbolism While Using Machine Translation Tools.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2020-06-25, DOI: 10.1021/acs.jcim.0c00274
M Farooq Wahab, Sonia Zulfiqar, Muhammad Ilyas Sarwar, Ingo Lieberwirth

Computer-aided translation technology has made tremendous progress in accuracy over the past few years. Chemical Abstracts Service of the American Chemical Society summarizes scientific works from more than 50 languages and allows users to search papers in nine selected languages. Currently, only the abstracts are rendered into English by human experts or by machine translation because full-text translation of millions of articles is beyond human capacity today. English translations of research papers, scientific books, or patents written in other languages are often required for research, data mining, and historical purposes. Many fundamental papers in chemistry, quantum chemistry, physics, and mathematics contain a significant number of chemical or mathematical equations. One of the major known problems in machine translation of such symbolically dense texts is incorrect or meaningless output. This article describes how to optimize the existing machine translation tools to read foreign language papers embedded with chemical/mathematical equations. German and French are used as illustrative source languages for translation into English. Direct upload of text with extensive symbolism is possible with certain services, but this also occasionally produces erroneous renditions in English. A facile solution to the problems associated with embedded equations and mathematical formulas is to replace the equations and notations with “dummy” variables. After translation, the placeholder (dummy) symbols are removed and the original equations are substituted back. This approach, which can be automated in the future, relies on the idea that chemical formulas and mathematical notations are universal. Following the guidelines in the article, excellent translations can be produced from a text having interspersed equations and chemical symbols.
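
The abstract notes that the dummy-variable substitution could be automated. A minimal sketch of one possible automation follows; it is not the authors' implementation. It assumes equations are delimited by `$...$`, uses an invented token format (`EQ000X`), and substitutes a hypothetical `toy_translate` function for a real machine translation service.

```python
import re

# Assumption: equations in the source text are delimited by $...$.
# The abstract does not prescribe a delimiter or a token format.
EQUATION_PATTERN = re.compile(r"\$[^$]+\$")


def mask_equations(text, pattern=EQUATION_PATTERN):
    """Replace each equation with a dummy token; return masked text and a token map."""
    mapping = {}

    def substitute(match):
        # Fixed-width token with a trailing letter so no token is a prefix of another.
        token = f"EQ{len(mapping):03d}X"
        mapping[token] = match.group(0)
        return token

    return pattern.sub(substitute, text), mapping


def restore_equations(translated, mapping):
    """Put the original equations back in place of the dummy tokens."""
    for token, equation in mapping.items():
        translated = translated.replace(token, equation)
    return translated


def toy_translate(text):
    """Hypothetical stand-in for a machine translation service (German to English)."""
    glossary = {
        "Die Energie ist": "The energy is",
        "und die Masse ist": "and the mass is",
    }
    for german, english in glossary.items():
        text = text.replace(german, english)
    return text


source = "Die Energie ist $E = mc^2$ und die Masse ist $m = E/c^2$."
masked, mapping = mask_equations(source)
result = restore_equations(toy_translate(masked), mapping)
print(masked)
print(result)
```

The translator only ever sees the masked text, so it has no opportunity to alter the symbols. In practice one would still need to check that the chosen service passes the dummy tokens through unchanged.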

Updated: 2020-07-27