A Set of Recommendations for Assessing Human-Machine Parity in Language Translation
arXiv - CS - Artificial Intelligence Pub Date : 2020-04-03 , DOI: arxiv-2004.01694
Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral

The quality of machine translation has improved remarkably in recent years, to the point that several empirical investigations found it indistinguishable from professional human translation. We reassess Hassan et al.'s 2018 investigation into Chinese-to-English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design, which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices for assessing strong machine translation systems in general, and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
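The core comparison the abstract describes, testing whether one set of translations contains significantly fewer errors than another over the same source segments, can be sketched with a paired two-sided sign test. This is a minimal illustration, not the paper's actual evaluation protocol, and the per-segment error counts below are made-up example data:

```python
from math import comb

def sign_test(a, b):
    """Two-sided sign test on paired counts; ties are dropped.

    Returns the probability, under the null hypothesis that neither
    side systematically has fewer errors, of an outcome at least as
    extreme as the observed win/loss split.
    """
    wins = sum(x < y for x, y in zip(a, b))    # segments where a has fewer errors
    losses = sum(x > y for x, y in zip(a, b))  # segments where b has fewer errors
    n = wins + losses
    k = min(wins, losses)
    # Two-sided binomial tail probability with p = 0.5
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical error counts for 12 segments (human vs. machine translation)
human = [0, 1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0]
machine = [2, 1, 1, 3, 2, 1, 2, 1, 0, 2, 2, 1]
print(sign_test(human, machine))  # small p: human side has significantly fewer errors
```

A sign test only uses the direction of each paired difference, which makes it robust to how individual annotators scale their error counts; evaluations with graded judgments would typically use a rank-based or model-based test instead.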

Updated: 2020-04-06