Does Summary Evaluation Survive Translation to Other Languages?
arXiv - CS - Computation and Language. Pub Date: 2021-09-16, DOI: arxiv-2109.08129
Neslihan Iskender, Oleg Vasilyev, Tim Polzehl, John Bohannon, Sebastian Möller

The creation of a large summarization quality dataset is a considerable, expensive, and time-consuming effort, requiring careful planning and setup. It includes producing human-written and machine-generated summaries and evaluating the summaries both by humans, preferably linguistic experts, and by automatic evaluation tools. If such an effort is made in one language, it would be beneficial to be able to reuse it in other languages. To investigate how much we can trust the translation of such a dataset without repeating human annotations in another language, we translated an existing English summarization dataset, the SummEval dataset, into four different languages and analyzed the scores from the automatic evaluation metrics in the translated languages, as well as their correlation with human annotations in the source language. Our results reveal that although translation changes the absolute values of the automatic scores, the scores keep the same rank order and approximately the same correlations with human annotations.
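The analysis described above amounts to two comparisons: whether the rank order of automatic metric scores survives translation, and whether the translated scores correlate with human annotations about as well as the original English scores do. Below is a minimal sketch of such a check, not the authors' code: all scores are hypothetical placeholders, and standard rank-correlation functions from SciPy stand in for whatever the paper actually used.

```python
# Minimal sketch (assumed setup, not from the paper): compare automatic-metric
# scores for the same summaries in English and in a translated language, plus
# human annotations collected on the English source.
from scipy.stats import kendalltau, spearmanr

# Hypothetical metric scores for five summaries (e.g., from some automatic metric).
scores_en = [0.42, 0.55, 0.31, 0.60, 0.48]   # scores on the original English summaries
scores_tr = [0.38, 0.51, 0.27, 0.57, 0.44]   # scores on the translated summaries
human_en  = [3.0, 4.0, 2.5, 4.5, 3.5]        # expert annotations on the English source

# (1) Does translation preserve the rank order of the automatic scores?
tau, _ = kendalltau(scores_en, scores_tr)
print(f"Kendall tau between English and translated scores: {tau:.3f}")

# (2) Does the metric on translated text correlate with human judgments
#     about as well as it does on the original English text?
rho_en, _ = spearmanr(scores_en, human_en)
rho_tr, _ = spearmanr(scores_tr, human_en)
print(f"Spearman with human annotations: en={rho_en:.3f}, translated={rho_tr:.3f}")
```

A high Kendall tau in step (1) and similar Spearman values in step (2) would correspond to the paper's finding that absolute scores shift under translation while rank order and correlations with human annotations are roughly preserved.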

Updated: 2021-09-17