

Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective
Semantic Web (IF 3.0), Pub Date: 2021-04-29, DOI: 10.3233/sw-210431
Lucie-Aimée Kaffee, Pavlos Vougiouklis, Elena Simperl


Abstract

Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency.


Updated: 2021-05-02
