Human evaluation of automatically generated text: Current trends and best practice guidelines

https://doi.org/10.1016/j.csl.2020.101151
Under a Creative Commons license
open access

Highlights

  • The current paper provides an overview of human evaluation practices in NLG.

  • The current paper gives an overview of the steps necessary to undertake a human evaluation study.

  • Building on findings from NLG, but also statistics and the behavioral sciences, the current paper provides a set of recommendations and best practices for human evaluation in NLG.

Abstract

Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (planning stage; execution and release stage), and to the specific steps within these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.

Keywords

Natural Language Generation
Human evaluation
Recommendations
Literature review
Open science
Ethics

Chris van der Lee is a Ph.D. student at the Tilburg center for Cognition and Communication. His research mainly focuses on data-to-text generation, dialogue systems, automated journalism, and text classification.

Albert Gatt is a Senior Lecturer and Director of the Institute of Linguistics and Language Technology, University of Malta. His research interests are in data-to-text generation, the Vision-Language interface, and NLP evaluation.

Emiel van Miltenburg is an assistant professor at the Tilburg center for Cognition and Communication, Tilburg University. His research interests include (multimodal) NLG, evaluation, accessibility, and ethics in NLP.

Emiel Krahmer is a full professor at the Tilburg center for Cognition and Communication, Tilburg University. He studies how people communicate with each other and how computers can be taught to do the same. Specific research interests include natural language generation, human-robot interaction, health communication, and evaluation.

1 ORCID: 0000-0003-3454-026X
2 ORCID: 0000-0001-6388-8244
3 ORCID: 0000-0002-7143-8961
4 ORCID: 0000-0002-6304-7549