Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry
Computers in Human Behavior (IF 9.0), Pub Date: 2021-01-01, DOI: 10.1016/j.chb.2020.106553
Nils Köbis, Luca D. Mossink

Abstract The release of openly available, robust natural language generation (NLG) algorithms has spurred much public attention and debate. One reason lies in the algorithms' purported ability to generate humanlike text across various domains. Empirical evidence using incentivized tasks to assess whether people (a) can distinguish and (b) prefer algorithm-generated versus human-written text is lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art NLG algorithm GPT-2 (N_total = 830). Using the identical starting lines of human poems, GPT-2 produced samples of poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop) and in turn matched with a human-written poem. In a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmically generated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, people revealed a slight aversion to algorithm-generated poetry, independent of whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the ability of NLG algorithms to produce human-like text and propose methodologies for studying such learning algorithms in human-agent experimental settings.
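The incentivized Turing test described above amounts to asking whether participants' rate of correctly identifying the human-written poem in each AI/human pair exceeds the 50% chance level. A minimal sketch of that scoring logic follows; all numbers are hypothetical and do not reflect the paper's data, and the exact binomial comparison against chance is an illustrative choice, not necessarily the authors' analysis.

```python
# Hedged sketch: scoring an incentivized Turing test of the kind described.
# Each trial pairs an AI-generated poem with a human-written one; a participant
# earns a bonus for correctly picking the human poem. Detection is "reliable"
# if accuracy exceeds the 50% guessing baseline.
from math import comb


def detection_rate(correct_picks: int, total_trials: int) -> float:
    """Fraction of pairs in which the human-written poem was identified."""
    return correct_picks / total_trials


def binomial_p_above_chance(correct_picks: int, total_trials: int,
                            p: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct_picks) under chance."""
    return sum(
        comb(total_trials, k) * p**k * (1 - p) ** (total_trials - k)
        for k in range(correct_picks, total_trials + 1)
    )


# Hypothetical example: 60 of 100 pairs judged correctly.
rate = detection_rate(60, 100)          # 0.6, above the 0.5 chance level
pval = binomial_p_above_chance(60, 100)  # small p-value -> above-chance detection
```

Under this framing, the Human-out-of-the-loop result corresponds to an above-chance detection rate, while the Human-in-the-loop result corresponds to accuracy statistically indistinguishable from 50%.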

Updated: 2021-01-01