Design Guidelines for Prompt Engineering Text-to-Image Generative Models, arXiv - CS - Human-Computer Interaction



Design Guidelines for Prompt Engineering Text-to-Image Generative Models
arXiv - CS - Human-Computer Interaction, Pub Date: 2021-09-14, DOI: arxiv-2109.06977
Vivian Liu, Lydia B. Chilton

Text-to-image generative models are a new and powerful way to generate visual artwork. The free-form nature of text as interaction is double-edged; while users have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt components and model parameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style and investigate success and failure modes within these dimensions. Our evaluation of 5493 generations over the course of five experiments spans 49 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people find better outcomes from text-to-image generative models.
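As a rough illustration of what a prompt "structured to include subject and style" could look like, the sketch below enumerates every subject and style pairing from two small lists. The template wording, the function name `build_prompts`, and the sample subjects and styles are all hypothetical and chosen only for illustration; the abstract does not specify the exact phrasing or the items used in the study.

```python
from itertools import product


def build_prompts(subjects, styles, template="{subject} in the style of {style}"):
    """Return one text prompt for every (subject, style) pair.

    The default template is an assumption made for this sketch, not
    a phrasing confirmed by the abstract.
    """
    return [
        template.format(subject=subject, style=style)
        for subject, style in product(subjects, styles)
    ]


# Hypothetical examples: one abstract and one concrete subject,
# and two styles picked purely for illustration.
subjects = ["love", "a tree"]
styles = ["cubism", "ukiyo-e"]

prompts = build_prompts(subjects, styles)
for prompt in prompts:
    print(prompt)
```

Enumerating the full grid up front is one way to replace ad hoc trial and error with a systematic sweep, since each generated image can then be rated against a known subject and style.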


Updated: 2021-09-16
