Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Paper Title

Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Authors

Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Rahul Gupta

Abstract

Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with human intention in the presence of ambiguities.
