Paper Title
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Paper Authors
Paper Abstract
Generating and editing images from open domain text prompts is a challenging task that heretofore has required expensive and specially trained models. We demonstrate a novel methodology for both tasks which is capable of producing images of high visual quality from text prompts of significant semantic complexity without any training by using a multimodal encoder to guide image generations. We demonstrate on a variety of tasks how using CLIP [37] to guide VQGAN [11] produces higher visual quality outputs than prior, less flexible approaches like DALL-E [38], GLIDE [33] and Open-Edit [24], despite not being trained for the tasks presented. Our code is available in a public repository.
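The mechanism the abstract describes, using CLIP as a frozen multimodal encoder to steer a VQGAN latent toward a text prompt, can be illustrated in a few lines. The snippet below is a minimal sketch, not the authors' implementation: it assumes a pretrained VQGAN handle `vqgan` whose `decode` maps latents to RGB images in [-1, 1] (e.g. a VQModel from the taming-transformers repository), and it omits the cutouts, augmentations, CLIP input normalization, and regularization the actual VQGAN-CLIP pipeline uses.

    import torch
    import torch.nn.functional as F
    import clip  # OpenAI CLIP, https://github.com/openai/CLIP

    device = "cuda" if torch.cuda.is_available() else "cpu"
    clip_model, _ = clip.load("ViT-B/32", device=device)

    # Encode the prompt once; CLIP stays frozen throughout.
    prompt = "a painting of a sunset over the ocean"
    with torch.no_grad():
        text_emb = clip_model.encode_text(clip.tokenize([prompt]).to(device))
        text_emb = F.normalize(text_emb, dim=-1)

    # z is the continuous VQGAN latent being optimized; its shape depends
    # on the checkpoint. `vqgan` is an assumed pretrained decoder, e.g. a
    # VQModel from taming-transformers.
    z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.1)

    for step in range(300):
        image = vqgan.decode(z)             # latent -> RGB in [-1, 1]
        image = (image + 1) / 2             # rescale to [0, 1]
        image = F.interpolate(image, size=224,  # CLIP's input resolution
                              mode="bilinear", align_corners=False)
        img_emb = F.normalize(clip_model.encode_image(image), dim=-1)
        loss = -(img_emb * text_emb).sum()  # maximize cosine similarity
        opt.zero_grad()
        loss.backward()
        opt.step()

Because only the latent z receives gradient updates while both networks stay frozen, the method needs no task-specific training, which is the property the abstract emphasizes.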