Paper Title

LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models

Paper Authors

Paramanand Chandramouli, Kanchana Vaishnavi Gandikota

Paper Abstract

Research in vision-language models has seen rapid developments of late, enabling natural language-based interfaces for image generation and manipulation. Many existing text-guided manipulation techniques are restricted to specific classes of images and often require fine-tuning to transfer to a different style or domain. However, generic image manipulation using a single model with flexible text inputs is highly desirable. Recent work addresses this task by guiding generative models trained on generic image datasets using pretrained vision-language encoders. While promising, this approach requires expensive optimization for each input. In this work, we propose an optimization-free method for the task of generic image manipulation from text prompts. Our approach exploits recent Latent Diffusion Models (LDMs) for text-to-image generation to achieve zero-shot text-guided manipulation. We employ a deterministic forward diffusion in a lower-dimensional latent space, and the desired manipulation is achieved by simply providing the target text to condition the reverse diffusion process. We refer to our approach as LDEdit. We demonstrate the applicability of our method to semantic image manipulation and artistic style transfer. Our method can accomplish image manipulation on diverse domains and enables editing multiple attributes in a straightforward fashion. Extensive experiments demonstrate the benefit of our approach over competing baselines.
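
The two-pass procedure the abstract describes (deterministic forward diffusion of the source latent, then reverse diffusion conditioned on the target text) can be summarized in a short sketch. The code below is an illustration, not the authors' implementation: the noise-predictor signature `eps_model(z, t, cond)`, the encoded prompt `text_emb`, the unconditional forward pass, and the `alpha_bar` noise schedule are all assumptions made for this sketch.

```python
# A minimal sketch of the LDEdit idea, assuming a pretrained LDM whose
# noise predictor has the (hypothetical) signature eps_model(z, t, cond).
import torch

def ddim_step(z: torch.Tensor, eps: torch.Tensor,
              a_cur: torch.Tensor, a_next: torch.Tensor) -> torch.Tensor:
    """Deterministic DDIM update (eta = 0) between two noise levels."""
    # Predict the clean latent from the current noisy latent ...
    z0_pred = (z - (1.0 - a_cur).sqrt() * eps) / a_cur.sqrt()
    # ... then re-noise it deterministically to the next level.
    return a_next.sqrt() * z0_pred + (1.0 - a_next).sqrt() * eps

@torch.no_grad()
def ldedit(z0, eps_model, text_emb, alpha_bar, t_enc: int):
    """Invert latent z0 for t_enc steps, then decode with the target text."""
    z = z0
    # Deterministic forward diffusion (DDIM inversion) in latent space;
    # run unconditionally here -- an assumption of this sketch.
    for t in range(t_enc):
        eps = eps_model(z, t, cond=None)
        z = ddim_step(z, eps, alpha_bar[t], alpha_bar[t + 1])
    # Reverse diffusion conditioned on the target text embedding.
    for t in reversed(range(t_enc)):
        eps = eps_model(z, t + 1, cond=text_emb)
        z = ddim_step(z, eps, alpha_bar[t + 1], alpha_bar[t])
    return z  # feed through the LDM decoder to obtain the edited image

# Toy usage with stand-in components (shapes and schedule are illustrative):
alpha_bar = torch.linspace(1.0, 0.1, steps=51)      # cumulative alphas
dummy_eps = lambda z, t, cond: torch.zeros_like(z)  # placeholder predictor
edited = ldedit(torch.randn(1, 4, 32, 32), dummy_eps, None, alpha_bar, t_enc=40)
```

In this reading, `t_enc` controls how far the source latent is pushed toward noise and hence how strong the edit can be; the edited latent would then be passed through the LDM decoder to produce the output image.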
