LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems

Paper Title

LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems

Authors

Björn Deiseroth, Patrick Schramowski, Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting

Abstract

Text-to-image models have recently achieved remarkable success, producing seemingly accurate samples in photo-realistic quality. However, as state-of-the-art language models still struggle to evaluate precise statements consistently, so do language-model-based image generation processes. In this work, we showcase problems of state-of-the-art text-to-image models such as DALL-E in generating accurate samples from statements related to the DrawBench benchmark. Furthermore, we show that CLIP is not able to rerank those generated samples consistently. To this end, we propose LogicRank, a neuro-symbolic reasoning framework that can yield a more accurate ranking system for such precision-demanding settings. LogicRank integrates smoothly into the generation process of text-to-image models and, moreover, can be used to further fine-tune towards a more logically precise model.
