Paper Title
KNN-Diffusion: Image Generation via Large-Scale Retrieval
Paper Authors
Paper Abstract
Recent text-to-image models have achieved impressive results. However, since they require large-scale datasets of text-image pairs, it is impractical to train them on new domains where data is scarce or not labeled. In this work, we propose using large-scale retrieval methods, in particular, efficient k-Nearest-Neighbors (kNN), which offers novel capabilities: (1) training a substantially small and efficient text-to-image diffusion model without any text, (2) generating out-of-distribution images by simply swapping the retrieval database at inference time, and (3) performing text-driven local semantic manipulations while preserving object identity. To demonstrate the robustness of our method, we apply our kNN approach on two state-of-the-art diffusion backbones, and show results on several different datasets. As evaluated by human studies and automatic metrics, our method achieves state-of-the-art results compared to existing approaches that train text-to-image generation models using images only (without paired text data).
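The retrieval step the abstract refers to — fetching the k nearest image embeddings for a query in a shared embedding space, then conditioning the diffusion model on them — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the cosine-similarity search, the toy random "embeddings", and the function name `knn_retrieve` are all assumptions for exposition.

```python
import numpy as np

def knn_retrieve(query_emb, index_embs, k=3):
    """Return indices of the k nearest database embeddings by cosine similarity.

    In a setup like the one the abstract describes, `index_embs` would hold
    image embeddings from a shared text-image space (e.g. CLIP-style), and the
    retrieved neighbors would be fed to the diffusion model as conditioning.
    """
    q = query_emb / np.linalg.norm(query_emb)
    db = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity to every database item
    return np.argsort(-sims)[:k]     # indices of the k most similar items

# Toy database of 5 hypothetical image embeddings (stand-ins for real features).
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 8))
query = db[2] + 0.01 * rng.normal(size=8)  # a query very close to item 2
neighbors = knn_retrieve(query, db, k=3)
print(neighbors[0])  # → 2 (item 2 is the nearest neighbor)
```

Swapping `db` for a database from a different domain at inference time is, in spirit, what capability (2) describes: the same trained model retrieves from, and is steered by, the new collection.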