Paper Title
Few-shot Image Generation with Diffusion Models
Paper Authors
Paper Abstract
Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, to our knowledge, few-shot image generation tasks have yet to be studied with DDPM-based approaches. Modern approaches are mainly built on Generative Adversarial Networks (GANs) and adapt models pre-trained on large source domains to target domains using a few available samples. In this paper, we make the first attempt to study when DDPMs overfit and suffer severe diversity degradation as training data becomes scarce. We then fine-tune DDPMs pre-trained on large source domains to address the overfitting problem when training data is limited. Although direct fine-tuning accelerates convergence and improves generation quality and diversity compared with training from scratch, the fine-tuned models still fail to retain some diverse features and can only produce coarse images. Therefore, we design a DDPM pairwise adaptation (DDPM-PA) approach to optimize few-shot DDPM domain adaptation. DDPM-PA efficiently preserves information learned from source domains by keeping the relative pairwise distances between generated samples during adaptation. In addition, DDPM-PA enhances the learning of high-frequency details from both source models and the limited training data. DDPM-PA further improves generation quality and diversity and outperforms current state-of-the-art GAN-based approaches. We demonstrate the effectiveness of our approach qualitatively and quantitatively on a series of few-shot image generation tasks.
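To make the pairwise-distance idea concrete, the following PyTorch snippet is a minimal sketch of our own, not the authors' released code: the function name `pairwise_distance_loss`, the use of cosine similarity over generated-sample features, and the softmax/KL formulation are illustrative assumptions about how relative pairwise distances could be kept consistent between the frozen source model and the adapted model.

```python
import torch
import torch.nn.functional as F

def pairwise_distance_loss(src_feats: torch.Tensor,
                           tgt_feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: keep the relative pairwise similarities
    among samples from the adapted model close to those of the frozen
    source model, so diversity learned on the source domain survives
    adaptation.

    src_feats, tgt_feats: (N, D) features of N samples generated from
    the same noise inputs by the source and adapted models.
    """
    def similarity_rows(feats: torch.Tensor) -> torch.Tensor:
        # (N, N) cosine-similarity matrix between all sample pairs
        sim = F.cosine_similarity(feats.unsqueeze(0),
                                  feats.unsqueeze(1), dim=-1)
        n = sim.size(0)
        # Drop the diagonal (self-similarity) before normalizing
        off_diag = ~torch.eye(n, dtype=torch.bool, device=sim.device)
        return sim[off_diag].view(n, n - 1)

    # The source distribution is a fixed target: no gradients through it
    p_src = F.softmax(similarity_rows(src_feats.detach()), dim=-1)
    log_p_tgt = F.log_softmax(similarity_rows(tgt_feats), dim=-1)
    # KL divergence pulls the adapted pairwise structure toward the source
    return F.kl_div(log_p_tgt, p_src, reduction="batchmean")
```

In this reading, the loss penalizes the adapted model only for changing the *relative* arrangement of its samples, not their absolute content, which is what lets it move to the target domain without collapsing onto the few training images.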
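The abstract also mentions enhancing the learning of high-frequency details. As one plausible illustration (again an assumption, not the paper's method), high-frequency components can be isolated with a single-level Haar wavelet transform and then matched between generated and real images, e.g. with an L1 penalty:

```python
import torch
import torch.nn.functional as F

def haar_high_frequency(images: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: extract the LH, HL, and HH high-frequency
    bands of a single-level Haar wavelet transform (the LL band holds
    the low-frequency content and is omitted here).

    images: (B, C, H, W) batch; returns (B, 3*C, H//2, W//2) bands.
    """
    lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
    hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([lh, hl, hh]).unsqueeze(1)  # (3, 1, 2, 2)

    b, c, h, w = images.shape
    x = images.reshape(b * c, 1, h, w)
    # A stride-2 convolution implements the downsampling wavelet analysis
    bands = F.conv2d(x, kernels.to(images), stride=2)  # (B*C, 3, H/2, W/2)
    return bands.reshape(b, 3 * c, h // 2, w // 2)
```

An L1 loss between `haar_high_frequency(generated)` and `haar_high_frequency(real)` would be one way to emphasize fine details during few-shot fine-tuning, under the assumptions stated above.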