Paper Title


Person Image Synthesis via Denoising Diffusion Model

Authors

Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

Abstract


The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. The existing approaches use generative adversarial networks that do not necessarily maintain realistic textures or need dense correspondences that struggle to handle complex deformations and severe occlusions. In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution. Our proposed Person Image Diffusion Model (PIDM) disintegrates the complex transfer problem into a series of simpler forward-backward denoising steps. This helps in learning plausible source-to-target transformation trajectories that result in faithful textures and undistorted appearance details. We introduce a 'texture diffusion module' based on cross-attention to accurately model the correspondences between appearance and pose information available in source and target images. Further, we propose 'disentangled classifier-free guidance' to ensure close resemblance between the conditional inputs and the synthesized output in terms of both pose and appearance information. Our extensive results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios. We also show how our generated images can help in downstream tasks. Our code and models will be publicly released.
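The abstract's "disentangled classifier-free guidance" combines an unconditional noise prediction with separate guidance terms for the pose and appearance conditions. The sketch below shows one plausible additive form of such a combination; the function name, the exact split of terms, and the guidance weights `w_pose` and `w_style` are illustrative assumptions, not the paper's verified formulation.

```python
import numpy as np

def disentangled_cfg(eps_uncond, eps_pose, eps_style, w_pose, w_style):
    """Combine noise predictions with separate per-condition guidance scales.

    eps_uncond: denoiser output with both conditions dropped
    eps_pose:   denoiser output conditioned on the target pose only
    eps_style:  denoiser output conditioned on the source appearance only
    w_pose, w_style: independent guidance weights for each condition

    NOTE: this additive split is a sketch of the general idea of
    disentangled classifier-free guidance; PIDM's exact combination
    may differ.
    """
    return (eps_uncond
            + w_pose * (eps_pose - eps_uncond)
            + w_style * (eps_style - eps_uncond))
```

Setting either weight to zero recovers guidance on the remaining condition alone, which is what lets pose fidelity and appearance fidelity be traded off independently at sampling time.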
